Administrative Records for Survey Methodology. Group of authors


(RTRA), generally restrict the command set from the allowed statistical programming languages (SAS, Stata, and SPSS) and limit what the users can do to certain statistical procedures and languages for which known automated disclosure limitation procedures have been implemented.

      Most of these systems only provide access to household and person surveys. Of the known systems surveyed above, only Australia’s RADL systems and the Bank of Italy’s implementation of LISSY (Bruno, D’Aurizio, and Tartaglia-Polcini 2009, 2014) seem to provide access to business microdata through automated remote processing facilities.

      2.4.3 Licensing

      In the United States, some surveys (NCES, NLSY, and HRS) use licensing to distribute portions of the data they collect on their respondents. Commercial data providers (COMPUSTAT, etc.) also license the data distributed to researchers. Penalties for license infractions range from restrictions on future research grant funding, for example in HRS, to monetary penalties, for example in commercial data licenses. We are not aware of any studies that quantify the violation rates or financial penalties actually incurred due to license violations. Licensing may be limited by the enforceability of laws or contracts, and thus may be limited to residents of the same jurisdiction in which the data provider is housed. Licensing is often combined with the creation of ad hoc data enclaves, the simplest of these being stand-alone, nonnetworked computer workstations.

      2.4.4 Disclosure Avoidance Methods

      Data enclaves exist to allow researchers to perform analyses within the restricted environment, and then extract or publish some form of statistical summary that can be released from the secure environment. These summaries are typically estimates from a statistical model. Model-based output is generally evaluated against the same criteria traditionally used for tabular output (minimum number of units within a reporting cell, minimum percentage of global activity within a reporting cell). In contrast to licensing arrangements, which allow researchers to self-monitor, statistical data enclaves have regimented output monitoring, typically by staff of the data provider. Released statistical outputs are usually registered in some fashion, but documentation of the full provenance chain may be limited.
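
      The two tabular criteria named above can be sketched as a simple output check. This is only an illustration: the function name `check_cells`, the input layout (one `(cell, value)` pair per contributing unit), and the thresholds `min_units=3` and `min_share=0.01` are assumptions made for the example, not the rules of any particular data provider.

```python
from collections import defaultdict


def check_cells(records, min_units=3, min_share=0.01):
    """Flag reporting cells against two illustrative release criteria.

    records: iterable of (cell_key, value) pairs, one per contributing unit.
    Returns {cell_key: bool}; True means the cell passes both checks.
    The thresholds are hypothetical defaults chosen for this sketch.
    """
    counts = defaultdict(int)    # number of units per cell
    totals = defaultdict(float)  # summed activity per cell
    for cell, value in records:
        counts[cell] += 1
        totals[cell] += value

    grand_total = sum(totals.values())
    result = {}
    for cell in counts:
        share = totals[cell] / grand_total if grand_total else 0.0
        # A cell is releasable only if it has enough units AND
        # represents a large enough share of total activity.
        result[cell] = counts[cell] >= min_units and share >= min_share
    return result


example = [("A", 10), ("A", 20), ("A", 30), ("B", 5), ("B", 5)]
flags = check_cells(example)
```

      In this toy input, cell "A" has three contributing units and passes, while cell "B" has only two and would be withheld. A real enclave would apply its own thresholds, usually followed by manual review by staff.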

      All three of the examples of linked data provided in this paper rely on some version of secure data enclaves to provide microdata access to approved researchers. HRS data are made available to tenure-track researchers who sign a data use agreement and provide documentation of a secure local computing environment. An additional option for HRS data is to visit the Michigan Center on the Demography of Aging data enclave, which makes data accessible to researchers in a physical data enclave at “headquarters,” as many NSOs do. More recently, HRS has started to offer secure VDI access to researchers. The confidential data underlying the SSB, and against which validation requests are run, are also available either within the FSRDC network, or by sending validation requests by email to staff at Census headquarters (a form of “remote processing”). LEHD microdata are only available through the FSRDC.

      An open question is whether the disclosure risks addressed through physical security measures are greater for linked data. Enabling researchers to measure heuristic disclosure risks, such as the n cell count or the p-percent rule (O’Keefe et al. 2013), becomes more important when any possible combination of k variables (k large) leads to small cells or dominated cells. Even subject matter experts cannot assess these situations a priori.
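
      A minimal sketch of how a researcher might measure these two heuristics follows. The p-percent rule is written in its common textbook form (a cell is sensitive when the contributions other than the two largest sum to less than p percent of the largest contribution); the function names, the default `p=10.0`, the threshold `n=3`, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def p_percent_sensitive(contributions, p=10.0):
    """Return True if a cell fails the p-percent rule.

    Assumes non-negative contributions. The cell is sensitive when the
    total minus the two largest contributions is below p% of the largest.
    """
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False
    largest = xs[0]
    second = xs[1] if len(xs) > 1 else 0
    remainder = sum(xs) - largest - second
    return remainder < (p / 100.0) * largest


def small_cells(rows, variables, k, n=3):
    """Count cells with fewer than n units for every k-way cross-tabulation.

    rows: list of dicts mapping variable name to category.
    Returns {tuple_of_variables: number_of_small_cells}.
    """
    out = {}
    for combo in combinations(variables, k):
        counts = Counter(tuple(row[v] for v in combo) for row in rows)
        out[combo] = sum(1 for c in counts.values() if c < n)
    return out


rows = [
    {"state": "NY", "industry": "A", "size": "S"},
    {"state": "NY", "industry": "A", "size": "S"},
    {"state": "NY", "industry": "B", "size": "L"},
    {"state": "CA", "industry": "A", "size": "S"},
]
one_way = small_cells(rows, ["state", "industry", "size"], k=1, n=2)
two_way = small_cells(rows, ["state", "industry", "size"], k=2, n=2)
```

      Even in this four-record example, the number of small cells tends to grow as k increases, and the number of cross-tabulations to inspect grows combinatorially with the number of variables. That is the reason such checks are hard to assess a priori and are better computed than guessed.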

      2.4.5 Data Silos

      Such administrative barriers may also be driven by ethical or confidentiality concerns. The question of consent by survey or census respondents may explicitly prevent the linkage of their survey responses or of their biological specimens with other data. For example, the Canadian Census long form of 2006 offered respondents the option to either answer survey questions on earnings, or consent to the linking of their tax data on earnings. In the 2016 census, the question was no longer asked, and users were simply notified that linkage would happen.

      In the case of the LEHD data, as of December 2015, all 50 states as well as the District of Columbia had signed agreements with the Census Bureau to share data and produce public-use statistics. It would thus seem possible for researchers to access a comprehensive LEHD jobs database through the FSRDC network, by linking together the job databases from 51 administrative entities. However, all but 12 of the states had declined to automatically extend the right to use the data to external researchers within the FSRDC network. Nevertheless, some of the same states that declined to provide such permission in the FSRDC give access to researchers through their state data centers or other means. The UI state-level data are thus siloed, and researchers may be faced with nonrepresentative data on the American job market. Several European projects, such as Data without Boundaries (DwB), have investigated cross-national access with elevated expectations but relatively limited success (Schiller and Welpton 2014; Bender and Heining 2011). Increasingly, the U.S. Census Bureau and CASD also host data from other data providers, through collaborative agreements, moving
