Administrative Records for Survey Methodology. Multiple authors

should completely overlap (Reiter, Oganian, and Karr 2009), presumably with the synthetic confidence interval being slightly larger because of the increased variation arising from the synthesis. When these results are obtained, inferences drawn about the coefficients will be consistent whether one uses synthetic or completed data. The reader interested in detailed examples that show how analytic validity is assessed in the SSB should consult Figures 2.1 and 2.2 and associated discussion in Abowd, Schmutte, and Vilhuber (2018).

      Box 2.1 Sidebox: Practical Synthetic Data Use

      The SIPP–SSA–IRS Synthetic Beta File has been accessible to users in its current form since 2010. Interested users can request an account by following links at https://www.vrdc.cornell.edu/sds/. Applications are judged solely on feasibility (i.e. whether the necessary variables are on the SSB). After projects are approved by the Census Bureau, researchers are given accounts on the Synthetic Data Server. Users can submit validation requests, following certain rules outlined on the Census Bureau’s website. Deviations from the guidelines may be possible with prior approval of the Census Bureau, but are typically granted only if specialized software (other than SAS or Stata) is needed, and only if that software already exists on Census Bureau computing systems. Between 2010 and 2016, over one hundred users requested access to the server, using a succession of continuously improved datasets.

Graph depicts the distribution of delta B in Maryland.

      2.3.3 LEHD: Linked Establishment and Employee Records

      2.3.3.1 Data Description

      The LEHD data link employee wage records extracted from Unemployment Insurance (UI) administrative files from 51 states with establishment-level records from the Quarterly Census of Employment and Wages (QCEW, also provided by the partner states), the SSA-sourced record of applications for SSNs (“Numident”), residential addresses derived from IRS-provided individual tax filings, and data from surveys and censuses conducted by the U.S. Census Bureau (the 2000 and 2010 decennial censuses, as well as microdata from the ACS). Additional information is linked in from the Census Bureau’s Employer Business Register and its derivative files. The merged data are subject to both Title 13 and Title 26 protections of the United States Code (U.S.C.). For more details, see Abowd, Haltiwanger, and Lane (2004) and Abowd et al. (2009).

      2.3.3.2 Disclosure Avoidance Methods

      We describe in detail the disclosure avoidance method used for workplace tabulations in QWI and LODES (Abowd et al. 2012). Not discussed here are the additional disclosure avoidance methods applied in advance of publishing data on job flows (Abowd and McKinney 2016). Focusing on QWI and LODES is sufficient to highlight the types of confidentiality concerns that arise from working with these linked data, and the kinds of strategies the Census Bureau uses to address them.

      In the QWI confidentiality protection scheme, confidential micro-data are considered protected by noise infusion if one of the following conditions holds: (1) any inference regarding the magnitude of a particular respondent’s data must differ from the confidential quantity by at least c% even if that inference is made by a coalition of respondents with exact knowledge of their own answers (FCSM 2005, p. 72), or (2) any inference regarding the magnitude of an item is incorrect with probability not less than y%, where c and y are confidential but generally “large.” Condition (1) is intended to prevent, say, a group of firms from “backing out” the total payroll of a specific competitor by combining their private information with the published total. Condition (2) prevents inference of counts of the number of workers or firms that satisfy some condition (say, the number of teenage workers employed in the fast food industry in Hull, GA) assuming item suppression or some additional protection, like synthetic data, when the count is too small.
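
      The "backing out" attack behind condition (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the payroll figures, the noisy published total, and the threshold C_PERCENT are all hypothetical (the real c is confidential), and the function names are invented for this example rather than taken from any Census Bureau system.

```python
def coalition_estimate(published_total, coalition_values):
    """Coalition's guess at the remaining respondent's value:
    the published total minus what the coalition knows about itself."""
    return published_total - sum(coalition_values)


def percent_error(true_value, inferred_value):
    """Absolute gap between inference and truth, as a percentage of the truth."""
    return abs(inferred_value - true_value) / abs(true_value) * 100.0


def meets_condition_1(true_value, inferred_value, c_percent):
    """True if the inference misses the confidential value by at least c percent."""
    return percent_error(true_value, inferred_value) >= c_percent


# Hypothetical cell with three firms; the first one is the target.
target_payroll = 500_000.0
coalition_payrolls = [300_000.0, 200_000.0]
C_PERCENT = 5.0  # illustrative only; the actual c is confidential

# If the exact total were published, the coalition recovers the target exactly.
exact_total = target_payroll + sum(coalition_payrolls)
exact_guess = coalition_estimate(exact_total, coalition_payrolls)

# With a (hypothetical) noise-distorted published total, the guess is off.
noisy_total = 1_040_000.0
noisy_guess = coalition_estimate(noisy_total, coalition_payrolls)

print(meets_condition_1(target_payroll, exact_guess, C_PERCENT))
print(meets_condition_1(target_payroll, noisy_guess, C_PERCENT))
```

      In this toy cell the exact total fails the check and the distorted total passes it. The sketch only tests a single published number; it does not show that any particular noise mechanism guarantees the condition for every possible coalition.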

      All published data from QWI use the same noise-distorted data, and any special tabulations released from the QWI must follow the same procedures. The QWI system extends the idea of multiplicative noise infusion as a cross-sectional confidentiality protection mechanism first proposed by Evans, Zayatz, and Slanta (1998). A similar noise-infusion process has been used since 2007 to protect the confidentiality of data underlying the Census Bureau’s CBP (Massell and Funk 2007) and was tested for application to the Commodity Flow Survey (Massell, Zayatz, and Funk 2006).
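
      A minimal sketch of multiplicative noise infusion, assuming each establishment receives one fixed factor that is reused in every tabulation, might look as follows. The passage above does not give the noise distribution or its parameters, so the uniform draw, the bounds A and B, the seed, and all names below are assumptions made for illustration, not the QWI production method.

```python
import random


def draw_noise_factor(rng, a, b):
    """Draw a factor delta with a <= |delta - 1| <= b, so that a
    distorted value is never closer than a fraction a to the truth."""
    magnitude = rng.uniform(a, b)       # stand-in distribution (assumption)
    sign = rng.choice((-1.0, 1.0))      # inflate or deflate with equal chance
    return 1.0 + sign * magnitude


def assign_factors(establishment_ids, a, b, seed):
    """Assign one permanent factor per establishment."""
    rng = random.Random(seed)
    return {est: draw_noise_factor(rng, a, b) for est in establishment_ids}


def distorted_total(values, factors):
    """Sum establishment values after multiplying each by its own factor."""
    return sum(factors[est] * value for est, value in values.items())


A, B = 0.05, 0.15  # hypothetical bounds; the real parameters are confidential
payroll = {"est_1": 500_000.0, "est_2": 300_000.0, "est_3": 200_000.0}
factors = assign_factors(payroll, A, B, seed=2024)

# Every tabulation draws on the same distorted micro-data, so a special
# tabulation over a subset reuses the factors already assigned.
all_firms_total = distorted_total(payroll, factors)
subset = {est: payroll[est] for est in ("est_1", "est_2")}
subset_total = distorted_total(subset, factors)

print(all_firms_total, subset_total)
```

      Fixing the factors once, rather than redrawing them per table, means that differencing two published tables cannot be used to average the noise away; the difference between the two totals above is exactly the distorted value for est_3.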
