Administrative Records for Survey Methodology. Multiple authors
IRS – Internal Revenue Service handles tax collection for the US government (https://irs.gov)
NCHS – National Center for Health Statistics, the US NSO charged with collecting and disseminating information on health and well-being (https://www.cdc.gov/nchs/)
NSO – National statistical office. Most countries have a single national statistical agency, but some (e.g. the United States and Germany) have several
OASDI – Old Age, Survivors and Disability Insurance program, the official name for Social Security in the United States
QCEW – Quarterly Census of Employment and Wages is a program run by the BLS, collecting firm-level reports of employment and wages, and publishing quarterly estimates for about 95% of US jobs (https://www.bls.gov/cew/)
SER – Summary Earnings Records in SSA data
SSA – Social Security Administration, administers government-provided retirement, disability, and survivors benefits in the United States (https://ssa.gov)
SSN – Social Security Number, an identification number in the United States, originally used for management of benefits administered by the SSA, but since expanded and serving as a quasi-national identifier number
UI – Unemployment Insurance, which in the United States is administered by each of the states (and the District of Columbia)
U.S.C. – United States Code, the official compilation of the general and permanent federal laws of the United States
2.A.2 Concepts
Analytical validity: A property of released data that holds when, at a minimum, estimands can be estimated without bias and their confidence intervals (or the nominal level of significance for hypothesis tests) can be stated accurately (Rubin 1987). The estimands can be summaries of the univariate distributions of the variables, bivariate measures of association, or multivariate relationships among all variables.
Coarsening: A method for protecting data that involves mapping confidential values into broader categories, e.g. a histogram.
Confidentiality: A “quality or condition accorded to information as an obligation not to transmit […] to unauthorized parties” (Fienberg 2005, as quoted in Duncan, Elliot, and Salazar-González 2011). Confidentiality addresses data already collected, whereas privacy (see below) addresses the right of an individual to consent to the collection of data.
Data swapping: Sensitive data records (usually households) are identified based on a priori criteria and matched to “nearby records.” The values of some or all of the other variables, usually the geographic identifiers, are then swapped between the matched records, effectively relocating each record to the other’s location.
Differential privacy: A class of formal privacy mechanisms. For instance, ε-differential privacy places an upper bound, parameterized by ε, on the ability of a user to infer from the published output whether any specific data item, or response, was in the original, confidential data (Dwork and Roth 2014).
Dirichlet-multinomial distribution: A family of discrete multivariate probability distributions on a finite support of nonnegative integers. It arises when the probability vector p of the better-known multinomial distribution is itself drawn from a Dirichlet distribution with parameter α.
Input noise infusion: Distorting the value of some or all of the inputs before any publication data are built or released.
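Posterior predictive distribution (PPD): In Bayesian statistics, the distribution of unobserved (new or missing) values conditional on the observed values, obtained by averaging the data model over the posterior distribution of its parameters.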
Privacy: “An individual’s freedom from excessive intrusion in the quest for information and […] ability to choose [… what …] will be shared or withheld from others” (Duncan, Jabine, and de Wolf 1993, quoted in Duncan, Elliot, and Salazar-González 2011). See also confidentiality, above.
Sampling: As an SDL method, publishes only a fraction (a sample) of the records rather than the full data.
Statistical confidentiality or SDL – Statistical disclosure limitation: Can be viewed as “a body of principles, concepts, and procedures that permit confidentiality to be afforded to data, while still permitting its use for statistical purposes” (Duncan, Elliot, and Salazar-González 2011, p. 2).
Suppression: The removal of cells from a published table when their publication would pose a high risk of disclosure.
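Three of the concepts defined above (coarsening, suppression, and ε-differential privacy) can be illustrated with a minimal Python sketch. It is not drawn from the chapter: the ages are made up, the function names are hypothetical, and the bin width, the suppression threshold of 2, and ε = 0.5 are arbitrary assumptions. The differentially private release uses the Laplace mechanism for a counting query, which is one common mechanism in this class and not necessarily the one any agency uses.

```python
import random
from collections import Counter


def coarsen(value, width):
    """Coarsening: map a value to the lower bound of its bin."""
    return (value // width) * width


def suppress_small_cells(table, threshold):
    """Suppression: withhold (None) every cell whose count is below threshold."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}


def laplace_count(true_count, epsilon, rng):
    """Laplace mechanism for a count (sensitivity 1): add Laplace(1/epsilon) noise.

    The difference of two independent exponential draws with rate epsilon
    follows a Laplace distribution with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Made-up confidential ages, for illustration only.
ages = [23, 27, 31, 38, 41, 45, 46, 52, 67, 88]

# Coarsen into 10-year bins and tabulate (a histogram).
histogram = Counter(coarsen(a, 10) for a in ages)

# Suppress bins with fewer than 2 records (assumed threshold).
published = suppress_small_cells(dict(histogram), threshold=2)

# Release a noisy total count under an assumed epsilon of 0.5.
noisy_total = laplace_count(len(ages), epsilon=0.5, rng=random.Random(0))
```

In this sketch the bins starting at 50, 60, and 80 each hold a single record and are withheld. A production system would also need complementary suppression so that withheld cells cannot be recovered from table totals, and a smaller ε gives stronger protection at the cost of noisier counts.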
Acknowledgments
John M. Abowd is the Associate Director for Research and Methodology and Chief Scientist, U.S. Census Bureau, the Edmund Ezra Day Professor of Economics, Professor of Statistics and Information Science, and the Director of the Labor Dynamics Institute (LDI) at Cornell University, Ithaca, NY, USA. https://johnabowd.com. Ian M. Schmutte is Associate Professor of Economics at the University of Georgia, Athens, GA, USA. http://ianschmutte.org. Lars Vilhuber is Senior Research Associate in the Department of Economics and Executive Director of Labor Dynamics Institute (LDI) at Cornell University, Ithaca, NY, USA. https://lars.vilhuber.com. The authors acknowledge the support of a grant from the Alfred P. Sloan Foundation (G-2015-13903), NSF Grants SES-1131848, BCS-0941226, TC-1012593. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau, the National Science Foundation, or the Sloan Foundation. All results presented in this work stem from previously released work, were used by permission, and were previously reviewed to ensure that no confidential information is disclosed.
References
1 Abowd, J.M. and McKinney, K.L. (2016). Noise infusion as a confidentiality protection measure for graph-based statistics. Statistical Journal of the IAOS 32 (1): 127–135. https://doi.org/10.3233/SJI-160958.
2 Abowd, J.M. and Schmutte, I.M. (2015). Economic analysis and statistical disclosure limitation. Brookings Papers on Economic Activity 50 (1): 221–267.
3 Abowd, J.M. and Vilhuber, L. (2012). Did the housing price bubble clobber local labor market job and worker flows when it burst? The American Economic Review 102 (3): 589–593. https://doi.org/10.1257/aer.102.3.589.
4 Abowd, J.M., Haltiwanger, J., and Lane, J. (2004). Integrated longitudinal employer–employee data for the United States. The American Economic Review 94 (2): 224–229.
5 Abowd, J.M., Stinson, M., and Benedetto, G. (2006). Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project. 1813/43929. U.S. Census