Administrative Records for Survey Methodology. Group of authors

is a complete dataset that guarantees numerical consistency for any tabulation across the variables and population domains. Constrained imputation for population datasets is discussed, e.g., by Shlomo, de Waal, and Pannekoek (2009) and Zhang (2009a). Methods that incorporate micro-data edit constraints are studied, e.g., in Coutinho, de Waal, and Shlomo (2013), Pannekoek, Shlomo, and de Waal (2013), and Pannekoek and Zhang (2015). Chambers and Ren (2004) consider a method of benchmarked outlier-robust imputation. Obviously, it may be difficult to generate a single population dataset that is fit for all possible statistical uses. de Waal (2016) discusses the use of “repeated imputation.” Note that there are many relevant works on the generation of benchmarked synthetic populations in Spatial Demography, Econometrics, and Sociology.

denote the survey-based estimates of population totals by income class, which are the row and column benchmarks of the target table Y, respectively. Starting with X and applying iterative proportional fitting (IPF) until convergence, one may obtain a table that sums marginally to both the row benchmarks Y(r) and the survey-based column benchmarks. The technique has many applications, including small area estimation (Purcell and Kish 1980) and statistical matching (D’Orazio, Di Zio, and Scanu 2006; Zhang 2015a); see also Section 1.3.2.
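The IPF step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ipf`, the seed table `X`, and the benchmark values are all hypothetical, and the stopping rule (maximum row discrepancy below a tolerance) is one of several reasonable choices.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Fit a two-way table to given row and column totals by
    iterative proportional fitting, starting from `seed`."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row to match its row benchmark.
        for a, target in enumerate(row_targets):
            total = sum(table[a])
            if total > 0:
                table[a] = [v * target / total for v in table[a]]
        # Column step: rescale each column to match its column benchmark.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(table[a]) - t) for a, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Hypothetical register table X (2 areas by 3 income classes).
X = [[40, 30, 30], [20, 50, 30]]
# Hypothetical row benchmarks (area totals) and column benchmarks
# (survey-based income-class totals); both sum to 200.
row_benchmarks = [120.0, 80.0]
col_benchmarks = [70.0, 90.0, 40.0]

fitted = ipf(X, row_benchmarks, col_benchmarks)
for row in fitted:
    print([round(v, 2) for v in row])
```

The fitted table keeps the interaction structure of the seed table while reproducing both sets of margins, which is why the two benchmark vectors must share the same grand total.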

      A key difference between the asymmetric-linked setting and the asymmetric-unlinked setting discussed above is that one generally does not expect a benchmarked adjustment method based on unlinked data to yield unbiased results below the level where the benchmarks are imposed. For instance, the repeated weighting of Renssen and Nieuwenbroek (1997) can yield design-consistent domain estimates subject to population benchmark totals, because the overlapping survey variables are both considered to be the target measure here and no relevance bias is admitted. However, when the same technique is applied to reweight a register dataset, e.g. with the initial weights all set to 1, one cannot generally claim design- or model-based consistency below the level of the imposed benchmarks, regardless of whether the benchmarks themselves are true or unbiased from either the design- or model-based perspective. Similarly, under suitable assumptions, the one-number census imputation can yield model-consistent estimates below the level of the imposed constraints, because the donor records are taken from the enumerated census records, which are considered to provide the target measures. However, the model-consistency would fall apart when the donor pool is a register dataset that suffers from relevance bias, even if all the other “suitable” assumptions are retained. Assessment of the statistical uncertainty associated with benchmarked adjustment is therefore an important research topic. An illustration for the contingency table case is given in Section 1.3.2.

      1.3.2 Uncertainty Evaluation: A Case of Two-Way Data

      For the asymmetric-linked setting, suppose an observed sample two-way classification of (a, j) is available. For survey weighting, let $s$ denote the sample and let $d_i = 1/\pi_i$ be the sampling weight of unit $i \in s$, where $\pi_i$ is the inclusion probability. Let $y_i(a, j) = 1$ if sample unit $i \in s$ has classification (a, j) according to the target measure and $y_i(a, j) = 0$ otherwise; let $x_i(a, j) = 1$ if it has classification (a, j) according to the proxy measure and $x_i(a, j) = 0$ otherwise. Post-stratification with respect to X then yields the poststratification weight, say,

$$w_i = d_i X_{aj} / \hat{X}_{aj} \quad \text{for unit } i \text{ with } x_i(a, j) = 1, \quad \text{where} \quad \hat{X}_{aj} = \sum_{k \in s} d_k x_k(a, j).$$

      This is problematic when there are empty or very small sample cells (a, j). The raking ratio weight can then be given by

$$w_i = d_i \tilde{X}_{aj} / \hat{X}_{aj},$$

where $\tilde{X}_{aj}$ is derived by the IPF of $\hat{X}_{aj}$ to the row and column totals $X_{a+}$ and $X_{+j}$, respectively. Deville, Särndal, and Sautory (1993) provide an approximate variance of the raking ratio estimator, say, $\hat{Y}_{aj} = \sum_{i \in s} w_i y_i(a, j)$, where $w_i$ is the raking ratio weight.

      A drawback of the weighting approach above is that no estimate of Yaj will be available when the sample cell (a, j) is empty, and the estimate will have a large sampling variance when the sample cell (a, j) is small. This is typically the situation in small area estimation, where, e.g., a indexes a large number of local areas. Zhang and Chambers (2004) and Luna-Hernández (2016) develop a prediction modeling approach.
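The two weighting schemes can be illustrated with a small Python sketch. It assumes the standard forms of the weights: the design weight times X_aj divided by the weighted sample total of the proxy cell for post-stratification, and the same ratio with the IPF-adjusted cell total in the numerator for raking. The register table, the sample records, and all variable names are hypothetical.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Fit a two-way table to row and column totals by IPF."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        for a, target in enumerate(row_targets):      # row step
            total = sum(table[a])
            if total > 0:
                table[a] = [v * target / total for v in table[a]]
        for j, target in enumerate(col_targets):      # column step
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        gap = max(abs(sum(table[a]) - t) for a, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Hypothetical register (proxy) table X_aj: 2 areas by 2 classes.
X = [[60.0, 40.0], [30.0, 70.0]]

# Hypothetical sample records: (proxy cell, target cell, design weight d_i).
sample = (
    [((0, 0), (0, 0), 10.0)] * 3
    + [((0, 1), (0, 1), 10.0)] * 2
    + [((1, 0), (1, 1), 10.0)] * 1   # proxy and target class disagree
    + [((1, 1), (1, 1), 10.0)] * 4
)

# Weighted sample totals of the proxy cells.
X_hat = [[0.0, 0.0], [0.0, 0.0]]
for (a, j), _, d in sample:
    X_hat[a][j] += d

# Post-stratification weights: every proxy cell is calibrated to X_aj.
post_w = [d * X[a][j] / X_hat[a][j] for (a, j), _, d in sample]

# Raking ratio weights: only the margins X_a+ and X_+j are calibrated.
row_totals = [sum(row) for row in X]
col_totals = [sum(row[j] for row in X) for j in range(2)]
X_tilde = ipf(X_hat, row_totals, col_totals)
rake_w = [d * X_tilde[a][j] / X_hat[a][j] for (a, j), _, d in sample]

# Raking ratio estimate of the target table: weighted counts by target cell.
Y_hat = [[0.0, 0.0], [0.0, 0.0]]
for (_, (a, j), _), w in zip(sample, rake_w):
    Y_hat[a][j] += w

print(Y_hat)
```

In this toy sample no unit has target classification (1, 0), so the corresponding entry of `Y_hat` is zero regardless of the weights, which is the empty-cell drawback noted above.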

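      The within-area composition $(Y_{a1}, Y_{a2}, \ldots, Y_{aJ})$ is related to the corresponding proxy composition $(X_{a1}, X_{a2}, \ldots, X_{aJ})$ by means of a structural equation

$$\alpha_a^Y = \beta \alpha_a^X,$$

      where $\alpha_a^X = (\alpha_{a1}^X, \ldots, \alpha_{aJ}^X)^T$ is the area-vector of interactions on the log scale, i.e. $\alpha_{aj}^X = \log \theta_{aj}^X - \frac{1}{J} \sum_{l=1}^{J} \log \theta_{al}^X$, where $\theta_{aj}^X = X_{aj} / X_{a+}$, and similarly for $\alpha_a^Y$, and $\beta$ is a matrix of unknown coefficients that sum to zero by row and by column.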
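As a rough illustration of how a structural equation of this kind maps a proxy composition to a predicted target composition, the sketch below assumes that the log-scale interactions are centered log proportions within an area, and that the target interactions are obtained by multiplying the proxy interactions by a coefficient matrix. The function names, the counts, and the matrix `beta` are hypothetical; in practice the coefficients would be estimated from the sample data rather than fixed in advance.

```python
import math


def log_interactions(counts):
    """Centered log proportions of one area's composition (assumed form
    of the log-scale interactions); the result sums to zero."""
    total = sum(counts)
    logs = [math.log(c / total) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def composition_from_interactions(alpha):
    """Inverse map: proportions proportional to exp(alpha_j)."""
    expo = [math.exp(v) for v in alpha]
    total = sum(expo)
    return [v / total for v in expo]


def predict_composition(x_counts, beta):
    """Apply the structural equation to the proxy interactions and map
    the resulting target interactions back to proportions."""
    alpha_x = log_interactions(x_counts)
    alpha_y = [
        sum(beta[j][l] * alpha_x[l] for l in range(len(alpha_x)))
        for j in range(len(beta))
    ]
    return composition_from_interactions(alpha_y)


def scaled_centering_matrix(size, scale):
    """Hypothetical coefficient matrix scale * (I - 11'/J), whose rows
    and columns each sum to zero."""
    return [
        [scale * ((1.0 if j == l else 0.0) - 1.0 / size) for l in range(size)]
        for j in range(size)
    ]


# Hypothetical proxy counts for one area with J = 3 classes.
x_counts = [50, 30, 20]

# scale = 1 reproduces the proxy composition; scale < 1 shrinks the
# predicted composition toward the uniform one.
same = predict_composition(x_counts, scaled_centering_matrix(3, 1.0))
shrunk = predict_composition(x_counts, scaled_centering_matrix(3, 0.5))
print([round(v, 3) for v in same])
print([round(v, 3) for v in shrunk])
```

The scaled centering matrix is chosen only because it is the simplest matrix satisfying the sum-to-zero constraints; a general coefficient matrix would allow each target class to depend on all proxy classes.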

      The structural equation can be used to specify a generalized linear model of the observed sample cell counts, or their weighted totals, which allows one to estimate β and Y. It is further possible to develop the mixed-effects modeling approach that is popular in small area estimation, by introducing the mixed structural equation
