Administrative Records for Survey Methodology. Group of authors
The distinction between weighting and imputation can be somewhat blurred when it comes to the adjustment of cross-classified proxy contingency tables, because an adjusted cell count is just the number of individuals with the corresponding cross-classification one would have in an imputed dataset. Take, e.g. a two-way table, where the rows represent population domains at some detailed level, say, by local area and sex-age group, and the columns a composition of interest, say, income class. Let X denote the table based on combining population and tax register data. Let Y(r) denote the known vector of population domain sizes, and let
A key difference between the asymmetric-linked setting and the asymmetric-unlinked setting discussed above is that one generally does not expect a benchmarked adjustment method based on unlinked data to yield unbiased results below the level at which the benchmarks are imposed. For instance, the repeated weighting of Renssen and Nieuwenbroek (1997) can yield design-consistent domain estimates subject to population benchmark totals, because the overlapping survey variables are both considered to be the target measure and no relevance bias is admitted. However, when the same technique is applied to reweight a register dataset, e.g. with the initial weights all set to 1, one cannot generally claim design- or model-based consistency below the level of the imposed benchmarks, regardless of whether the benchmarks themselves are true or unbiased from either the design- or model-based perspective. Similarly, under suitable assumptions, the one-number census imputation can yield model-consistent estimates below the level of the imposed constraints, because the donor records are taken from the enumerated census records, which are considered to provide the target measures. However, the model-consistency would fall apart if the donor pool were a register dataset that suffers from relevance bias, even if all the other “suitable” assumptions were retained. Assessment of the statistical uncertainty associated with benchmarked adjustment is therefore an important research topic. An illustration for the contingency table case is given in Section 1.3.2.
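The point about results below the benchmark level can be seen in a toy two-way table. The sketch below is only an illustration under stated assumptions: the tables `Y` and `X` are invented numbers, the helper `ipf` is a hypothetical name, and iterative proportional fitting stands in for a generic benchmarked adjustment of a register table with initial weights of 1. The adjusted table reproduces the imposed row and column benchmarks, yet it keeps the odds ratio of the proxy table, so its cells need not agree with the target cells.

```python
import numpy as np

def ipf(table, row_target, col_target, n_iter=1000, tol=1e-9):
    """Iterative proportional fitting of a positive two-way table
    to the given row and column totals (hypothetical helper)."""
    t = table.astype(float).copy()
    for _ in range(n_iter):
        t *= (row_target / t.sum(axis=1))[:, None]   # match row totals
        t *= (col_target / t.sum(axis=0))[None, :]   # match column totals
        if np.max(np.abs(t.sum(axis=1) - row_target)) < tol:
            break
    return t

def odds_ratio(t):
    """Cross-product ratio of a 2 x 2 table."""
    return t[0, 0] * t[1, 1] / (t[0, 1] * t[1, 0])

# Toy data (assumed): Y is the target table, known here only for illustration;
# X is the register-based proxy table.
Y = np.array([[50.0, 50.0], [20.0, 80.0]])
X = np.array([[60.0, 40.0], [30.0, 70.0]])

# Impose the true margins of Y as benchmarks on the proxy table.
X_adj = ipf(X, Y.sum(axis=1), Y.sum(axis=0))

print("adjusted table:\n", X_adj)
print("target table:\n", Y)
print("odds ratios (X, X_adj, Y):", odds_ratio(X), odds_ratio(X_adj), odds_ratio(Y))
```

Even with exact benchmarks, the cell-level discrepancy between `X_adj` and `Y` remains, because the adjustment only rescales rows and columns of the proxy table.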
1.3.2 Uncertainty Evaluation: A Case of Two-Way Data
Let a = 1, …, A and j = 1, …, J form a two-way classification of interest. For example, a may stand for ethnicity (White, Black, and Others), and j for the party voted for in an election (Democratic, Republican, Others). Or, let a be the index of a large number of local areas, and j the different household types such as single-person, couple without children, couple with children, etc. Let X = {Xaj} be a known register-based proxy table that is unacceptable as “direct tabulation” of the target table Y = {Yaj}.
For the asymmetric-linked setting, suppose that a sample with an observed two-way classification (a, j) is available. For survey weighting, let s denote the sample and let di = 1/πi be the sampling weight of unit i ∈ s, where πi is the inclusion probability. Let yi(a, j) = 1 if sample unit i ∈ s has classification (a, j) according to the target measure and yi(a, j) = 0 otherwise; let xi(a, j) = 1 if it has classification (a, j) according to the proxy measure and xi(a, j) = 0 otherwise. Post-stratification with respect to X then yields the post-stratification weight, say,
This is problematic when some sample cells (a, j) are empty or very small. The raking ratio weight can then be given by
A drawback of the weighting approach above is that no estimate of Yaj is available when the sample cell (a, j) is empty, and the estimate has a large sampling variance when the sample cell (a, j) is small. This is typically the situation in small area estimation, where, e.g. a is the index of a large number of local areas. Zhang and Chambers (2004) and Luna-Hernández (2016) develop a prediction modeling approach.
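The two weighting schemes discussed above can be sketched on a toy sample. This is a minimal illustration under stated assumptions, not the book's own formulas: the post-stratification weight is taken in its standard form, di times Xaj divided by the weighted sample count of proxy cell (a, j), and raking is taken to calibrate the weights to the row and column margins of X. The function names (`cell_totals`, `poststratify`, `rake`) and all numbers are hypothetical. The sample deliberately has no unit in proxy cell (1, 0).

```python
import numpy as np

def cell_totals(w, cells, shape):
    """Sum the weights w over the units falling in each cell (a, j)."""
    tab = np.zeros(shape)
    np.add.at(tab, (cells[:, 0], cells[:, 1]), w)
    return tab

def poststratify(d, x_cells, X):
    """Assumed standard form: w_i = d_i * X_aj / (sum of d_k in proxy cell (a, j))."""
    d_hat = cell_totals(d, x_cells, X.shape)
    a, j = x_cells[:, 0], x_cells[:, 1]
    return d * X[a, j] / d_hat[a, j]

def rake(d, x_cells, X, n_iter=1000, tol=1e-10):
    """Rake the weights to the row and column margins of X (assumed benchmarks).
    Requires at least one sample unit in every row and every column."""
    a, j = x_cells[:, 0], x_cells[:, 1]
    A, J = X.shape
    row_target, col_target = X.sum(axis=1), X.sum(axis=0)
    w = d.astype(float).copy()
    for _ in range(n_iter):
        w = w * (row_target / np.bincount(a, weights=w, minlength=A))[a]
        w = w * (col_target / np.bincount(j, weights=w, minlength=J))[j]
        rows = np.bincount(a, weights=w, minlength=A)
        if np.max(np.abs(rows - row_target)) < tol:
            break
    return w

# Toy data (assumed): proxy table X, 8 sampled units with design weight 25 each.
X = np.array([[60.0, 40.0], [30.0, 70.0]])
d = np.full(8, 25.0)
# Proxy classification of each unit; no unit falls in proxy cell (1, 0).
x_cells = np.array([[0, 0], [0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 1]])
# Target classification of the same units.
y_cells = np.array([[0, 0], [0, 0], [0, 1], [0, 1], [0, 1], [1, 1], [1, 0], [1, 1]])

w_ps = poststratify(d, x_cells, X)
w_rk = rake(d, x_cells, X)

Y_ps = cell_totals(w_ps, y_cells, X.shape)   # weighted estimate of the target table
Y_rk = cell_totals(w_rk, y_cells, X.shape)

print("post-stratified total:", w_ps.sum(), "of", X.sum())
print("raked total:", w_rk.sum(), "of", X.sum())
print("raked estimate of Y:\n", Y_rk)
```

In this toy case the post-stratified weights cannot account for the count in the empty proxy cell, so they sum to less than the population total, whereas the raked weights reproduce both margins of X. Neither scheme can estimate a target cell that contains no sample unit.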
The within-area composition (Ya1, Ya2, …, YaJ) is related to the corresponding proxy composition (Xa1, Xa2, …, XaJ) by means of a structural equation
where
The structural equation can be used to specify a generalized linear model of the observed sample cell counts, or their weighted totals, which allows one to estimate β and Y. It is further possible to develop the mixed-effects modeling approach that is popular in small area estimation, by introducing the mixed structural equation
with the same quantities and the additional random effects ua = (ua1,