Read online - Administrative Records for Survey Methodology. Group of authors. Mathematics. LiveLib


is a complete dataset that guarantees numerical consistency for any tabulation across the variables and population domains. Constrained imputation for population datasets is discussed, e.g. by Shlomo, de Waal, and Pannekoek (2009) and Zhang (2009a). Methods that incorporate micro-data edit constraints are studied, e.g. in Coutinho, de Waal, and Shlomo (2013), Pannekoek, Shlomo, and de Waal (2013), and Pannekoek and Zhang (2015). Chambers and Ren (2004) consider a method of benchmarked outlier-robust imputation. Obviously, it may be difficult to generate a single population dataset that is fit for all possible statistical uses. de Waal (2016) discusses the use of “repeated imputation.” Notice that there are many relevant works on the generation of benchmarked synthetic populations in Spatial Demography, Econometrics, and Sociology.

The distinction between weighting and imputation can be somewhat blurred when it comes to the adjustment of cross-classified proxy contingency tables, because an adjusted cell count is just the number of individuals with the corresponding cross-classification one would have in an imputed dataset. Take, e.g. a two-way table, where the rows represent population domains at some detailed level, say, by local area and sex-age group, and the columns a composition of interest, say, income class. Let X denote the table based on combining population and tax register data. Let Y^(r) denote the known vector of population domain sizes, and let Ŷ^(c) denote the survey-based estimates of population totals by income class, which are the row and column benchmarks of the target table Y, respectively. Starting with X and by means of iterative proportional fitting (IPF) until convergence, one may obtain a table Ŷ that sums to both Y^(r) and Ŷ^(c) marginally. The technique has many applications including small area estimation (Purcell and Kish 1980) and statistical matching (D’Orazio, Di Zio, and Scanu 2006; Zhang 2015a) – more in Section 1.3.2.
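The IPF adjustment described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ipf`, the variable names, and all numbers are hypothetical, the proxy table is assumed to have strictly positive cells, and the row and column benchmarks are assumed to share the same grand total (survey-based column estimates would normally have to be made consistent with the population size first).

```python
import numpy as np

def ipf(seed, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Scale rows and columns of `seed` in turn until both margins match."""
    table = np.asarray(seed, dtype=float).copy()
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        table *= (rows / table.sum(axis=1))[:, None]  # fit the row benchmarks
        table *= (cols / table.sum(axis=0))[None, :]  # fit the column benchmarks
        # Columns match exactly after the last step; stop once rows match too.
        if np.abs(table.sum(axis=1) - rows).max() < tol:
            break
    return table

# Hypothetical register-based proxy table X (rows: domains, columns: income classes).
X = np.array([[40.0, 30.0, 30.0],
              [20.0, 50.0, 30.0],
              [10.0, 20.0, 70.0]])
Y_r = np.array([120.0, 100.0, 80.0])      # assumed known domain sizes (row benchmarks)
Y_c_hat = np.array([90.0, 110.0, 100.0])  # assumed survey estimates (column benchmarks)

Y_hat = ipf(X, Y_r, Y_c_hat)
print(np.round(Y_hat, 2))
print(Y_hat.sum(axis=1), Y_hat.sum(axis=0))
```

Because each step only rescales whole rows or whole columns, the adjusted table keeps the cross-product (odds) ratios of X; only the margins are replaced by the benchmarks.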

A key difference between the asymmetric-linked setting and the asymmetric-unlinked setting discussed above is that one generally does not expect a benchmarked adjustment method based on unlinked data to yield unbiased results below the level where the benchmarks are imposed. For instance, the repeated weighting of Renssen and Nieuwenbroek (1997) can yield design-consistent domain estimates subject to population benchmark totals, because the overlapping survey variables are both considered as the target measure here and no relevance bias is admitted. However, when the same technique is applied to reweight a register dataset, e.g. with the initial weights all set to 1, one cannot generally claim design- or model-based consistency below the level of the imposed benchmarks, regardless of whether the benchmarks themselves are true or unbiased from either the design- or model-based perspective. Similarly, under suitable assumptions, the one-number census imputation can yield model-consistent estimates below the level of the imposed constraints, because the donor records are taken from the enumerated census records that are considered to provide the target measures. However, the model-consistency would fall apart when the donor pool is a register dataset that suffers from relevance bias, even if all the other “suitable” assumptions are retained. Assessment of the statistical uncertainty associated with benchmarked adjustment is therefore an important research topic. An illustration in the contingency table case is given in Section 1.3.2.

1.3.2 Uncertainty Evaluation: A Case of Two-Way Data

Let a = 1, …, A and j = 1, …, J form a two-way classification of interest. For example, a may stand for ethnicity (White, Black, and Others), and j for the party voted for in an election (Democratic, Republican, Others). Or, let a be the index of a large number of local areas, and j the different household types such as single-person, couple without children, couple with children, etc. Let X = {X_aj} be a known register-based proxy table that is unacceptable as “direct tabulation” of the target table Y = {Y_aj}.

For the asymmetric-linked setting, suppose there is available an observed sample two-way classification of (a, j). For survey weighting, let s denote the sample and let d_i = 1/π_i be the sampling weight of unit i ∈ s, where π_i is the inclusion probability. Let y_i(a, j) = 1 if sample unit i ∈ s has classification (a, j) according to the target measure and y_i(a, j) = 0 otherwise; let x_i(a, j) = 1 if it has classification (a, j) according to the proxy measure and x_i(a, j) = 0 otherwise. Post-stratification with respect to X then yields the post-stratification weight, say, w_i = d_i X_aj / X̂_aj for a unit i with proxy classification (a, j), where X̂_aj = Σ_{i ∈ s} d_i x_i(a, j).

This is problematic when there are empty and very small sample cells of (a, j). The raking ratio weight can then be given by w_i = d_i X̃_aj / X̂_aj, where X̃ = {X̃_aj} is derived by the IPF of X̂ = {X̂_aj} to row and column totals X_a+ and X_+j, respectively. Deville, Särndal, and Sautory (1993) provide approximate variance of the raking ratio estimator, say,

where

A drawback of the weighting approach above is that no estimate of Y_aj will be available in the case of an empty sample cell (a, j), and the estimate will have a large sampling variance when the sample cell (a, j) is small in size. This is typically the situation in small area estimation, where, e.g. a is the index of a large number of local areas. Zhang and Chambers (2004) and Luna-Hernández (2016) develop a prediction modeling approach.
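The empty-cell drawback can be seen in a toy computation. The sketch below assumes the standard post-stratified form of the weight, d_i X_aj / X̂_aj with X̂_aj the design-weighted sample count of proxy cell (a, j); that form, the variable names, and all numbers are illustrative assumptions rather than the authors' data.

```python
import numpy as np

# Hypothetical register proxy table X_aj with A = 2 areas and J = 2 categories.
X = np.array([[50.0, 30.0],
              [20.0, 40.0]])

# Hypothetical linked sample of five units: proxy cell, target cell, design weight.
proxy_a = np.array([0, 0, 0, 1, 1])
proxy_j = np.array([0, 0, 1, 1, 1])
target_a = proxy_a.copy()            # area assumed identical under both measures
target_j = np.array([0, 1, 1, 1, 0])
d = np.full(5, 10.0)                 # d_i = 1 / pi_i

# Design-weighted sample count of each proxy cell; cell (1, 0) stays empty.
X_hat = np.zeros_like(X)
np.add.at(X_hat, (proxy_a, proxy_j), d)

# Post-stratification weight of each unit (assumed form, see lead-in).
w = d * X[proxy_a, proxy_j] / X_hat[proxy_a, proxy_j]

# Weighted tabulation by the target classification.
Y_hat = np.zeros_like(X)
np.add.at(Y_hat, (target_a, target_j), w)

print(X_hat)
print(w, w.sum(), X.sum())
print(Y_hat)
```

In this toy sample the proxy cell (1, 0) contains no units, so the register count X_10 is carried by no weight and the weights sum to less than the register total; raking to the margins X_a+ and X_+j avoids this particular loss, but any target cell without sample units still receives no direct estimate.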

The within-area composition (Y_a1, Y_a2, …, Y_aJ) is related to the corresponding proxy composition (X_a1, X_a2, …, X_aJ) by means of a structural equation

where

is the area-vector of interactions on the log scale, i.e.

where

, and similarly for

, and β a matrix of unknown coefficients that sum to zero by row and by column.

The structural equation can be used to specify a generalized linear model of the observed sample cell counts, or their weighted totals, which allows one to estimate β and Y. It is further possible to develop the mixed-effects modeling approach that is popular in small area estimation, by introducing the mixed structural equation

with the same quantities and the additional random effects u_a = (u_a1,
