Administrative Records for Survey Methodology. Group of authors
The observed proxy measures may need to be adjusted in order to satisfy micro- as well as macro-level constraints, so as to resolve incompatibility across the data sources. For instance, register data from corporate tax returns may be used to impute for the missing items in the Structural Business Survey. If this results in numerical inconsistency with the items observed from the survey, then imputation or adjustment of some of the items will be necessary in order to produce a clean and coherent dataset. See e.g. Pannekoek, Shlomo, and De Waal (2013) and Pannekoek and Zhang (2015) for relevant instances.
Imposing macro-level survey estimates as benchmarks, when micro-adjusting a register proxy variable, can be regarded as a means to achieve statistical relevance at the level where the unbiased benchmarks are introduced (Zhang and Giusti 2016), though one is unable to remove the relevance bias at the micro-level. The Norwegian register-based employment status provides an example of such uses of proxy variables. Initially, the register proxy variable is rule-processed based on several input administrative registers, covering employee benefit, self-employment, tax, military or civilian service, leave of absence, etc. This results in the tripartition of the target population: (I) the compatible part, where the register data are compatible across the sources and allow for unequivocal reclassification accordingly, (II) the resolved part, where reclassification can be determined after making room for administrative regulations and progressiveness of the data, (III) the unsolved part, where register data are either lacking or incompatible, beyond what can be rule-processed. The Labor Force Survey (LFS) estimate of the yearly total of employed is then introduced to define an income threshold in the different subsets of part (III), whereby everyone above the threshold is reclassified as employed, such that the register total of employed coincides with the LFS estimate. As shown by Fosen and Zhang (2011), the resulting adjusted register proxy variable entails smaller mean squared error at the municipality level, compared to the survey estimates where the register proxy is used as an auxiliary variable.
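The thresholding step can be illustrated with a minimal sketch. It assumes a single subset of part (III), whereas the text describes separate thresholds in different subsets, and it treats units at or above the threshold as employed. All names and figures below are hypothetical and are not taken from the Norwegian system.

```python
def benchmark_threshold(incomes, n_needed):
    """Return the income threshold at which the n_needed highest-income
    units of the unsolved part are reclassified as employed.

    Assumption: ties at the threshold are all reclassified, so the count
    may exceed n_needed when incomes are tied.
    """
    if n_needed <= 0:
        return float("inf")  # nobody needs to be reclassified
    ranked = sorted(incomes, reverse=True)
    n_needed = min(n_needed, len(ranked))
    return ranked[n_needed - 1]


def reclassify(incomes, threshold):
    """Flag every unit whose income is at or above the threshold."""
    return [income >= threshold for income in incomes]


# Hypothetical figures: the LFS benchmark and the register count of
# employed persons already settled in parts (I) and (II).
lfs_total = 1005
employed_parts_1_2 = 1000
unsolved_incomes = [120, 0, 310, 45, 280, 15, 500, 260, 90, 330]

# Number of employed persons that part (III) must contribute.
n_needed = round(lfs_total - employed_parts_1_2)
threshold = benchmark_threshold(unsolved_incomes, n_needed)
flags = reclassify(unsolved_incomes, threshold)
register_total = employed_parts_1_2 + sum(flags)
```

With these figures, the adjusted register total agrees with the LFS benchmark, while the classification of any individual unit in part (III) may still be wrong. This is the sense in which relevance is achieved at the macro level only.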
1.3 Estimation Using Multiple Proxy Variables
Within the context of combining register and survey data, we consider here multisource estimation methods that make use of two or more proxy variables. Deficiency of coverage, relevance, and timeliness is often the reason that register-based estimation is not viable. When the lack of coverage can be limited to specific domains or variables, the problem can be remedied by the collection of supplementary survey data using the split-population or split-data approach. There would be only one value for each variable of interest now that the data supplement each other. Different multisource estimation approaches are needed for multiple proxy variables.
We shall classify the various scenarios using two conditions, summarized in Table 1.1. (i) Whether one treats one of the proxy variables as the target measure and the others as associated with relevance bias – to be referred to as the asymmetric setting; the setting is symmetric otherwise, where either none of the proxy variables is considered to be the ideal measure, or all are correct measures which nevertheless do not have perfect population coverage. (ii) Whether it is necessary to have linked data at the individual or cell level – to be referred to as the linked setting; the setting is unlinked otherwise. Each of the approaches listed in Table 1.1 covers a variety of methods with an extensive body of literature. The following elaboration aims merely to provide a brief accessible overview, and the references given serve only as points of departure for further exploration.
Table 1.1 Indirect estimation using register and survey proxy variables.
Linked data? | One target measure and relevance bias in the others? | |
---|---|---|
 | Yes (asymmetric) | No (symmetric) |
Yes (linked) | Survey weighting; prediction modeling | Capture–recapture methods; structural equation modeling |
No (unlinked) | Benchmarked adjustment | Constrained optimization |
1.3.1 Asymmetric Setting
The two most common approaches under the asymmetric-linked setting are survey weighting and prediction modeling, where the register proxy variable is used as an auxiliary variable or a covariate. See e.g. Särndal, Swensson, and Wretman (1992) for the design-based approach to survey weighting that makes use of auxiliary variables; Valliant, Dorfman, and Royall (2000) and Chambers and Clark (2012) for the model-based approach to finite population prediction; and Rao and Molina (2015) for relevant methods of small area estimation. We make two observations. Firstly, when the overlapping survey variable is deemed necessary despite the presence of a register proxy, the latter is typically the most powerful among all the auxiliary variables when it comes to weighting adjustment and regression modeling. See e.g. Djerf (1997) and Thomsen and Zhang (2001) for the use of register economic activity status in the LFS, and the effects on reducing sampling and nonresponse errors. Secondly, applications to remedy Representation errors are much less common. However, see e.g. survey weighting under dependent sampling for the estimation of coverage errors (Nirel and Glickman 2009), mixed-effects models for assessing register coverage errors (Mancini and Toti 2014), and different misclassification models for register NACE (Van Delden et al. 2016) and register household (Zhang 2011).
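As a minimal sketch of survey weighting with the register proxy as auxiliary variable, consider post-stratification by register status. It is only one simple instance of the weighting methods cited above. The sketch assumes simple random sampling, linked data so that the register status is known for every sampled unit, and at least one sampled unit in each post-stratum. The function name and all figures are hypothetical.

```python
from collections import defaultdict


def poststratified_total(sample, register_counts):
    """Post-stratified estimate of the population total of a survey variable.

    sample: list of (register_status, survey_value) pairs for linked units.
    register_counts: dict mapping register_status to its population count,
        known from the register.
    Assumption: every post-stratum contains at least one sampled unit.
    """
    n = defaultdict(int)      # sample size per post-stratum
    y = defaultdict(float)    # sample sum of the survey variable
    for status, value in sample:
        n[status] += 1
        y[status] += value
    # Population count times the sample mean, summed over post-strata.
    return sum(register_counts[h] * y[h] / n[h] for h in register_counts)


# Hypothetical linked sample: survey employment status (1 = employed)
# cross-classified by register employment status.
sample = (
    [("employed", 1)] * 9 + [("employed", 0)] * 1
    + [("not_employed", 1)] * 1 + [("not_employed", 0)] * 4
)
register_counts = {"employed": 6000, "not_employed": 4000}

estimate = poststratified_total(sample, register_counts)
```

The closer the register proxy agrees with the survey measure, the more homogeneous the post-strata are, which is why a good proxy tends to be such a powerful auxiliary variable.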
The nature of a proxy variable implies a special use that is beyond what is feasible with a non-proxy auxiliary variable, no matter how good an auxiliary it is: under suitable conditions, it is possible to substitute (or replace) the target measure by the proxy value. However, substitution would only be acceptable for a subset of the units but not all since, had it been acceptable for all the units, one would have had “direct tabulation” instead.
It follows that adjustment, or imputation in the case of a rejected value, will be necessary. Macro-level survey estimates can be imposed as benchmarks to achieve statistical relevance at the corresponding level. Linked datasets are typically not necessary here – recall the Norwegian register-based employment status described earlier. This yields many methods under what may be referred to as the benchmarked adjustment approach for combining register and survey proxy variables under the asymmetric-unlinked setting.
Repeated weighting and constrained (mass) imputation are two common approaches of benchmarked adjustment; see e.g. de Waal (2016) for a discussion. Repeated weighting is a technique initially presented for sample reweighting in the presence of overlapping survey estimates (Renssen and Nieuwenbroek 1997). It has been used for the reconciliation of Dutch virtual census output tables (Houbiers 2004). But it can equally be applied to adjust register datasets so that afterward, e.g. the weighted register proxy total agrees with the valid target totals imposed. This does not require linking the register datasets and the external datasets from which the benchmark totals are obtained. An inconvenience arises in cases where there are multiple proxy variables to be benchmarked and the variables are available for different subsets of units. This may be the case due to partial missing data in a single register file or when merging multiple register files. Some imputation will then be necessary if one would like to have a single set of weights for the whole dataset.
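The idea of weighting a register dataset toward benchmark totals can be sketched with a single ratio adjustment per domain. This is a deliberately simplified stand-in and not the repeated weighting procedure itself, which calibrates a sequence of tables. The sketch assumes one proxy variable, a starting weight of 1 for every register unit, and a nonzero register proxy total in each benchmarked domain. All names and figures are hypothetical.

```python
def benchmark_weights(records, benchmarks):
    """Return one weight per register record such that the weighted proxy
    total in each benchmarked domain equals the imposed benchmark total.

    records: list of dicts with keys 'domain' and 'proxy'.
    benchmarks: dict mapping domain to an externally estimated total.
    Assumption: the unweighted proxy total is nonzero in every
    benchmarked domain; other domains keep weight 1.
    """
    proxy_totals = {}
    for r in records:
        proxy_totals[r["domain"]] = proxy_totals.get(r["domain"], 0.0) + r["proxy"]
    # Ratio of benchmark total to register proxy total, per domain.
    factors = {d: benchmarks[d] / proxy_totals[d] for d in benchmarks}
    return [factors.get(r["domain"], 1.0) for r in records]


# Hypothetical register file with a binary proxy variable.
records = [
    {"domain": "A", "proxy": 1}, {"domain": "A", "proxy": 1},
    {"domain": "A", "proxy": 0}, {"domain": "A", "proxy": 1},
    {"domain": "B", "proxy": 1}, {"domain": "B", "proxy": 0},
    {"domain": "B", "proxy": 1},
]
# Benchmark totals from an external source; only the totals are needed,
# so no record linkage to that source is involved.
benchmarks = {"A": 4.5, "B": 1.5}

weights = benchmark_weights(records, benchmarks)
weighted_totals = {}
for r, w in zip(records, weights):
    weighted_totals[r["domain"]] = weighted_totals.get(r["domain"], 0.0) + w * r["proxy"]
```

With a second proxy variable observed on a different subset of units, a single factor per domain would generally no longer reproduce both benchmarks, which corresponds to the inconvenience noted above.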
The one-number census imputation provides an example of the alternative imputation-based benchmarked adjustment methods (Brown et al. 1999). In the case of multiple proxy variables observed on different subsets of units, imputation is applied not only to the units with partially missing data, but also to the units with no observed variables at all, or possibly the units with completely