Administrative Records for Survey Methodology. Group of authors


where
. The associated uncertainty will now be evaluated under the postulated model. The prediction modeling approach can thus improve on the survey weighting approach in the presence of empty and very small sample cells.

      For an example under the asymmetric-unlinked setting, consider the Norwegian register-based household statistics. When the household register was first introduced for the year 2005, about 6% of persons in the Central Population Register still lacked a dwelling identification. Because the missing rate differed across local areas and household types, direct tabulation did not yield acceptable results compared to the Census 2001 outputs. The IPF was applied to the sub-population of households that have the dwelling identification to yield a weight for every such household. The method falls under the benchmarked adjustment approach. However, direct evaluation of the associated uncertainty is not straightforward. Zhang (2009b) extends the prediction modeling approach above to accommodate the informative missing data. By comparing with the model-based predictions, one is able to assess the benchmarked adjustment results indirectly.

      Using the IPF for small area estimation is known as structure preserving estimation (SPREE, Purcell and Kish 1980). The model underpinning the SPREE is a special case of the prediction models mentioned above, i.e. by setting β = 1. It does not require linkage between the proxy data X and the data that yield the benchmarks Ya+ and Y+i. While this is convenient for deriving the estimates, a difficulty arises when it comes to uncertainty evaluation directly under the SPREE model. See also Dostál et al. (2016) for a benchmarked adjustment method based on the chi-squared measure in this respect.
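
The IPF step behind SPREE can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the cited work: the function name `ipf`, the tolerance, and the 2 x 2 numbers are all assumptions made for the example. It rescales a proxy table X alternately by rows and by columns until its margins agree with the benchmark totals, which leaves the cross-product (odds) ratios of X unchanged.

```python
def ipf(proxy, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Scale a proxy table until its margins match the benchmark totals.

    proxy      : list of rows with positive entries (the proxy table X)
    row_totals : benchmark row margins
    col_totals : benchmark column margins (same grand total as row_totals)
    """
    table = [[float(v) for v in row] for row in proxy]
    n_rows, n_cols = len(table), len(table[0])

    for _ in range(max_iter):
        # Row step: rescale each row to its benchmark total.
        for a in range(n_rows):
            s = sum(table[a])
            if s > 0:
                f = row_totals[a] / s
                table[a] = [v * f for v in table[a]]

        # Column step: rescale each column to its benchmark total.
        for j in range(n_cols):
            s = sum(table[a][j] for a in range(n_rows))
            if s > 0:
                f = col_totals[j] / s
                for a in range(n_rows):
                    table[a][j] *= f

        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(table[a]) - row_totals[a]) for a in range(n_rows))
        if gap < tol:
            break

    return table


# Hypothetical numbers, chosen only for illustration.
X = [[40, 10], [20, 30]]
fitted = ipf(X, row_totals=[60, 40], col_totals=[50, 50])
for row in fitted:
    print([round(v, 3) for v in row])
```

No linkage between X and the benchmark sources is needed: only the cell counts of X and the two sets of margins enter the computation.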

      Finally, let Y by ethnicity and party votes be the table of interest. Suppose one can obtain Ya+ and Y+j in an election, but there are no joint observations of the cells (a, j). This can be framed as a problem of statistical matching. Provided a proxy table X, say, ethnicity by party membership, the IPF can be applied to obtain an estimated table for Y. Zhang (2015a) develops an uncertainty measure that combines the identification uncertainty and the sampling uncertainty in this context, which enables one to quantify the relative efficiency of the proxy data X, compared to statistical matching without X. The application of the IPF here is an example of the benchmarked adjustment approach.
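
To see what identification uncertainty means when only the two margins are observed, one can compute the classical Fréchet bounds for each cell. This is a simple illustration of the idea and not the uncertainty measure of Zhang (2015a); the function name `frechet_bounds` and the numbers are assumptions for the example.

```python
import math


def frechet_bounds(row_totals, col_totals):
    """Return (lower, upper) bounds for every cell of a two-way table
    that is known only through its row and column margins."""
    n = sum(row_totals)
    if not math.isclose(n, sum(col_totals)):
        raise ValueError("row and column margins must share the same grand total")
    return [
        [(max(0, r + c - n), min(r, c)) for c in col_totals]
        for r in row_totals
    ]


# Hypothetical margins: two ethnic groups (rows) by two parties (columns).
bounds = frechet_bounds([60, 40], [50, 50])
for row in bounds:
    print(row)
```

Any table lying within these bounds and matching the margins is compatible with the observed data. A proxy table X narrows the choice by supplying an association structure, which is why its relative efficiency is of interest.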

      1.3.3 Symmetric Setting

      In the symmetric setting none of the proxy variables is ideal due to errors of relevance, measurement, or coverage. The two most common approaches under the symmetric-linked setting are capture–recapture methodology for population size estimation and Structural Equation Modeling (SEM) that covers the latent class models mentioned earlier.

      Combining survey and register-based enumerations for population size estimation has attracted growing interest in recent years, under the assumption that none of the sources can yield the true target population enumeration directly. We refer to the Journal of Official Statistics (2015, vol. 31, issue 3) for several useful references in this regard. There is plenty of scope for developing a range of models in order to address the different problems, including erroneous enumerations that are not dealt with in the traditional capture–recapture methodology. The potential impact can be huge if it enables one to produce census-like population statistics without the traditional census.
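
As a point of reference, the traditional two-list capture–recapture estimator can be sketched as follows. This is a minimal illustration under the classical assumptions (independent lists, error-free linkage, a closed population, and no erroneous enumerations), which are precisely the assumptions that newer models seek to relax. The function names and counts are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Classical dual-system estimate of population size.

    n1 : number of units enumerated in the first list (e.g. a register)
    n2 : number of units enumerated in the second list (e.g. a survey)
    m  : number of units found in both lists after linkage
    """
    if m <= 0:
        raise ValueError("at least one matched unit is required")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant, which remains defined when m = 0 and
    reduces the small-sample bias of the classical estimator."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 900 register units, 100 survey units, 90 matched.
print(lincoln_petersen(900, 100, 90))
print(round(chapman(900, 100, 90), 3))
```

If some register units do not belong to the target population, n1 and m are inflated and the estimate is biased, which illustrates why erroneous enumerations call for extended models.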

      SEM is often considered to have evolved from the genetic path modeling of Sewall Wright. See, e.g. Kline (2016) for a general introduction. The approach is popular in many social science disciplines that share a common interest in “latent constructs” such as intelligence, attitude, well-being, living standard, and so on. The postulated latent constructs cannot be measured directly and are only manifested through observable indicators. The SEM consists of two main components: the structural model showing potentially causal dependencies among the latent variables, and the measurement model relating the latent variables and their indicators. The approach can be referred to in different ways depending on the continuous-categorical nature of the variables involved, the presence of causality or stochastic process on the latent level, etc.
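
For concreteness, the two components can be written out in one common (LISREL-type) notation for continuous variables. The symbols below are an assumption made for illustration and are not taken from the text: η and ξ denote endogenous and exogenous latent variables, y and x their observed indicators, and ζ, ε, δ the error terms.

```latex
% Structural model: dependencies among the latent variables
\eta = B\,\eta + \Gamma\,\xi + \zeta

% Measurement model: indicators as functions of the latent variables
y = \Lambda_y\,\eta + \varepsilon, \qquad
x = \Lambda_x\,\xi + \delta
```

When the latent and observed variables are categorical, the measurement part is replaced by conditional response probabilities given the latent class, which leads to the latent class models.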

      The SEM approach is applicable under the symmetric-linked setting, where the proxy variables are treated as the indicators of the unobserved target measure. In the context of combining register and survey data, this can serve a number of purposes, including assessment of potential relevance bias of proxy measures, detection and possible treatment of measurement errors in editing and estimation, and statistical analysis of latent relationships using proxy indicators. For examples of data types that have been studied recently, see e.g. Pavlopoulos and Vermunt (2015) for temporary employment, Guarnera and Varriale (2015) for labor cost, and Burger et al. (2015) for turnover.

      Multiple macro-level proxy totals may need to be reconciled under the symmetric-unlinked setting. A typical example is multiple time series with different frequencies, e.g. with register-based yearly figures and survey-based sub-annual figures. Another example is the Supply-and-Use Tables for the production of GDP, where the initial estimates generally do not balance out because they are derived from different sources, or when the GDP is compiled using different approaches. Census output tables derived from fragmented data sources instead of a one-number file are yet another example. See e.g. Bikker, Daalmans, and Mushkudiani (2013) and Mushkudiani, Daalmans, and Pannekoek (2014, 2015).

      Reconciliation is often achieved as the solution to a constrained optimization problem. The approach requires the specification of two components. First, a loss function is defined to measure the changes from the initial proxy estimates to the final reconciled estimates. Second, the constraints that the final estimates must satisfy need to be stated explicitly; these may include both equality and inequality constraints. Minimizing the loss function subject to the constraints then yields the final estimates. The approach is feasible without linked data across the sources. Notice that there are many advanced techniques of constrained optimization in Applied Mathematics, Engineering, and Computer Science.
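
A minimal sketch of such a reconciliation is given below, assuming a weighted quadratic loss and a single equality constraint (sub-annual figures that must add up to a fixed yearly total). In this special case the Lagrangian solution has a closed form: each initial figure is adjusted in proportion to its assumed variance. The function name, the variances, and the figures are hypothetical, and inequality constraints would require a numerical solver.

```python
def reconcile_to_total(initial, total, variances=None):
    """Minimize sum_i (x_i - initial_i)^2 / variances_i
    subject to sum_i x_i = total.

    The first-order conditions give
        x_i = initial_i + variances_i * (total - sum(initial)) / sum(variances),
    so less reliable figures (larger variance) absorb more of the discrepancy.
    """
    if variances is None:
        variances = [1.0] * len(initial)
    gap = total - sum(initial)
    vsum = sum(variances)
    return [x0 + v * gap / vsum for x0, v in zip(initial, variances)]


# Hypothetical survey-based quarterly figures and a register-based yearly total.
quarters = [24.0, 26.0, 25.0, 27.0]   # sums to 102
yearly_total = 100.0

print(reconcile_to_total(quarters, yearly_total))
print(reconcile_to_total(quarters, yearly_total, variances=[1.0, 1.0, 1.0, 3.0]))
```

Treating the yearly total as fixed reflects an asymmetric choice made for simplicity. In a fully symmetric treatment the total would also enter the loss function with its own variance.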

      Mushkudiani, Pannekoek, and Zhang (2016) develop a scalar uncertainty measure of macro
