Administrative Records for Survey Methodology. Group of authors

Figure 1.1 Second phase of integrated statistical micro data.

      Source: Zhang (2012).

      In a frame that is constructed by combining multiple population datasets, one can often find several related classification variables. Identification errors (Zhang 2012) arise if the classification variables, or the relationships between them, are mistaken on the basis of the input datasets. For instance, the variable address is central for population and household statistics. Multiple addresses can be collected by combining the Population Register with resident address, the Post Register with postal address, the Higher-Education Student Register with term-time address, the various Utility datasets with occupant address, etc. Each person may be assigned a unique de jure address based on all these sources, in a way that is judged to be most appropriate, which would then yield a proxy variable for the de facto address that is of interest in many socio-economic statistics.

      For survey data, the statistical unit can be identified in fieldwork. Based on register data, however, it is sometimes necessary to construct a proxy for the statistical unit of interest, in which case unit errors may be unavoidable even if all the input data are error-free. For instance, consider the register-based household. Provided all dwellings (or addresses) in the Population Register are correct, one may define a dwelling household to consist of all the persons who de jure share the same dwelling. We do not consider such a dwelling household to be a constructed statistical unit, precisely because it can be obtained directly from error-free input data. Saying that the input data are error-free is another way of saying that there are no identification errors. An example of a constructed unit in this context is the living household, which does not have to include everyone registered at the same dwelling, nor be limited to these persons. A constructed living household is in error if two persons in different living households are placed in the same constructed living household, or if two persons in the same living household are placed in different constructed living households.
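
      The two kinds of error in a constructed living household described above can be checked pair by pair. The following is a minimal sketch, assuming that a true household labelling is available for comparison (which in practice it rarely is); the function name, person identifiers, and household labels are all hypothetical.

```python
from itertools import combinations


def pairwise_unit_errors(true_household, constructed_household):
    """Return the person pairs wrongly joined and wrongly split.

    Both arguments map a person id to a household label. The labels
    themselves need not match between the two mappings; only whether
    two persons share a label matters.
    """
    wrongly_joined, wrongly_split = [], []
    for a, b in combinations(sorted(true_household), 2):
        same_true = true_household[a] == true_household[b]
        same_constructed = constructed_household[a] == constructed_household[b]
        if same_constructed and not same_true:
            # Different living households, same constructed household.
            wrongly_joined.append((a, b))
        elif same_true and not same_constructed:
            # Same living household, different constructed households.
            wrongly_split.append((a, b))
    return wrongly_joined, wrongly_split


# Made-up data for four persons.
true_hh = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
constructed_hh = {"p1": "X", "p2": "Y", "p3": "Y", "p4": "Z"}

joined, split = pairwise_unit_errors(true_hh, constructed_hh)
print("wrongly joined:", joined)
print("wrongly split:", split)
```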

      Whether or not the unit is constructed, unit error can occur as a result of either lack of data or errors in data. Zhang (2011) devises a mathematical representation of unit error. It is assumed that each statistical unit of interest can consist of one or several so-called base units, but never cuts across a base unit. For example, person can be the base unit for household. The mapping from the set of base units to the set of statistical units can then be specified in terms of an allocation matrix, where each element takes value 1 or 0 depending on whether or not the corresponding base unit (arranged by column) belongs to the statistical unit (arranged by row). In the case where a base unit can be assigned to one and only one statistical unit, such as when a person can only belong to one household, every column sum of the allocation matrix is equal to 1. Zhang (2011) develops a unit error theory for household statistics. Although unit error is clearly one of the most fundamental difficulties in business statistics, a statistical theory for it has so far been lacking. This may be partly due to the prominence of the identification error mentioned above. Another important reason may simply be the lack of a commonly acknowledged choice of base unit in business statistics.
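
      As an illustration of the allocation matrix, the sketch below builds one for three persons (base units) and two households (statistical units), checks that each column sums to 1, and uses the matrix to aggregate a person-level value to household level. The helper names and the data are invented for the example and are not taken from Zhang (2011).

```python
def allocation_matrix(assignment, base_units, stat_units):
    """Rows are statistical units, columns are base units.

    An entry is 1 if the base unit belongs to the statistical unit,
    and 0 otherwise.
    """
    return [
        [1 if assignment[b] == s else 0 for b in base_units]
        for s in stat_units
    ]


def column_sums(matrix):
    """Sum each column, i.e. count the units each base unit belongs to."""
    return [sum(col) for col in zip(*matrix)]


persons = ["p1", "p2", "p3"]          # base units (columns)
households = ["h1", "h2"]             # statistical units (rows)
assignment = {"p1": "h1", "p2": "h1", "p3": "h2"}

A = allocation_matrix(assignment, persons, households)

# Each person belongs to exactly one household, so every column sums to 1.
sums = column_sums(A)

# Multiplying A by a person-level vector gives household-level totals.
incomes = [10, 20, 30]
household_totals = [sum(a * v for a, v in zip(row, incomes)) for row in A]

print(A)
print(sums)
print(household_totals)
```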

      1.2.2 Measurement

      Sometimes, however, all the available measures entail relevance error, regardless of the source of the data, and there does not exist a way in which they can be combined to derive the target measure directly. For instance, Meijer, Rohwedder, and Wansbeek (2012) adopt such a viewpoint and study earnings data from register and survey sources using a mixture model approach, whereas Pavlopoulos and Vermunt (2015) apply latent class models to analyze income-based labor market mobility. It is also possible to formulate an adjusted measure as the solution of an appropriately defined constrained optimization problem, without explicitly introducing a model that spells out the relationship between the true measure and the observed proxy measures. For instance, Mushkudiani, Daalmans, and Pannekoek (2014) apply such an approach to Census aggregated tables and to a turnover variable from different sources.
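
      To make the idea of a constrained-optimization adjustment concrete, here is a deliberately simple sketch. It is not the method of the authors cited above: it assumes a single linear constraint (the adjusted values must add up to a known total) and a weighted least-squares objective, for which the solution has a closed form. The function name, weights, and numbers are hypothetical.

```python
def adjust_to_total(observed, total, weights=None):
    """Minimise sum_i w_i * (x_i - observed_i)**2 subject to sum_i x_i == total.

    The Lagrangian solution spreads the gap between the required total and
    the observed sum in proportion to 1 / w_i, so values with a larger
    weight (regarded as more reliable) are adjusted less.
    """
    if weights is None:
        weights = [1.0] * len(observed)
    inverse = [1.0 / w for w in weights]
    gap = total - sum(observed)
    scale = gap / sum(inverse)
    return [x + scale * v for x, v in zip(observed, inverse)]


# Made-up component values that should add up to a known total of 100.
observed = [40.0, 35.0, 30.0]
equal = adjust_to_total(observed, 100.0)

# Treat the third component as more reliable, so it moves less.
weighted = adjust_to_total(observed, 100.0, weights=[1.0, 1.0, 4.0])

print(equal)
print(weighted)
```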

      Mapping error due to reclassification of input register data is highly common, since a register proxy variable often arises by means of reclassification. For instance, inferring the mother tongue from birth country is a reclassification of the input variable birth country to the outcome variable mother tongue. As another example, to classify someone receiving unemployment benefit as unemployed is to reclassify the input variable "benefit or not" to the outcome variable "unemployed or not". Such examples are numerous.
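
      The unemployment example can be sketched in a few lines. The code below applies the reclassification rule to a handful of invented records and counts how often the outcome disagrees with a true status, which is assumed to be known here purely for illustration; the field names and the data are hypothetical.

```python
def reclassify(receives_benefit):
    """Map the input variable (benefit or not) to the outcome variable."""
    return "unemployed" if receives_benefit else "not unemployed"


# Made-up records; true_status would normally be unobserved.
records = [
    {"benefit": True, "true_status": "unemployed"},
    {"benefit": True, "true_status": "not unemployed"},   # benefit, but working
    {"benefit": False, "true_status": "unemployed"},      # unemployed, no benefit
    {"benefit": False, "true_status": "not unemployed"},
    {"benefit": False, "true_status": "not unemployed"},
]

# A mapping error occurs where the reclassified outcome differs from the truth.
errors = [r for r in records if reclassify(r["benefit"]) != r["true_status"]]
error_rate = len(errors) / len(records)

print(len(errors), error_rate)
```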

      It is worth noting that mapping error may be caused by delays or mistakes in the administrative sources, even where reclassification poses no conceptual difficulties. Register data may be progressive in the sense that the observations for a particular reference time point may differ depending on when the observations are compiled. Following Zhang and Fosen (2012) and Zhang and Pritchard (2013), let t be the reference time point of interest and t + d the measurement time point, for d ≥ 0. Let U(t) and y(t) be the target population and value at t, respectively. For a unit i, let I_i(t; t + d) = 1 if it is to be included in the target population and 0 otherwise, based on the register data available at t + d, and let y_i(t; t + d) be the observed value for t at t + d. The data are said to be progressive if, for d ≠ d′ > 0, one can have I_i(t; t + d) ≠ I_i(t; t + d′) and y_i(t; t + d) ≠ y_i(t; t + d′). Progressiveness is a distinct feature of register data
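
      The definition of progressiveness given above can be checked mechanically once register extracts compiled at different delays are available. The sketch below assumes, for simplicity, that every unit appears in every extract and that each extract stores an inclusion indicator and an observed value per unit for the same reference time t; the data structure, function name, and figures are hypothetical.

```python
def progressive_changes(snapshots):
    """Find units whose inclusion indicator or value depends on the delay d.

    snapshots maps a delay d to {unit id: (included, value)}, all for the
    same reference time point t. Returns two sorted lists of unit ids.
    """
    delays = sorted(snapshots)
    units = set().union(*(snapshots[d] for d in delays))
    changed_inclusion, changed_value = [], []
    for i in sorted(units):
        inclusions = {snapshots[d][i][0] for d in delays}
        values = {snapshots[d][i][1] for d in delays}
        if len(inclusions) > 1:
            changed_inclusion.append(i)
        if len(values) > 1:
            changed_value.append(i)
    return changed_inclusion, changed_value


# Made-up extracts for reference time t, compiled at delays d = 0 and d = 3.
snapshots = {
    0: {"u1": (1, 100), "u2": (0, None), "u3": (1, 50)},
    3: {"u1": (1, 120), "u2": (1, 80), "u3": (1, 50)},
}

inclusion_changed, value_changed = progressive_changes(snapshots)
print(inclusion_changed)  # units whose inclusion indicator changed with d
print(value_changed)      # units whose observed value changed with d
```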
