Sampling and Estimation from Finite Populations. Yves Tillé
Drawing on the fact that the random sample is an “ancillary” statistic, Royall proposed to work conditionally on it. In other words, he considered that once the sample is selected, the choice of units is no longer random. This new modeling allowed a particular research school to develop. The model must express a known and previously accepted relationship. According to Royall, if the superpopulation model “adequately” describes the population, inference can be conducted with respect to the model alone, conditionally on the sample selection. The use of the model then allows an optimal estimator to be determined.
One can object that a model is always an approximate representation of the population. However, the model is not built to be tested against data but to “assist” the estimation. If the model is correct, then Royall's method provides a powerful estimator. If the model is false, the bias may be so large that the confidence intervals built for the parameter are not valid. This is essentially the criticism made by Hansen et al. (1983).
The debate is interesting because the arguments are not in the domain of mathematical statistics. Mathematically, the two theories are both correct. The argument concerns the adequacy of the formalization to reality and is therefore necessarily external to the mathematical aspect of statistical development. In addition, the modeling proposed by Royall is of a particular kind. Above all, it makes it possible to break a theoretical impasse and thus to derive optimal estimators. However, the relevance of the modeling is questionable and will be judged in completely different ways depending on whether one takes the point of view of sociology, demography, or econometrics, three disciplines that are intimately related to statistical methodology. A comment from Dalenius (see Hansen et al., 1983, p. 800) highlights this problem:
That is not to say that the arguments for or against parametric inference in the usual statistical theory are not of interest in the context of the theory of survey sampling. In our assessment of these arguments, however, we must pay attention to the relevant specifics of the applications.
According to Dalenius, it is therefore in the discipline in which the theory of survey sampling is applied that useful conclusions should be drawn concerning the adequacy of a superpopulation model.
The statistical theory of surveys is mainly applied in official statistics institutes. These institutes do not develop a science for its own sake but carry out a mission entrusted to them by their states. There is a fairly standard argument made by the heads of national statistical institutes: the use of a superpopulation model in an estimation procedure breaches the principle of impartiality, which is part of the ethics of statisticians. This argument comes directly from the current definition of official statistics. The principle of impartiality is part of this definition, just as the principle of accuracy was in the 19th century. While modeling a population is easily conceived of as a research or predictive tool, it remains fundamentally questionable in the field of official statistics.
1.8 Attempt at a Synthesis
The “superpopulation” approach has led to extremely fruitful research. The development of a hybrid approach, called the model‐assisted approach, provides inferences that are valid under the model but also robust when the model is wrong. This view was mainly developed by a Swedish school (see Särndal et al., 1992). The model makes it possible to take auxiliary information into account at the estimation stage while preserving the robustness of the estimators if the model is inadequate. It is in fact very difficult to construct an estimator that incorporates a set of auxiliary information after the selection of the sample without making some hypothesis, even a simple one, about the relationship between the auxiliary information and the variable of interest. Modeling allows this type of presumption to be made explicit. The model‐assisted approach yields interesting and practical estimators. It is now clear that introducing a model is a necessity for dealing with nonresponse and with small area estimation problems. In this type of problem, whatever the technique used, one always postulates the existence of a model, even if it sometimes remains implicit. The model therefore deserves to be stated clearly, in order to make explicit the underlying assumptions that justify the method.
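The model‐assisted idea can be sketched in a few lines of code. The following is a minimal illustration, not the book's notation: a working linear model is fitted on the sample, the model predicts the study variable for every population unit, and a Horvitz–Thompson correction of the sample residuals protects the estimator when the model is wrong. All numbers, names, and the simulated population are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: auxiliary variable x known for every unit,
# study variable y observed only on the sample.
N = 1000
x = rng.uniform(10, 50, size=N)
y = 3.0 * x + rng.normal(0, 5, size=N)    # here y happens to follow the model

n = 100                                    # simple random sample without replacement
sample = rng.choice(N, size=n, replace=False)
pi = n / N                                 # inclusion probability under SRS

# Working model fitted on the sample: y = beta * x (no intercept, for brevity).
beta = np.sum(y[sample] * x[sample]) / np.sum(x[sample] ** 2)

# Model-assisted (difference/GREG-type) estimator of the total of y:
# predicted total over the whole population, plus a Horvitz-Thompson
# correction based on the sample residuals.
t_pred = beta * x.sum()
t_resid = np.sum((y[sample] - beta * x[sample]) / pi)
t_greg = t_pred + t_resid

t_ht = np.sum(y[sample] / pi)              # plain Horvitz-Thompson, for comparison
print(t_greg, t_ht, y.sum())
```

Because the correction term has expectation equal to the true residual total under the design, the estimator stays approximately design-unbiased even if the working model is badly chosen; the model only affects its variance.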
1.9 Auxiliary Information
The 1990s were marked by the emergence of the concept of auxiliary information. This relatively general notion covers all information external to the survey itself that is used to increase the accuracy of its results. This information can consist of the values of one or more variables for all the units of the population, or simply of a function of these values. For most surveys, auxiliary information is available. It can come from a census or simply from the sampling frame. Examples of auxiliary information include the total of a variable over the population, subtotals for subpopulations, means, proportions, variances, and the values of a variable for all the units of the sampling frame. The notion of auxiliary information therefore encompasses all data from censuses or administrative sources.
The main objective is to use all this information to obtain accurate results. As shown in Figure 1.1, auxiliary information can be used in the conception of the sampling design and at the time of parameter estimation. When auxiliary information is used to construct the sampling design, one seeks a design that provides accurate estimators for a given cost, or that is inexpensive for given accuracy requirements. For these reasons, one can use unequal‐probability, stratified, balanced, clustered, or multi‐stage designs. When the information is used at the estimation stage, it serves to “calibrate” the survey results to the auxiliary information, for instance from a census. The general calibration method (in French, calage) of Deville & Särndal (1992) allows auxiliary information to be used without explicit reference to a model.
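The simplest member of the Deville–Särndal family, linear calibration under the chi-square distance, even has a closed form: the design weights are adjusted multiplicatively so that the weighted sample totals of the auxiliary variables reproduce their known population totals exactly. The sketch below uses invented data and illustrative names; it is not code from the book.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical frame with one auxiliary variable known for every unit.
N = 500
aux = rng.gamma(4.0, 10.0, size=N)
X = np.column_stack([np.ones(N), aux])     # calibrate on N and the total of aux
T_x = X.sum(axis=0)                        # known population totals

n = 60
s = rng.choice(N, size=n, replace=False)   # simple random sample
d = np.full(n, N / n)                      # design weights 1/pi
Xs = X[s]

# Linear calibration (chi-square distance): w_k = d_k * (1 + x_k' lambda),
# with lambda chosen so that sum_k w_k x_k equals the known totals T_x.
A = (Xs * d[:, None]).T @ Xs               # sum_k d_k x_k x_k'
lam = np.linalg.solve(A, T_x - d @ Xs)
w = d * (1.0 + Xs @ lam)

print(w @ Xs)                              # matches T_x (up to floating point)
```

The calibrated weights `w` are then applied to any study variable; because they reproduce the auxiliary totals exactly, the resulting estimator gains precision whenever the study variable is related to the auxiliary variables, without an explicit model being assumed.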
Figure 1.1 Auxiliary information can be used before or after data collection to improve estimations.
Any sampling problem deals with how to use the available information. With the idea of auxiliary information, one frees oneself from the modeling of the population. This concept, which will be the main theme of this book, allows us to think of the problem of planning and estimation in an integrated way.
1.10 Recent References and Development
The books of the precursors are Yates (1946, 1949, 1960, 1979), Deming (1948, 1950, 1960), Thionet (1953), Sukhatme (1954), Hansen et al. (1953a,b), Cochran (1953, 1963, 1977), Dalenius (1957), Kish (1965, 1989, 1995), Murthy (1967), Raj (1968), Johnson & Smith (1969), Sukhatme & Sukhatme (1970), Konijn (1973), Lanke (1975), Cassel et al. (1977, 1993), Jessen (1978), Hájek (1981), and Kalton (1983). These books are worth consulting because many modern ideas, especially on calibration and balancing, are already discussed in them.
Important reference works include Skinner et al. (1989), Särndal et al. (1992), Lohr (1999, 2009b), Thompson (1997), Brewer (2002), Ardilly & Tillé (2006), and Fuller (2011). The series Handbook of Statistics, which is devoted to sampling, was published at 15‐year intervals. First, a volume headed by Krishnaiah & Rao (1994), then two volumes headed by Pfeffermann & Rao (2009a,b). There is also a recent collective work led by Wolf et al. (2016).
The works of Thompson (1992, 1997, 2012) and