Sampling and Estimation from Finite Populations. Yves Tille


model defines a class of distributions to which these random variables are supposed to belong. The sample is then derived from a double random experiment: a realization of the model that generates the population and then the choice of the sample. The idea of modeling the population was present in Brewer (1963a), but it was developed by Royall (1970b, 1971, 1976b) (see also Valliant et al., 2000; Chambers & Clark, 2012).

      Drawing on the fact that the random sample is an “ancillary” statistic, Royall proposed working conditionally on it. In other words, he considered that once the sample is selected, the choice of units is no longer random. This new framework gave rise to a distinct school of research. The model must express a known and previously accepted relationship. According to Royall, if the superpopulation model “adequately” describes the population, the inference can be conducted with respect to the model alone, conditionally on the sample selection. The model then allows us to determine an optimal estimator.

      One can object that a model is always an approximate representation of the population. However, the model is not built to be tested against the data but to “assist” the estimation. If the model is correct, Royall's method provides a powerful estimator. If the model is false, the bias may be so large that the confidence intervals built for the parameter are no longer valid. This is essentially the critique made by Hansen et al. (1983).

      That is not to say that the arguments for or against parametric inference in the usual statistical theory are not of interest in the context of the theory of survey sampling. In our assessment of these arguments, however, we must pay attention to the relevant specifics of the applications.

      According to Dalenius, it is therefore in the discipline in which the theory of survey sampling is applied that useful conclusions should be drawn concerning the adequacy of a superpopulation model.

      The statistical theory of surveys is mainly applied in official statistics institutes. These institutes do not develop a science for its own sake but carry out a mission entrusted to them by their states. A fairly standard argument from the heads of national statistical institutes is that the use of a superpopulation model in an estimation procedure breaches the principle of impartiality, which is part of the ethics of statisticians. This argument follows directly from the current definition of official statistics: the principle of impartiality is part of this definition, just as the principle of accuracy was in the 19th century. While modeling a population is readily accepted as a research or predictive tool, it remains fundamentally questionable in the field of official statistics.

      The “superpopulation” approach has led to extremely fruitful research. It gave rise to a hybrid approach, called the model‐assisted approach, which provides valid inferences under the model while remaining robust when the model is wrong. This view was mainly developed by a Swedish school (see Särndal et al., 1992). The model makes it possible to take auxiliary information into account at the estimation stage while preserving the robustness of the estimators even if the model is inadequate. It is in fact very difficult to construct an estimator that incorporates a set of auxiliary information after the selection of the sample without making some hypothesis, even a simple one, about the relation between the auxiliary information and the variable of interest. Modeling provides a conceptualization of this type of presumption. The model‐assisted approach yields interesting and practical estimators. It is now clear that introducing a model is a necessity for dealing with nonresponse and with small area estimation problems. In this type of problem, whatever the technique used, one always postulates the existence of a model, even if it sometimes remains implicit. The model therefore deserves to be stated explicitly, in order to make clear the underlying assumptions that justify the application of the method.
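      The flavor of the model‐assisted idea can be illustrated with a minimal sketch. The population, the proportional working model, and the simple random sampling design below are all illustrative assumptions, not taken from the text; the ratio estimator is one of the simplest estimators that exploits a known population total of an auxiliary variable, and it remains design‐consistent even when the working model is wrong.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: auxiliary variable x is known for all N units
# (e.g. from the sampling frame); the variable of interest y is observed
# only on the sample. The superpopulation working model assumed here is
# y roughly proportional to x.
N = 10_000
x = rng.uniform(10, 50, size=N)           # auxiliary variable, known for all units
y = 3.0 * x + rng.normal(0, 5, size=N)    # variable of interest

n = 200
sample = rng.choice(N, size=n, replace=False)  # simple random sampling w/o replacement

# Expansion (Horvitz-Thompson) estimator of the total: no auxiliary information.
t_ht = N / n * y[sample].sum()

# Model-assisted ratio estimator: uses the known population total of x
# to adjust the sample-based ratio of y to x.
t_x = x.sum()                             # known from the frame
ratio = y[sample].sum() / x[sample].sum()
t_ratio = ratio * t_x

print(f"true total      : {y.sum():.0f}")
print(f"Horvitz-Thompson: {t_ht:.0f}")
print(f"ratio estimator : {t_ratio:.0f}")
```

      Because the working model fits well here, the ratio estimator is far more precise than the expansion estimator; if the proportionality assumption failed, the ratio estimator would lose efficiency but would still be design‐consistent, which is the robustness property the model‐assisted school emphasizes.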

      The 1990s were marked by the emergence of the concept of auxiliary information. This relatively general notion covers all information external to the survey itself that is used to increase the accuracy of its results. This information can be the values of one or more variables for all the units of the population, or simply a function of these values. Auxiliary information is available for most surveys; it can come from a census or simply from the sampling frame. Examples of auxiliary information include the total of a variable over the population, subtotals for subpopulations, means, proportions, variances, and the values of a variable for all the units of the sampling frame. The notion of auxiliary information therefore encompasses all data from censuses or administrative sources.

[Figure: Block diagram of the flow of auxiliary information, which can enter via the sampling design and via the estimation; data collection links the sampling design to the estimation.]

      Pioneering books include Yates (1946, 1949, 1960, 1979), Deming (1948, 1950, 1960), Thionet (1953), Sukhatme (1954), Hansen et al. (1953a,b), Cochran (1953, 1963, 1977), Dalenius (1957), Kish (1965, 1989, 1995), Murthy (1967), Raj (1968), Johnson & Smith (1969), Sukhatme & Sukhatme (1970), Konijn (1973), Lanke (1975), Cassel et al. (1977, 1993), Jessen (1978), Hájek (1981), and Kalton (1983). These books are worth consulting because many modern ideas, especially on calibration and balancing, are already discussed in them.

      Important reference works include Skinner et al. (1989), Särndal et al. (1992), Lohr (1999, 2009b), Thompson (1997), Brewer (2002), Ardilly & Tillé (2006), and Fuller (2011). The series Handbook of Statistics devoted to sampling has been published at 15‐year intervals: first a volume edited by Krishnaiah & Rao (1994), then two volumes edited by Pfeffermann & Rao (2009a,b). There is also a recent collective work edited by Wolf et al. (2016).

      The works of Thompson (1992, 1997, 2012) and
