We note that the n values in numerator and denominator cancel, and so we end up with

$$E(\bar{y}) = E(y_i)$$

Using the fact that E(yᵢ) = μ, we can also say that the expected value of a sampling distribution of the mean is equal to the mean of the population from which we did the theoretical sampling. That is,

$$E(\bar{y}) = \mu$$
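As a quick empirical check of this result, consider the following minimal simulation sketch (the normal population, the seed, the sample size n = 5, and the values μ = 10.0 and σ² = 2.0, borrowed from the worked example later in this section, are all illustrative assumptions, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma2 = 10.0, 2.0      # assumed population parameters (illustrative)
n, reps = 5, 100_000        # sample size and number of repeated samples

# Draw `reps` samples of size n and compute the mean of each sample
means = rng.normal(mu, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)

print(means.mean())         # close to 10.0, i.e., E(ybar) = mu
```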
We now need a measure of the dispersion of a sampling distribution of the mean. At first glance, it may seem reasonable to assume that the variance of the sampling distribution of means should equal the variance of the population from which the sample means were drawn. However, this is not the case. What is true is that the variance of the sampling distribution of means will be equal to only a fraction of the population variance. It will be equal to

$$\sigma^2_M = \frac{1}{n}\sigma^2$$

or simply,

$$\sigma^2_M = \frac{\sigma^2}{n}$$
The mathematical proof of this statistical fact is in most mathematical statistics texts. A version of the proof can also be found in Hays (1994). The idea, however, can be easily and perhaps even more intuitively understood by considering what happens as n changes. We consider first the most trivial and unrealistic of examples to demonstrate the point strongly. Suppose that we calculate the sample mean from a sample size of n = 1, sampled from a population with μ = 10.0 and σ² = 2.0. Suppose the sample mean we obtain is equal to 4.0. Therefore, the sampling variance of the corresponding sampling distribution is equal to:

$$\sigma^2_M = \frac{\sigma^2}{n} = \frac{2.0}{1} = 2.0$$
That is, the variance in means that you can expect to see if you sampled an infinite number of means based on samples of size n = 1 repeatedly from this population is equal to 2. Notice that 2 is exactly equal to the original population variance. In this case, the variance in means is based on only a single data point.
Consider now the case where n > 1. Suppose we now sampled a mean from the population based on sample size n = 2, yielding

$$\sigma^2_M = \frac{\sigma^2}{n} = \frac{2.0}{2} = 1.0$$
What has happened? The variance in sample means has decreased to 1/2 of the original population variance (i.e., 1/2 of 2 is 1). Why is this decrease reasonable? It makes sense because we already know from the law of large numbers that as the sample size grows larger, a consistent estimator gets closer and closer to the parameter it estimates. That is, our estimate of the true population mean (i.e., the expectation) should get better and better as sample size increases. This is exactly what happens as we increase n: the precision of that which is being estimated increases. In other words, the sampling variance of the estimator decreases. It is less variable; it does not "bounce around" as much, on average, from sample to sample.
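The same point can be seen by simulation. The following sketch assumes the values μ = 10.0 and σ² = 2.0 from the example above, along with a normal population, a seed, and an extra case n = 10, all chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma2 = 10.0, 2.0
reps = 200_000

for n in (1, 2, 10):
    # Empirical variance of sample means across repeated samples of size n
    means = rng.normal(mu, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
    print(n, round(means.var(), 3), sigma2 / n)  # empirical vs. theoretical sigma^2 / n
```

The empirical variance of the means is approximately 2.0 for n = 1, 1.0 for n = 2, and 0.2 for n = 10, tracking σ²/n.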
Analogous to how we defined the standard deviation as the square root of the variance, it is also useful to take the square root of the variance of means:

$$\sigma_M = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

which we call the standard error of the mean, σM. The standard error of the mean is the standard deviation of the sampling distribution of the mean. Lastly, it is important to recognize that because σM = σ/√n, the standard error of the mean will be smaller than the population standard deviation for any sample size greater than 1.
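To make this concrete, a minimal sketch (the population standard deviation σ = √2.0 is carried over from the example above; the list of sample sizes is arbitrary):

```python
import numpy as np

sigma = np.sqrt(2.0)              # population standard deviation (from the example)

for n in (1, 2, 10, 100, 1000):
    print(n, sigma / np.sqrt(n))  # sigma_M = sigma / sqrt(n) shrinks as n grows
```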
2.12 CENTRAL LIMIT THEOREM
It is not an exaggeration to say that the central limit theorem, in one form or another, is probably the most important and relevant theorem in theoretical statistics, which makes it quite relevant to applied statistics as well.
We borrow our definition of the central limit theorem from Everitt (2002):

If a random variable y has a population mean μ and population variance σ², then the sample mean, ȳ, based on n observations, has an approximate normal distribution with mean μ and variance σ²/n, for sufficiently large n.
Asymptotically, the distribution of the sample mean converges to that of a normal distribution as n → ∞. A multivariate version of the theorem can also be given (e.g., see Rencher, 1998, p. 53).7
The relevance and importance of the central limit theorem cannot be overstated: it allows one to know, at least on a theoretical level, what the distribution of a statistic (e.g., the sample mean) will look like for increasing sample size. This is especially important if one is drawing samples from a population whose shape is either unknown or known a priori to be nonnormal. Normality of the sampling distribution, for adequate sample size, is still assured even if samples are drawn from nonnormal populations. Why is this relevant? It is relevant because if we know what the distribution of means will look like for increasing sample size, then we can compare our obtained statistic to a normal distribution in order to estimate its probability of occurrence. Normality assumptions are also typically required for assuming independence between the sample mean and the sample variance.
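As a rough illustration of the theorem, the following simulation sketch (the exponential population, the seed, and the sample sizes are all assumed for the example) draws means from a strongly right-skewed population and shows that their distribution becomes increasingly symmetric, and hence closer to normal, as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 100_000

for n in (2, 30, 500):
    # Sampling distribution of the mean from a right-skewed (exponential) population
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    # Skewness of the sampling distribution; it approaches 0 (symmetry) as n increases
    print(n, round((z**3).mean(), 3))
```

The printed skewness drops from roughly 1.4 at n = 2 toward 0 at n = 500, consistent with convergence to normality even though the population itself is decidedly nonnormal.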
2.13 CONFIDENCE INTERVALS
Recall that a goal of statistical inference is to estimate functions of parameters, whether a single parameter, a difference of parameters (for instance, μ₁ − μ₂), or some other function of parameters.