Biostatistics Decoded. A. Gouveia Oliveira
Then there is the sample variance, which is also not equal to the population variance. These quantities are usually represented by the symbols s² and σ², respectively. In the case of proportions, the sample and population variances should also be represented by s² and σ², but instead the general practice is to represent them by the formulas used for their computation. Therefore, the sample variance of a proportion is represented by p(1 − p) and the population variance by π(1 − π).
If one looks at the formulae for each of the above statistics, it becomes readily apparent why the sample statistics do not have the same values as their population counterparts. The reason is that they are computed differently, as shown in Figure 1.30.
Figure 1.30 Comparison of the computation of sample and population statistics.
Sample means also have a variance, which is the square of the standard error, but the variance of sample means has neither a specific name nor a specific notation.
From all of the above, we can conclude the following about sampling distributions:
Sample means have a normal distribution, regardless of the distribution of the attribute, provided that the samples are large.
The means of small samples have a normal distribution only if the attribute itself has a normal distribution.
The mean of the sample means is the same as the population mean, regardless of the distribution of the variable or the sample size.
The standard deviation of the sample means, or standard error, is equal to the population standard deviation divided by the square root of the sample size, regardless of the distribution of the variable or the sample size.
Both the standard deviation and the standard error are measures of dispersion: the first measures the dispersion of the values of an attribute and the second measures the dispersion of the sample means of an attribute.
The above results are valid only if the observations in the sample are mutually independent.
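The conclusions above can be checked with a small simulation. The sketch below uses NumPy; the exponential population (a clearly non-normal distribution), the sample size of 50, and the 20 000 repetitions are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: exponential with mean 2 and standard deviation 2
mu, sigma = 2.0, 2.0
n = 50             # sample size (illustrative choice)
n_samples = 20000  # number of repeated samples

# Draw many independent samples and compute the mean of each one
samples = rng.exponential(scale=mu, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# The mean of the sample means should be close to the population mean mu
print(sample_means.mean())

# The standard deviation of the sample means (the standard error)
# should be close to sigma / sqrt(n)
print(sample_means.std(), sigma / np.sqrt(n))
```

Even though the population is markedly skewed, the mean of the 20 000 sample means lands on μ and their standard deviation on σ/√n, as the text states.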
1.19 The Value of the Standard Error
Let us continue to view, as in Section 1.17, the sample mean as a random variable that results from the sum of identically distributed independent variables. The mean and variance of each of these identical variables are, of course, the same as the population mean and variance, μ and σ², respectively.
When we compute sample means, we sum all observations and divide the result by the sample size. This is exactly the same as if, before we summed all the observations, we divided each one by the sample size. If we represent the sample mean by m, each observation by x, and the sample size by n, what was just said can be represented by

m = (x₁ + x₂ + … + xₙ)/n = x₁/n + x₂/n + … + xₙ/n
This is the same as if every one of the identical variables was divided by a constant amount equal to the sample size. From the properties of means, we know that if we divide a variable by a constant, its mean will be divided by the same constant. Therefore, the mean of each xi/n is equal to the population mean divided by n, that is, μ/n.
Now, from the properties of means we know that if we add independent variables, the mean of the resulting variable will be the sum of the means of the independent variables. Sample means result from adding together n variables, each one having a mean equal to μ/n. Therefore, the mean of the resulting variable will be n × μ/n = μ, the population mean. The conclusion, therefore, is that the distribution of sample means m has a mean equal to the population mean μ.
A similar reasoning may be used to find the value of the variance of sample means. We saw above that, to obtain a sample mean, we divide every single identical variable x by a constant, the sample size n. Therefore, according to the properties of variances, the variance of each identical variable xᵢ/n will be equal to the population variance σ² divided by the square of the sample size, that is, σ²/n². Sample means result from adding together all the xᵢ/n. Consequently, the variance of the sample mean is equal to the sum of the variances of all the scaled observations, that is, equal to n times the population variance divided by the square of the sample size:

var(m) = n × σ²/n² = σ²/n
This is equivalent to σ²/n, that is, the variance of the sample means is equal to the population variance divided by the sample size. Therefore, the standard deviation of the sample means (the standard error of the mean) equals the population standard deviation divided by the square root of the sample size.
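The two steps of this derivation can be verified numerically. In the sketch below, the population variance of 9, the sample size of 25, and the normal population are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2, n = 9.0, 25   # hypothetical population variance and sample size
n_samples = 100000    # number of repeated samples

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(n_samples, n))

# Step 1: each scaled observation x_i / n has variance sigma^2 / n^2
var_scaled = (x / n).var(axis=0).mean()
print(var_scaled, sigma2 / n**2)

# Step 2: the sample mean, the sum of the n scaled observations,
# has variance n * sigma^2 / n^2 = sigma^2 / n
var_mean = x.mean(axis=1).var()
print(var_mean, sigma2 / n)
```

The empirical variance of each xᵢ/n matches σ²/n², and the empirical variance of the sample means matches σ²/n, confirming the two equalities in the text.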
One must not forget that these properties of means and variances only apply in the case of independent variables. Therefore, the results presented above will also only be valid if the sample consists of mutually independent observations. On the other hand, these results have nothing to do with the central limit theorem and, therefore, there are no restrictions related to the normality of the distribution or to the sample size. Actually, whatever the distribution of the attribute and the sample size might be, the mean of the sample means will always be the same as the population mean, and the standard error will always be the same as the population standard deviation divided by the square root of the sample size, provided that the observations are independent. The problem is that, in the case of small samples from an attribute with unknown distribution, we cannot assume that the sample means will have a normal distribution. Therefore, knowledge of the mean and of the standard error will not be sufficient to completely characterize the distribution of sample means.
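The last point is worth demonstrating: with a small sample from a skewed population, the mean and standard error formulas still hold, yet the distribution of the sample means is not normal. The skewed exponential population and the (deliberately tiny) sample size of 3 below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Skewed population: exponential with mean 1 and standard deviation 1.
# A sample size of 3 is far too small for the central limit theorem to help.
n, n_samples = 3, 100000
obs = rng.exponential(scale=1.0, size=(n_samples, n))
means = obs.mean(axis=1)

# The mean and standard error still follow the formulas:
print(means.mean())                 # close to mu = 1
print(means.std(), 1 / np.sqrt(n))  # close to sigma / sqrt(n)

# ... but the sample means remain visibly skewed, i.e. not normal
# (a normal distribution has skewness 0)
skewness = ((means - means.mean()) ** 3).mean() / means.std() ** 3
print(skewness)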
1.20 Distribution of Sample Proportions
So far we have discussed the distribution of sample means of interval variables. What happens with sample means of binary variables when we take samples of a given size n from the same population?
We will repeat the experiment that was done for interval variables but now we will generate a random binary variable with probability 0.30 and take many samples of size 60. Of course, we will observe variation of the sample proportions as we did with sample means, as shown in Figure 1.31. As before, let us plot the sample proportions to see if there is a pattern for the distribution of their values.
The resulting graph is different from the one we obtained with the sample means of interval variables. It is clearly not symmetrical but, above all, the probability distribution is not continuous; it is discrete. Although it resembles the normal distribution, this is a different theoretical probability distribution, called the binomial distribution.
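The experiment described in the text can be sketched as follows, using the values it gives (probability 0.30, samples of size 60); the number of repetitions is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)

pi, n = 0.30, 60   # population proportion and sample size from the text
n_samples = 50000  # number of repeated samples (illustrative)

# Each sample proportion is the number of "successes" out of n, divided by n
counts = rng.binomial(n, pi, size=n_samples)
proportions = counts / n

# The distribution is discrete: proportions can only take the values
# 0/n, 1/n, ..., n/n, so there are at most n + 1 distinct values
print(len(np.unique(proportions)))

# Their mean matches pi and their standard deviation matches
# the standard error of a proportion, sqrt(pi * (1 - pi) / n)
print(proportions.mean(), pi)
print(proportions.std(), np.sqrt(pi * (1 - pi) / n))
```

A histogram of `proportions` reproduces the pattern described above: a slightly asymmetrical, discrete distribution concentrated around 0.30.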
We shall now see how to construct a binomial distribution. Imagine samples of four random observations on a binary attribute, such as sex, for which we know that the distribution in the population is equally divided between males and females. Each sample may have