Biostatistics Decoded. A. Gouveia Oliveira


error.

      Then there is the sample variance, which is also not equal to the population variance. These quantities are usually represented by the symbols s² and σ², respectively. In the case of proportions, the sample and population variances should also be represented by s² and σ², but the general practice is instead to represent them by the formulas used for their computation. Therefore, the sample variance is represented by p(1 − p) and the population variance by π(1 − π).
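As a small illustration of why the variance of a proportion is written p(1 − p), the sketch below codes a binary attribute as 0/1. The sample proportion is then simply the mean of those codes, and the average squared deviation from that mean equals p(1 − p). It assumes the variance is computed with divisor n. With the divisor n − 1 that many texts use for s², the result would be p(1 − p) × n/(n − 1). The sample values and the function name `proportion_and_variance` are hypothetical.

```python
from fractions import Fraction

def proportion_and_variance(values):
    """Return (p, variance) for a list of 0/1 observations.

    The variance uses divisor n (the mean squared deviation), which is the
    form that equals p * (1 - p) exactly.
    """
    n = len(values)
    p = Fraction(sum(values), n)  # sample proportion = mean of the 0/1 codes
    variance = sum((Fraction(v) - p) ** 2 for v in values) / n
    return p, variance

# Hypothetical sample: 1 = attribute present, 0 = attribute absent
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]
p, var = proportion_and_variance(sample)
print(p, var, p * (1 - p))  # the last two values coincide
```

Exact fractions are used so that the equality holds exactly rather than up to rounding.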

Figure: Comparison of the computation of sample statistics and population statistics.

      Sample means also have a variance, which is the square of the standard error, but this variance has neither a specific name nor a specific notation.

      From all of the above, we can conclude the following about sampling distributions:

       Means of large samples have an approximately normal distribution, regardless of the distribution of the attribute.

       Means of small samples have a normal distribution only if the attribute itself has a normal distribution.

       The mean of the sample means is the same as the population mean, regardless of the distribution of the variable or the sample size.

       The standard deviation of the sample means, or standard error, is equal to the population standard deviation divided by the square root of the sample size, regardless of the distribution of the variable or the sample size.

       Both the standard deviation and the standard error are measures of dispersion: the first measures the dispersion of the values of an attribute and the second measures the dispersion of the sample means of an attribute.

       The above results are valid only if the observations in the sample are mutually independent.
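The properties listed above can be checked by simulation. The sketch below assumes a hypothetical, strongly skewed population (an exponential distribution with mean 10, whose standard deviation is also 10), draws many independent samples of several sizes, and compares the mean and standard deviation of the sample means with μ and σ/√n. The function name `sample_means`, the seed, and the number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical skewed population: exponential with mean 10,
# so the population mean and standard deviation are both 10.
MU, SIGMA = 10.0, 10.0

def sample_means(n, reps):
    """Draw `reps` independent samples of size n and return their means."""
    return [
        statistics.fmean(random.expovariate(1 / MU) for _ in range(n))
        for _ in range(reps)
    ]

for n in (4, 25, 100):
    means = sample_means(n, reps=10_000)
    print(
        f"n={n:3d}  mean of means={statistics.fmean(means):6.2f}  "
        f"SD of means={statistics.stdev(means):5.2f}  "
        f"sigma/sqrt(n)={SIGMA / n ** 0.5:5.2f}"
    )
```

The mean of the sample means stays close to 10 for every sample size, while their standard deviation shrinks in step with σ/√n, even though the population is far from normal.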

      Let us continue to view, as in Section 1.17, the sample mean as a random variable that results from the sum of independent, identically distributed variables. The mean and variance of each of these identical variables are, of course, the same as the population mean and variance, μ and σ², respectively.

      When we compute sample means, we sum all observations and divide the result by the sample size. This is exactly the same as if, before we summed all the observations, we divided each one by the sample size. If we represent the sample mean by m, each observation by x, and the sample size by n, what was just said can be represented by

$$m = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} \frac{x_i}{n}$$

      This is the same as if every one of the identical variables were divided by a constant equal to the sample size. From the properties of means, we know that if we divide a variable by a constant, its mean is divided by the same constant. Therefore, the mean of each xᵢ/n is equal to the population mean divided by n, that is, μ/n.
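The property that dividing a variable by a constant divides its mean by the same constant can be verified exactly on a small discrete population. The sketch assumes a hypothetical population, the values of a fair six-sided die, and an arbitrary sample size n = 4; the helper `mean` is introduced only for this illustration.

```python
from fractions import Fraction

# Hypothetical population: a fair six-sided die, each value with probability 1/6.
population = {x: Fraction(1, 6) for x in range(1, 7)}

def mean(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

mu = mean(population)  # population mean, 7/2

n = 4  # hypothetical sample size
# Distribution of x/n: same probabilities, every value divided by n
scaled = {x / Fraction(n): prob for x, prob in population.items()}
print(mu, mean(scaled), mu / n)  # mean of x/n equals mu/n
```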

      Now, from the properties of means we know that if we add independent variables, the mean of the resulting variable will be the sum of the means of the independent variables. Sample means result from adding together n variables, each one having a mean equal to μ/n. Therefore, the mean of the resulting variable will be n × μ/n = μ, the population mean. The conclusion, therefore, is that the distribution of sample means m has a mean equal to the population mean μ.
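The conclusion that the mean of the sample means equals μ can also be confirmed exactly, without simulation, by listing every possible sample. The sketch below again assumes a hypothetical fair-die population and treats each sample as n independent draws (so every ordered sample of size n has probability (1/6)ⁿ). The function name `sampling_distribution_of_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair six-sided die (values 1..6, equally likely).
values = range(1, 7)
p_value = Fraction(1, 6)
mu = sum(Fraction(v) * p_value for v in values)  # population mean, 7/2

def sampling_distribution_of_mean(n):
    """Exact distribution of the sample mean for n independent draws.

    Every ordered sample of size n has probability (1/6)**n because the
    draws are independent; we accumulate the probability of each mean.
    """
    dist = {}
    for sample in product(values, repeat=n):
        m = Fraction(sum(sample), n)
        dist[m] = dist.get(m, 0) + p_value ** n
    return dist

for n in (1, 2, 3, 4):
    dist = sampling_distribution_of_mean(n)
    mean_of_means = sum(m * p for m, p in dist.items())
    print(n, mean_of_means)  # equals mu = 7/2 for every n
```

Exhaustive enumeration is only practical for very small n (6ⁿ samples), but it shows the result with no sampling error at all.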

      A similar reasoning may be used to find the variance of sample means. We saw above that, to obtain a sample mean, we divide every one of the identical variables x by a constant, the sample size n. Therefore, according to the properties of variances, the variance of each xᵢ/n is equal to the population variance σ² divided by the square of the sample size, that is, σ²/n². Sample means result from adding together all the xᵢ/n, and because these are independent, the variance of their sum is the sum of their variances. Consequently, the variance of the sample mean is equal to n times the population variance divided by the square of the sample size:

$$\text{variance}(m) = \sum_{i=1}^{n} \frac{\sigma^2}{n^2} = n \times \frac{\sigma^2}{n^2}$$

      This is equivalent to σ²/n; that is, the variance of the sample means is equal to the population variance divided by the sample size. Therefore, the standard deviation of the sample means (the standard error of the mean) equals the population standard deviation divided by the square root of the sample size.
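The result variance = σ²/n can be checked exactly in the same way as the mean. The sketch assumes a hypothetical fair-die population (μ = 7/2, σ² = 35/12) and enumerates every ordered sample of n independent draws. Deviations are measured from μ because, as shown above, μ is the mean of the sample means. The function name `variance_of_sample_means` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair six-sided die (values 1..6, equally likely).
values = range(1, 7)
prob = Fraction(1, 6)
mu = sum(v * prob for v in values)                  # population mean, 7/2
sigma2 = sum((v - mu) ** 2 * prob for v in values)  # population variance, 35/12

def variance_of_sample_means(n):
    """Exact variance of the mean of n independent draws from the die."""
    var = Fraction(0)
    for sample in product(values, repeat=n):
        m = Fraction(sum(sample), n)
        var += (m - mu) ** 2 * prob ** n  # the mean of the sample means is mu
    return var

for n in (1, 2, 3, 4):
    # The two columns agree: variance of sample means = sigma^2 / n
    print(n, variance_of_sample_means(n), sigma2 / n)
```

Taking the square root of either column gives the standard error, σ/√n.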

      So far we have discussed the distribution of sample means of interval variables. What happens with sample means of binary variables when we take samples of a given size n from the same population?

      The resulting graph is different from the one we obtained with sample means of interval variables. It is clearly not symmetrical, but above all the probability distribution is not continuous; it is discrete. Although it resembles the normal distribution, it is a different theoretical probability distribution, called the binomial distribution.
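The mean of a 0/1 attribute is the sample proportion, so its sampling distribution can take only the values 0, 1/n, 2/n, …, 1. The sketch below, under the hypothetical assumptions n = 4 and π = 0.3, enumerates every possible sample to obtain that discrete distribution and compares it with the binomial formula C(n, k) πᵏ(1 − π)ⁿ⁻ᵏ, computed with Python's `math.comb` (Python 3.8 or later). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def proportion_distribution_by_enumeration(n, pi):
    """Exact distribution of the sample proportion for n independent 0/1 draws.

    Each observation is 1 with probability pi and 0 otherwise; we add up
    the probability of every ordered sample that gives the same proportion.
    """
    dist = {}
    for sample in product((0, 1), repeat=n):
        k = sum(sample)
        prob = pi ** k * (1 - pi) ** (n - k)
        p_hat = Fraction(k, n)
        dist[p_hat] = dist.get(p_hat, 0) + prob
    return dist

def binomial_pmf(k, n, pi):
    """Binomial probability of k successes in n trials: C(n, k) pi^k (1 - pi)^(n - k)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

n, pi = 4, Fraction(3, 10)  # hypothetical sample size and population proportion
dist = proportion_distribution_by_enumeration(n, pi)
for k in range(n + 1):
    print(f"p = {k}/{n}: {float(dist[Fraction(k, n)]):.4f}  "
          f"binomial: {float(binomial_pmf(k, n, pi)):.4f}")
```

With π different from 0.5 the probabilities are visibly asymmetric, and only n + 1 distinct values are possible, which is what makes the distribution discrete.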

      We shall now see how to create a binomial distribution. Imagine samples of four random observations on a binary attribute, such as sex, whose distribution in the population we know to be equally divided between males and females. Each sample may have
