Medical Statistics. David Machin
The Normal distribution also has other uses in statistics and is often used as an approximation to the Binomial and Poisson distributions. Figure 4.4 shows that the Binomial distribution for any particular value of the parameter π approaches the shape of a Normal distribution as the other parameter n increases. The approach to Normality is more rapid for values of π near 0.5 than for values near to 0 or 1. Thus, provided n is large enough, a count may be regarded as approximately Normally distributed with mean nπ and
4.6 Reference Ranges
Diagnostic tests use patient data to classify individuals as either normal or abnormal. A related statistical problem is the description of the variability in normal individuals, to provide a basis for assessing the test results of other individuals. The most common form of presenting such data is as a range of values or interval that contains the values obtained from the majority of a sample of normal subjects. The reference interval is often referred to as a normal range or reference range. To distinguish this use of the word from the Normal distribution, we use lower case for the normal range and upper case for the Normal distribution throughout this book.
Worked Example – Reference Range – Birthweight
We can use the fact that our sample birthweight data, from the O'Cathain et al. (2002) study (see Figure 4.9), appear Normally distributed to calculate a reference range for birthweights. We have already mentioned that about 95% of the observations from a Normal distribution lie within 1.96 SDs either side of the mean. So a reference range obtained from this sample of babies is:
If the baby data were not Normally distributed, then the normal reference range would be obtained from the calculated percentiles of the sample, as described in Chapter 2. Thus the 2.5 percentile is the weight below which 2.5% of the babies fall, which equals 2.19 kg. Correspondingly, the estimated 97.5 percentile suggests that only 2.5% of babies are heavier than 4.43 kg at birth. The percentile‐based reference range for baby birthweight is therefore estimated to be 2.19 to 4.43 kg. This is very close to that obtained when we assume the birthweight has a Normal distribution.
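The two approaches to a 95% reference range described above can be compared in a short Python sketch. The birthweights here are simulated, with an assumed mean of 3.4 kg and SD of 0.55 kg; they are hypothetical values chosen for illustration, not the O'Cathain et al. (2002) data, and the function names are our own.

```python
import random
import statistics

def normal_reference_range(values, z=1.96):
    """Mean +/- z standard deviations; assumes the data are roughly Normal."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return mean - z * sd, mean + z * sd

def percentile_reference_range(values, lower=2.5, upper=97.5):
    """Empirical percentiles; makes no distributional assumption."""
    # quantiles(n=1000) returns 999 cut points at 0.1% steps, so
    # index 24 is the 2.5th percentile and index 974 the 97.5th.
    cuts = statistics.quantiles(values, n=1000, method="inclusive")
    return cuts[round(lower * 10) - 1], cuts[round(upper * 10) - 1]

# Hypothetical data: simulated birthweights in kg, NOT the study sample.
rng = random.Random(1)
weights = [rng.gauss(3.4, 0.55) for _ in range(3000)]

normal_low, normal_high = normal_reference_range(weights)
pct_low, pct_high = percentile_reference_range(weights)
print(f"Normal-based range:     {normal_low:.2f} to {normal_high:.2f} kg")
print(f"Percentile-based range: {pct_low:.2f} to {pct_high:.2f} kg")
```

Because the simulated data really are Normal, the two ranges should nearly coincide. With skewed data the percentile‐based range would be the safer choice.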
Most reference ranges are based on samples larger than 3500 people. Over many years, and millions of births, the World Health Organization (WHO) has come up with a normal birthweight range for new‐born babies. These ranges represent results that are acceptable in new‐born babies and actually cover the middle 80% of the population distribution, that is, the interval between the 10th and 90th centiles. Low birthweight babies are usually defined (by the WHO) as weighing less than 2500 g (the 10th centile) regardless of gestational age, and large birthweight babies are defined as weighing above 4000 g (the 90th centile). Hence the normal birthweight range is around 2.5 to 4.0 kg. For our sample data, the 10th to 90th centile range was similar, at 2.75 to 4.03 kg.
4.7 Other Distributions
There are many other probability distributions used in statistics. In this section we briefly list and describe those that are more commonly used.
t‐distribution
Student's t‐distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a Normally distributed variable (in the population) in situations where the sample size is small and the population standard deviation is unknown. It was developed by William Sealy Gosset under the pseudonym Student.
The t‐distribution plays an important role in a number of widely used statistical analyses, including Student's t‐test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis.
The t‐distribution is symmetric and bell‐shaped, like the Normal distribution, but has heavier tails, meaning that it is more prone than a Standard Normal distribution to producing values that fall far from its mean (Figure 4.14a). The exact shape of the t‐distribution is determined by the mean and variance plus what are known as the degrees of freedom, df. These are derived from the sample size. As the df increases, the shape of the t‐distribution becomes closer to the Normal distribution; and when the sample size (and degrees of freedom) are greater than 30, the t‐distribution is very similar to the Standard Normal distribution.
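A short Python sketch can illustrate the heavier tails and the approach to Normality. It evaluates the standard t density (mean 0, no rescaling) from its textbook formula and compares it with the Standard Normal density at x = 3; the evaluation point and the list of df values are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the standard Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow in the gamma function for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Ratio of t density to Standard Normal density at x = 3, far in the tail.
# A ratio above 1 means the t-distribution puts more weight there.
tail_ratio = {df: t_pdf(3.0, df) / standard_normal.pdf(3.0)
              for df in (2, 5, 30, 100, 1000)}
for df, ratio in tail_ratio.items():
    print(f"df = {df:4d}: t density / Normal density at x = 3 is {ratio:.2f}")
```

The ratio stays above 1 for every df but shrinks towards 1 as df grows, which is the convergence described above.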
Figure 4.14 Examples of probability density/distribution functions for the t‐, chi‐squared, F‐ and Uniform distributions. (a) t‐distribution. (b) chi‐squared distribution. (c) F‐distribution. (d) Uniform distribution.
Chi‐squared Distribution
The chi‐squared distribution (or χ²‐distribution) with n degrees of freedom (Figure 4.14b) is the distribution of a sum of the squares of n independent standard Normal random variables. A chi‐squared variable can therefore never be negative, and the shape of the distribution is uniquely determined by the degrees of freedom. The distribution becomes more symmetrical as the degrees of freedom increase and when the degrees of freedom are greater than 50, the chi‐squared distribution is very similar to the Normal distribution. The chi‐squared distribution is used in the common chi‐squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a Normal distribution from a sample standard deviation.
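The definition as a sum of squared standard Normal variables can be checked by simulation. The sketch below builds chi‐squared variates directly from that definition and compares the sample mean and variance with the standard theoretical values, n and 2n (a known result not stated in the text above). The seed, the number of draws and the df values are arbitrary choices.

```python
import random
import statistics

def simulate_chi_squared(df, draws, rng):
    """Draw chi-squared variates as sums of df squared standard Normals."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(draws)]

rng = random.Random(42)  # fixed seed so the run is repeatable
summary = {}
for df in (2, 10, 50):
    sample = simulate_chi_squared(df, draws=20000, rng=rng)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)
    summary[df] = (mean, variance, min(sample))
    print(f"df = {df:2d}: mean = {mean:6.2f} (theory {df}), "
          f"variance = {variance:7.2f} (theory {2 * df}), "
          f"minimum = {min(sample):.3f}")
```

Every simulated value is non‐negative, and the sample mean and variance should sit close to n and 2n for each choice of degrees of freedom.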
F‐distribution
The F‐distribution (Figure 4.14c) is the distribution of the ratio of two independent chi‐squared variables, each divided by its degrees of freedom, and is used in hypothesis testing when we want to compare variances, such as in one‐way