Applied Biostatistics for the Health Sciences. Richard J. Rossi



      When describing the typical values in a population, the more variation there is, the harder it is to measure the typical value. Just as there are several ways of measuring the center of a population, there are also several ways of measuring its variation. The three most commonly used parameters for measuring the spread of a population are the variance, the standard deviation, and the interquartile range. For a quantitative variable X,

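       the variance of a population is defined to be the average of the squared deviations from the mean and will be denoted by σ² or Var(X). The variance of a variable X measured on a population consisting of N units is

$$\sigma^2 = \operatorname{Var}(X) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2,$$

where $x_1, x_2, \ldots, x_N$ are the values of X in the population and μ is the population mean.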

       the standard deviation of a population is defined to be the square root of the variance and will be denoted by σ or SD(X).

       the interquartile range of a population is the distance between the 25th and 75th percentiles and will be denoted by IQR.
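The three definitions above can be sketched in Python. This is a minimal illustration, not code from the text: the function names `population_variance`, `population_sd`, and `iqr` are hypothetical, the variance divides by N as the definition requires, and the percentiles come from `statistics.quantiles` with its default `"exclusive"` method, which interpolates at position (N + 1)p. Other percentile conventions can give slightly different IQR values.

```python
import math
import statistics


def population_variance(values):
    """Average squared deviation from the population mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


def iqr(values):
    """Distance between the 25th and 75th percentiles.

    Assumes the (N + 1)p interpolation convention used by
    statistics.quantiles with its default method="exclusive".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(population_sd(data))        # 2.0
print(iqr(data))                  # 2.5
```

The standard library's `statistics.pvariance` and `statistics.pstdev` implement the same divisor-N definitions and can be used instead of the hand-written loop.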

      Note that each of these measures of spread is a positive number except in the rare case when there is absolutely no variation in the population, in which case they are all equal to 0. Furthermore, the larger each of these values is, the more variability there is in the population. For example, for the two populations in Figure 2.16 the standard deviation of population I is larger than the standard deviation of population II.

      Because the standard deviation is the square root of the variance, σ and σ² contain equivalent information about the variation in a population; if the variance is known, then so is the standard deviation, and vice versa. For example, if $\operatorname{Var}(X) = \sigma^2 = 25$, then the standard deviation is $\sigma = \sqrt{25} = 5$, and if $\operatorname{SD}(X) = \sigma = 20$, then $\operatorname{Var}(X) = \sigma^2 = 20^2 = 400$. The standard deviation is generally used for describing the variation in a population because its units are the same as the units of the variable, whereas the units of the variance are the units of the variable squared. For example, if X is a variable measured in cubic centimeters (cc), then the standard deviation is also measured in cc, but the variance is measured in cc² units. Also, the standard deviation is roughly the size of a typical deviation from the mean of the population.

      Figure 2.17 The IQR is the distance between $X_{75}$ and $X_{25}$.

      Like the median, the interquartile range is unaffected by the extremes in a population. On the other hand, the standard deviation and variance are heavily influenced by the extremes in a population. The shape of the distribution influences the parameters of a distribution and dictates which parameters provide meaningful descriptions of the characteristics of a population. However, for a mound-shaped distribution, the standard deviation and interquartile range are closely related with σ≈0.75⋅ IQR.
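The approximation σ ≈ 0.75·IQR can be checked for the normal distribution, taken here as a stand-in for a mound-shaped distribution (an assumption; other mound-shaped distributions give somewhat different ratios). A short sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution: mu = 0, sigma = 1.
z = NormalDist(mu=0.0, sigma=1.0)

# 25th and 75th percentiles of the standard normal.
q1 = z.inv_cdf(0.25)
q3 = z.inv_cdf(0.75)
iqr = q3 - q1  # IQR measured in standard deviations

ratio = 1.0 / iqr  # sigma / IQR, since sigma = 1 here
print(round(iqr, 3), round(ratio, 3))  # prints: 1.349 0.741
```

For a normal population the IQR is about 1.35σ, so σ is about 0.74·IQR, which the text rounds to 0.75.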

      Consider the two populations listed below that were used in Example 2.14.

      Population 1: 22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 67
      Population 2: 22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 670

      Again, these two populations are identical except for their largest values, 67 and 670. In Example 2.17, the mean values of populations 1 and 2 were found to be $\mu_1 = 33.23$ and $\mu_2 = 79.62$. The variances of these two populations are $\sigma_1^2 = 124.3$ and $\sigma_2^2 = 29075.5$, and the standard deviations are $\sigma_1 = \sqrt{124.3} = 11.2$ and $\sigma_2 = \sqrt{29075.5} = 170.5$. By changing the maximum value in the population from 67 to 670, the standard deviation increased by a factor of about 15. In both populations, the 25th and 75th percentiles are 26 and 37, respectively, and thus the interquartile range for both populations is IQR = 37 − 26 = 11.
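The summary values for the two populations can be reproduced with Python's standard `statistics` module. This is a sketch, not the book's own computation, and the helper name `summarize` is hypothetical. `pvariance` and `pstdev` use the divisor N from the definition of the population variance. `quantiles` uses its default `"exclusive"` method, which interpolates at position (N + 1)p and returns 26 and 37 for these data. By contrast, `statistics.variance` divides by N − 1 (the sample variance) and would return about 134.7 and 31498.4.

```python
import statistics


def summarize(values):
    """Mean, population variance and SD (divisor N), quartiles, and IQR."""
    mu = statistics.fmean(values)
    var = statistics.pvariance(values)  # average squared deviation
    sd = statistics.pstdev(values)      # square root of pvariance
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return mu, var, sd, q1, q3, q3 - q1


pop1 = [22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 67]
pop2 = pop1[:-1] + [670]  # same values except the maximum

for name, pop in (("Population 1", pop1), ("Population 2", pop2)):
    mu, var, sd, q1, q3, iqr = summarize(pop)
    print(f"{name}: mean={mu:.2f} var={var:.1f} sd={sd:.1f} "
          f"IQR={iqr:g} ({q1:g} to {q3:g})")
```

With the divisor N, the standard deviations come out near 11.2 and 170.5, a ratio of about 15, while the IQR stays at 11 for both populations.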

      Figure 2.18 The one-standard deviation empirical rule; roughly 68% of a mound-shaped distribution lies between the values μ−σ and μ+σ.

      Figure 2.19 The two-standard deviation empirical rule; roughly 95% of a mound-shaped distribution lies between the values μ−2σ and μ+2σ.

      Figure 2.20 The three-standard deviation empirical rule; roughly 99% of a mound-shaped distribution lies between the values μ−3σ and μ+3σ.

       THE EMPIRICAL RULES

      For populations having mound-shaped distributions,

      1. Roughly 68% of all the population values fall within 1 standard deviation of the mean. That is, roughly 68% of the population values fall between the values μ−σ and μ+σ.

      2. Roughly 95% of all the population values fall within 2 standard deviations of the mean. That is, roughly 95% of the population values fall between the values μ−2σ and μ+2σ.

      3. Roughly 99% of all the population values fall within 3 standard deviations of the mean. That is, roughly 99% of the population values fall between the values μ−3σ and μ+3σ.
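For the normal distribution, the usual model of a mound-shaped distribution (an assumption here, since the rules above are stated only as rough guides), the exact fraction within k standard deviations of the mean can be computed with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, mu = 0 and sigma = 1

# Fraction of the distribution between mu - k*sigma and mu + k*sigma.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

For an exactly normal population the three-standard-deviation fraction is about 99.7%; mound-shaped populations that are not normal will deviate somewhat from all three figures.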

      The standard deviations of two populations resulting from measuring the same variable can be compared to determine which of the two populations is more variable. That is, when one standard deviation is substantially larger than the other (i.e., more than two times as large), then clearly the population with the larger standard deviation is much more variable than the other. It is also important to be able to determine whether a single population is highly variable or not. A parameter that measures the relative variability in a population is the coefficient of variation. The coefficient of variation will be denoted by CV and is defined to be

$$\mathrm{CV} = \frac{\sigma}{|\mu|}$$
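As a sketch, the coefficient of variation of the two example populations can be computed from the population standard deviation (divisor N) and the absolute value of the mean. The helper name `coefficient_of_variation` is hypothetical, and it raises an error when the mean is 0, where the CV is undefined.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sigma / |mu|, using the population SD (divisor N)."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(values) / abs(mu)


pop1 = [22, 24, 25, 27, 28, 28, 31, 32, 33, 35, 39, 41, 67]
pop2 = pop1[:-1] + [670]  # same values except the maximum

print(f"CV1 = {coefficient_of_variation(pop1):.2f}")
print(f"CV2 = {coefficient_of_variation(pop2):.2f}")
```

For population 1 this gives a CV of about 0.34, and for population 2 about 2.14.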
