Applied Biostatistics for the Health Sciences. Richard J. Rossi
Figure 2.7 An example of a distribution with a long tail to the left.
Figure 2.8 An example of a bimodal distribution.
The value in the population at which a probability density graph reaches a peak is called a mode. A distribution can have more than one mode, and a distribution with more than one mode is called a multimodal distribution; a distribution with two modes, such as the one in Figure 2.8, is called bimodal. When a distribution has two or more modes, this usually indicates that the population contains distinct subpopulations, each clustering around one of the modes. In this case, it is often more informative to graph the probability distribution of each subpopulation separately and to analyze the subpopulations individually.
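One simple way to locate modes numerically is to look for local peaks in a tabulated distribution, for example the bar heights of a histogram. The sketch below assumes the distribution is given as values sorted in increasing order together with their probabilities, and it counts a value as a mode when its probability is strictly larger than that of each neighboring value. The function name `find_modes` and the probabilities are hypothetical, chosen only for illustration.

```python
def find_modes(values, probs):
    """Return the values at which the tabulated distribution has a local peak.

    Assumes `values` is sorted and probs[i] is the probability of values[i].
    A value counts as a mode when its probability is strictly greater than
    the probability of each neighbor that exists.
    """
    modes = []
    n = len(probs)
    for i in range(n):
        higher_than_left = i == 0 or probs[i] > probs[i - 1]
        higher_than_right = i == n - 1 or probs[i] > probs[i + 1]
        if higher_than_left and higher_than_right:
            modes.append(values[i])
    return modes


# Hypothetical bimodal distribution (illustrative numbers only)
values = [1, 2, 3, 4, 5, 6, 7]
probs = [0.05, 0.25, 0.10, 0.05, 0.15, 0.30, 0.10]

modes = find_modes(values, probs)
print(modes)           # [2, 6]
print(len(modes) > 1)  # True, so the distribution is multimodal
```

This strict-inequality rule is a simplification: a flat peak spread over two equal probabilities is not reported, and a histogram built from data can show small bumps that are not genuine modes.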
Example 2.11
In studying obsessive compulsive disorder (OCD), the age at onset is an important variable that is believed to be related to the neurobiological features of OCD; OCD is classified as either Child Onset OCD or Adult Onset OCD. In the article “Is age at symptom onset associated with severity of memory impairment in adults with obsessive-compulsive disorder?” published in the American Journal of Psychiatry (Henin et al., 2001), the authors reported the distribution of the age at onset of OCD given in Figure 2.9. Because Figure 2.9 shows two modes (peaks), the distribution suggests that there may be two different distributions for the age at onset of OCD, one for children and one for adults. Because the clinical diagnoses are Child Onset OCD and Adult Onset OCD, it is more informative to study each of these subpopulations separately. Thus, the distribution of age at onset of OCD has been separated into distributions for the two classifications, Child Onset OCD and Adult Onset OCD, which are given in Figure 2.10.
Figure 2.9 Distribution of age at which OCD is diagnosed.
Figure 2.10 Distribution of the age at which OCD is diagnosed for Child Onset OCD and Adult Onset OCD.
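Separating a population into subpopulations, as in Example 2.11, amounts to grouping the units by their classification and then tabulating a distribution for each group. The sketch below assumes each record is a pair (value, group label) and reports the relative frequency of each value within its group. The records are hypothetical and are not data from Henin et al. (2001); the helper names are likewise invented for illustration.

```python
from collections import Counter


def split_by_group(records):
    """Group (value, group) records into a dict mapping group -> list of values."""
    groups = {}
    for value, group in records:
        groups.setdefault(group, []).append(value)
    return groups


def relative_frequencies(values):
    """Proportion of the values equal to each distinct value, in sorted order."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


# Hypothetical (age at onset, onset class) records, for illustration only
records = [(8, "Child"), (10, "Child"), (10, "Child"), (12, "Child"),
           (25, "Adult"), (30, "Adult"), (30, "Adult"), (30, "Adult")]

groups = split_by_group(records)
dists = {group: relative_frequencies(ages) for group, ages in groups.items()}
print(dists["Child"])  # {8: 0.25, 10: 0.5, 12: 0.25}
print(dists["Adult"])  # {25: 0.25, 30: 0.75}
```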
The shape of the distribution of a discrete variable can also be described as long-tail right, mound shaped, long-tail left, or multimodal. For example, the 2005 National Health Interview Survey (NHIS) reports the distribution of family size, a discrete variable, and this distribution is given in Figure 2.11. Note that the distribution of family size according to the 2005 NHIS data is a long-tail right discrete distribution.
Figure 2.11 Distribution of family size according to the 2005 NHIS.
2.2.2 Describing a Population with Parameters
Because the distribution of a variable contains all of the information on how the units in the population are distributed, every question concerning the target population can be answered by studying the distribution of the target population. An alternative method of describing a population is to summarize specific characteristics of the population. That is, the target population can be summarized by determining the values of specific parameters such as the parameters that measure the typical value in the population, population percentages, the spread of the population, and the extremes of a population.
2.2.3 Proportions and Percentiles
Populations are often summarized by listing the important percentages or proportions associated with the population. The proportion of units in a population having a particular characteristic is a parameter of the population, and a population proportion will be denoted by p. The population proportion having a particular characteristic, say characteristic A, is defined to be

$$p = \frac{\text{number of units in the population having characteristic A}}{\text{number of units in the population}}$$
Note that the percentage of the population having characteristic A is p×100%. Population proportions and percentages are often associated with the categories of a qualitative variable or with the values in the population falling in a specific range of values. For example, the distribution of a qualitative variable is usually displayed in a bar chart with the height of a bar representing either the proportion or percentage of the population having that particular value.
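The definition of p translates directly into a count divided by the population size. In the sketch below the population is a hypothetical list of ages, and characteristic A is assumed, purely for illustration, to be "age 65 or older"; the function name `population_proportion` is also hypothetical.

```python
def population_proportion(population, has_characteristic):
    """Proportion of units in `population` for which `has_characteristic` is True."""
    if not population:
        raise ValueError("population must contain at least one unit")
    count = sum(1 for unit in population if has_characteristic(unit))
    return count / len(population)


# Hypothetical population of 10 ages (illustrative values only)
ages = [34, 71, 58, 66, 45, 80, 29, 67, 52, 90]

# Characteristic A (assumed for illustration): age 65 or older
p = population_proportion(ages, lambda age: age >= 65)
print(p, f"{p * 100:.0f}%")  # 0.5 50%
```

Five of the ten units have the characteristic, so p = 5/10 = 0.5 and the corresponding percentage is p × 100% = 50%.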
Example 2.12
The distribution of blood type according to the American Red Cross is given in Table 2.4 in terms of proportions.
Table 2.4 The Proportions of Blood Type and Rh Factor
| Blood Type | Rh Factor + | Rh Factor − |
|---|---|---|
| O | 0.38 | 0.07 |
| A | 0.34 | 0.06 |
| B | 0.09 | 0.02 |
| AB | 0.03 | 0.01 |
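Because the eight cells of Table 2.4 cover every combination of blood type and Rh factor, other population proportions can be found by adding cells. For instance, the proportion with type O blood is 0.38 + 0.07 = 0.45, and the proportion that is Rh negative is the sum of the Rh − column. The sketch below stores the table in a dictionary (the layout and function names are assumptions made for illustration, with Rh negative written as the ASCII `-`) and computes these sums.

```python
import math

# Proportions from Table 2.4, keyed by (blood type, Rh factor)
blood = {
    ("O", "+"): 0.38, ("O", "-"): 0.07,
    ("A", "+"): 0.34, ("A", "-"): 0.06,
    ("B", "+"): 0.09, ("B", "-"): 0.02,
    ("AB", "+"): 0.03, ("AB", "-"): 0.01,
}


def proportion_with_type(blood_type):
    """Add the Rh+ and Rh- cells for one blood type."""
    return sum(p for (bt, _), p in blood.items() if bt == blood_type)


def proportion_with_rh(rh):
    """Add one Rh column over all blood types."""
    return sum(p for (_, r), p in blood.items() if r == rh)


# The eight cells describe the whole population, so they should total 1
assert math.isclose(sum(blood.values()), 1.0)

print(round(proportion_with_type("O"), 2))  # 0.45
print(round(proportion_with_rh("-"), 2))    # 0.16
```

Rounding guards against floating-point residue from the additions; multiplying by 100 gives the percentages, for example 45% of the population has type O blood.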
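An important proportion in many biomedical studies is the proportion of individuals having a particular disease, which is called the prevalence of the disease. The prevalence of a disease is defined to be

$$\text{prevalence} = \frac{\text{number of individuals in the population having the disease}}{\text{number of individuals in the population}}$$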
For example, according to the Centers for Disease Control and Prevention (CDC), the prevalence of smoking among adults in the United States from January through June 2005 was 20.9%. Proportions also play important roles in the study of survival and cure rates, the occurrence of side effects of new drugs, the absolute and relative risks associated with a disease, and the efficacy of new treatments and drugs.
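Prevalence is a population proportion computed from two counts. The sketch below adds basic checks that the counts are consistent; the counts are hypothetical (they are not CDC data), and the function name `prevalence` is an illustrative choice.

```python
def prevalence(cases, population_size):
    """Individuals with the disease divided by individuals in the population."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 <= cases <= population_size:
        raise ValueError("cases must be between 0 and population_size")
    return cases / population_size


# Hypothetical counts (not CDC data): 660 cases in a population of 12,000
prev = prevalence(660, 12_000)
print(prev, f"{100 * prev:.1f}%")  # 0.055 5.5%
```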
A parameter that is related to a population proportion for a quantitative variable is the pth percentile of the population. The pth percentile is the value in the population below which p percent of the population values fall. The pth percentile will be denoted by $x_p$ for values of p between 0 and 100; that is, the percentage of the population values falling below $x_p$ is p. For example, if the 10th percentile is 2.2, then 10% of the population values fall below 2.2.
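For a finite population, the percentile definition can be applied by sorting the values and checking what percentage falls below each candidate. The sketch below assumes one particular convention: $x_p$ is taken to be the smallest population value with at least p percent of the values strictly below it. Other conventions exist; for example, Python's `statistics.quantiles` interpolates between values and can give slightly different answers. The population values and function names are hypothetical.

```python
def percent_below(population, x):
    """Percentage of population values that are strictly less than x."""
    return 100 * sum(1 for v in population if v < x) / len(population)


def percentile(population, p):
    """Smallest population value with at least p percent of the values below it.

    This is an assumed convention for a finite population; it raises
    ValueError when no population value satisfies the requirement.
    """
    for x in sorted(set(population)):
        if percent_below(population, x) >= p:
            return x
    raise ValueError(f"no population value has {p}% of the values below it")


# Hypothetical population of 20 measurements (illustrative values only)
population = [2.9, 1.8, 2.2, 3.5, 2.0, 4.1, 2.7, 3.0, 1.5, 2.4,
              3.8, 2.6, 3.1, 1.9, 2.8, 3.3, 2.5, 4.4, 3.6, 2.2]

x10 = percentile(population, 10)
print(x10, percent_below(population, x10))  # 1.9 10.0
```

Here 2 of the 20 values (1.5 and 1.8) fall below 1.9, so exactly 10% of the population lies below $x_{10} = 1.9$.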
Percentiles can be used to describe many different characteristics