Statistics. David W. Scott
In the right frame of Figure 1.1, we show the frequency counts in a histogram. The histogram uses a parameter called the bin width to construct an equally spaced mesh. Then we count the number of points in each interval. These counts are displayed as a bar chart. (The histogram can use any anchor point, although 0 is a common choice.) For the histogram shown, the anchor point selected was 0, and the bin width was chosen using Scott's rule; see Scott (1979). This rule is discussed in Section 9.1.4.1. The default choice in the function hist is Sturges' rule, discussed in Section 9.1.4.3, which chooses 11 bins (not shown).
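The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the code behind Figure 1.1: the function names, the symbol h for the bin width, and the small heights sample are our own assumptions. The formula used for Scott's rule is the normal-reference form, h = 3.49 s n^(-1/3), and Sturges' rule is taken as 1 + log2(n) rounded up; rounding conventions differ between implementations.

```python
import math
import statistics
from collections import Counter


def histogram_counts(data, bin_width, anchor=0.0):
    """Count points in the bins [anchor + k*h, anchor + (k+1)*h)."""
    counts = Counter(math.floor((x - anchor) / bin_width) for x in data)
    # Map each bin index k back to the left edge of its interval.
    return {anchor + k * bin_width: counts[k] for k in sorted(counts)}


def scott_bin_width(data):
    """Scott's (1979) normal-reference rule: h = 3.49 * s * n**(-1/3)."""
    return 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)


def sturges_bin_count(n):
    """Sturges' rule for the number of bins: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


# Hypothetical heights in inches, invented for illustration only.
heights = [63.2, 64.8, 65.1, 66.0, 66.4, 67.3, 67.9, 68.2, 68.5, 69.1, 70.6, 72.4]

counts = histogram_counts(heights, bin_width=2.0, anchor=0.0)
for left_edge, count in counts.items():
    print(f"[{left_edge:.0f}, {left_edge + 2.0:.0f}): {count}")

print(f"Scott bin width: {scott_bin_width(heights):.2f}")
print(f"Sturges bin count: {sturges_bin_count(len(heights))}")
```

Changing the anchor argument shifts the whole mesh without changing the bin width, which is why the anchor point is a separate choice from the bin width.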
The choice of bin width is often considered a matter of convenience. The stem‐and‐leaf plot using one‐digit integer stems limits its choices. By way of contrast, any positive real number can be used in a histogram. In Figure 1.2, we show the histograms using the bin width given by Scott's rule, as well as one smaller and one larger bin width. Loosely speaking, the histograms using the larger bin width are missing useful information, while the histograms using the smaller bin width display spurious detail. We discuss strategies for finding the best choice of bin width in Section 9.1. In any case, the histogram is a powerful tool for understanding the full distribution of data.
Figure 1.1 Displays of the father–son height data collected by Karl Pearson: (left) box‐and‐whiskers plot; (middle) stem‐and‐leaf plot; (right) histogram.
Figure 1.2 Histograms of the sons' heights (top row) and fathers' heights (bottom row) using three bin widths, from left to right; see text.
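To see how the bin width changes a histogram, the following sketch bins one sample three times. It rests on several assumptions: the sample is synthetic (normal draws standing in for a heights sample, since the Pearson data are not reproduced here), the reference width comes from the normal-reference form of Scott's rule, and the factors 1/3 and 3 are arbitrary choices of ours, not necessarily the bin widths used in Figure 1.2.

```python
import math
import random
import statistics
from collections import Counter

random.seed(0)
# Synthetic stand-in for a heights sample; mean and spread are assumptions.
sample = [random.gauss(68.0, 2.7) for _ in range(1000)]

# Scott's (1979) normal-reference rule as the reference bin width.
h_ref = 3.49 * statistics.stdev(sample) * len(sample) ** (-1 / 3)


def bin_counts(data, h, anchor=0.0):
    """Counts per bin index k for the mesh anchor + k*h."""
    return Counter(math.floor((x - anchor) / h) for x in data)


# The factors 1/3 and 3 are arbitrary illustrative choices.
summary = {}
for label, h in [("narrow", h_ref / 3), ("reference", h_ref), ("wide", 3 * h_ref)]:
    counts = bin_counts(sample, h)
    summary[label] = len(counts)
    print(f"{label:9s} h = {h:.2f}: {len(counts)} non-empty bins, "
          f"largest count = {max(counts.values())}")
```

With the narrow width the points are spread over many sparsely filled bins, so neighboring bar heights fluctuate by chance; with the wide width a handful of bars absorb nearly all the data and the shape of the distribution is flattened.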
1.1.2 Lord Rayleigh's Data
In Exploratory Data Analysis, Tukey (1977) demonstrates the box‐and‐whiskers plot using the Lord Rayleigh data, which measure the weight of nitrogen gas obtained by various means; see Table 1.1. Discrepancies in the results led to Rayleigh's discovery of the element argon. Rayleigh made 24 measurements from 1892 to 1894, with a mean of 2.30584 and a standard deviation of 0.00537. It is common to assume such measurements of a fundamental quantity are normally distributed. Multiple experiments are run and the results averaged in the presumption that a more accurate estimate will result.
Table 1.1 Lord Rayleigh's 24 measurements (sorted) of the weight of a sample of nitrogen. The first 10 came from chemical samples, while the last 14 came from pure air.
2.29816 | 2.29849 | 2.29869 | 2.29889 | 2.29890 |
2.29940 | 2.30054 | 2.30074 | 2.30143 | 2.30182 |
2.30956 | 2.30986 | 2.31001 | 2.31010 | 2.31010 |
2.31012 | 2.31017 | 2.31024 | 2.31024 | 2.31026 |
2.31027 | 2.31028 | 2.31035 | 2.31163 |
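The summary statistics quoted for these data can be checked directly from Table 1.1. The sketch below uses Python's standard library. The variable names are ours, and the split into two groups follows the table caption (first 10 values, last 14 values). The standard deviation is computed with divisor n − 1, which is the convention that reproduces the quoted value.

```python
import statistics

# The 24 sorted measurements from Table 1.1.
rayleigh = [
    2.29816, 2.29849, 2.29869, 2.29889, 2.29890,
    2.29940, 2.30054, 2.30074, 2.30143, 2.30182,
    2.30956, 2.30986, 2.31001, 2.31010, 2.31010,
    2.31012, 2.31017, 2.31024, 2.31024, 2.31026,
    2.31027, 2.31028, 2.31035, 2.31163,
]

mean_all = statistics.mean(rayleigh)
sd_all = statistics.stdev(rayleigh)  # sample standard deviation (divisor n - 1)
print(f"n = {len(rayleigh)}, mean = {mean_all:.5f}, sd = {sd_all:.5f}")

# Split following the Table 1.1 caption: first 10 chemical, last 14 from air.
chemical, air = rayleigh[:10], rayleigh[10:]
mean_chemical = statistics.mean(chemical)
mean_air = statistics.mean(air)
print(f"chemical mean = {mean_chemical:.5f}")
print(f"air mean      = {mean_air:.5f}")
```

The overall mean and standard deviation agree with the values 2.30584 and 0.00537 given in the text. The two group means differ by roughly 0.01, about twice the overall standard deviation, which is the discrepancy that averaging all 24 values would hide.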
Figure 1.3 Displays of Lord Rayleigh's 24 measurements of the atomic weight of nitrogen gas. (Left) Histogram with four bins; (middle) a second histogram; (right) stem‐and‐leaf display.
In the left frame of Figure 1.3, we display a histogram with four (carefully selected) bins. The histogram is shown on a density scale, rather than a frequency scale, so that the area of the shaded region is 1. We shall see in Problem 1 that this is accomplished by