Practical Data Analysis with JMP, Third Edition. Robert Carver

red triangle next to Distributions and choose Redo ► Automatic Recalc. You will see the histogram change and might notice that it now represents more observations—we are looking at all twelve years of data.

      5. Again, click the red triangle next to Distributions and choose Local Data Filter; choose year and click Add.

      6. Rather than choosing one year, click the red triangle next to Local Data Filter and choose Animation, as shown in Figure 3.6. This will step through the twelve years, briefly selecting each one and changing the histogram for each year.

      Figure 3.6: Animating a Local Data Filter


      7. In the Animation Controls, click the blue “play” arrow and watch what happens. Take special notice of how life expectancy has tended to improve from 1960 through 2015.

      8. After a few cycles, pause the animation in the year 1995.

      Look at the box plot above the histogram. There are two dots at the far left end; these represent two nations with extraordinarily short life expectancies. We refer to such values as outliers.

      9. Hover the cursor over the left-most point in the box plot. You will see a pop-up note that this is Rwanda, with a life expectancy of only 31.977 years in 1995, reflecting the genocide that took place in 1994.
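
      For readers who want to see the arithmetic behind those two dots, here is a minimal Python sketch of the common 1.5 × IQR rule (Tukey's fences). Box plots conventionally use this rule to decide which points are drawn individually. The sketch assumes Python's default `statistics.quantiles` method, which may compute the quartiles slightly differently from JMP. The values are hypothetical and are not taken from the Life Expectancy table.

```python
import statistics

def iqr_outliers(values):
    """Return values beyond Tukey's fences: below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical life expectancies in years (illustrative only, not the book's data)
life_exp = [31.977, 35.0, 62.1, 64.5, 66.0, 68.3, 69.9, 71.2, 72.8, 74.0, 75.5, 77.1]
print(iqr_outliers(life_exp))
```

      Here the two unusually low values fall below the lower fence, so they would be plotted as separate points, just as in the 1995 box plot.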

      Often, it’s easier to think about shape, center, dispersion, and outliers by comparing two distributions. For example, Figure 3.7 shows two histograms using the life expectancy data from 1965 and 2015. We might wonder how human life expectancy changed during a 50-year period, and in these two histograms, we can note differences in shape, center, dispersion, and unusual observations.

      Figure 3.7: Comparing Two Distributions


      To create the results shown in Figure 3.7, do the following:

      10. Return to the original Life Expectancy data table.

      11. Re-open the Data Filter dialog box (either choose Window and find the filter, or choose Rows ► Data Filter). Clear the Select check box but leave Show and Include checked.

      12. Hold down the Ctrl key and highlight 1965 and 2015.

      13. From the menu bar, choose Analyze ► Distribution.

      14. Select LifeExp as Y, just as you did earlier.

      15. Cast year into the role of By and click OK.

      This creates the two distributions with vertically oriented histograms. When you look at them, notice that the axis of the first one runs from 25 to 75 years, and the axis on the second graph runs from 50 to 85 years.

      To facilitate the comparison, it is helpful to orient the histograms horizontally in a stacked arrangement and to set the axes to a uniform scale, an option that is available in the red triangle menu next to Distributions. This makes it easy to compare their shapes, centers, and spreads at a glance.

      16. In the Distribution report, while pressing the Ctrl key, click the uppermost red triangle and select Uniform Scaling.

      If you click the red triangle without pressing the Ctrl key, the Uniform Scaling option applies only to the upper histogram. Pressing the Ctrl key applies the choice to all graphs in the window.

      17. Hold down the Ctrl key, click the red triangle once again, and choose Stack.

      The histograms on your screen should now look like Figure 3.7. How does the shape of the 1965 distribution compare to that of the 2015 distribution? What might have caused these changes in the shape of the distribution?

      We see that people tend to live longer now than they did in 1965. The location (or central tendency) of the 2015 distribution is to the right of the 1965 distribution. Additionally, the two distributions have quite different spreads (degrees of dispersion). We can see that the values were far more spread out in 1965 than they are in 2015 and that there were no outliers in either year. What does that reveal about life expectancy around the world during the past 50 years?
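
      The comparison of center and spread can also be expressed numerically. The Python sketch below groups values by year, much as casting year into the By role does. It then reports the mean and median (center) and the standard deviation and range (spread). It also computes one shared axis span in the spirit of Uniform Scaling. Both samples are hypothetical. The shared axis is simply the overall minimum and maximum, which is an assumption: JMP may pad or round its axis limits.

```python
import statistics

# Hypothetical samples for two years; NOT the book's Life Expectancy table
by_year = {
    1965: [38.0, 45.5, 52.0, 58.5, 63.0, 68.0, 71.5],
    2015: [55.0, 63.5, 69.0, 73.5, 76.0, 79.5, 82.0],
}

def summarize(values):
    """Center (mean, median) and dispersion (standard deviation, range)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }

summaries = {year: summarize(v) for year, v in by_year.items()}

# One axis for every group, so the bars can be compared directly:
# it spans the smallest and largest value found in any group.
axis_lo = min(min(v) for v in by_year.values())
axis_hi = max(max(v) for v in by_year.values())

for year, s in summaries.items():
    print(year, {k: round(x, 2) for k, x in s.items()})
print("shared axis:", axis_lo, "to", axis_hi)
```

      With these made-up numbers, the 2015 median lies to the right of the 1965 median, and the 1965 values have the larger standard deviation and range. This mirrors the pattern described above.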

      Taking Advantage of Linked Graphs and Tables to Explore Data

      When we construct graphs, JMP automatically links all open tables and graphs. If we select rows either in the data table or in a graph, JMP selects and highlights those rows in all open windows.

      1. Within the 2015 life expectancy histogram, place the cursor over the right-most bar and click. While pressing the Shift key, also click the adjacent bar. Now you should have selected the two bars representing life expectancies of over 75 years. How many rows are now selected? Look in the Rows panel of the Data Table window.

      2. Now find the first window with the Distribution of Region (second tab in your project). Notice that some bars are partially highlighted. When you selected the two bars in the histogram, you were indirectly selecting a group of countries. These countries are grouped within the bar chart as shown, revealing the parts of the world where people tend to live longest.
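
      One way to picture this linking is as a single set of selected rows that every view reads. The Python sketch below is a hypothetical model of that idea and does not describe how JMP implements row states. Clicking the two right-most bars is modeled as selecting rows with a life expectancy above 75. The Region bar chart then shows what share of each bar is selected. The regions and values are invented for illustration.

```python
from collections import Counter

# Hypothetical rows: (region, 2015 life expectancy); illustrative only
rows = [
    ("Region A", 81.2),
    ("Region A", 76.4),
    ("Region B", 83.8),
    ("Region C", 58.9),
    ("Region C", 62.3),
    ("Region D", 74.1),
    ("Region D", 77.0),
]

# One selection shared by all views: the row indices picked in the histogram.
selected = {i for i, (_, life_exp) in enumerate(rows) if life_exp > 75}

# The Region bar chart highlights the selected portion of each bar.
totals = Counter(region for region, _ in rows)
highlighted = Counter(rows[i][0] for i in selected)
for region in totals:
    print(f"{region}: {highlighted[region]} of {totals[region]} rows selected")
```

      Region D ends up partially highlighted (1 of 2 rows), which is the same effect you see in the linked bar chart.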

      Customizing Bars and Axes in a Histogram

      When we use the Distribution platform to analyze a continuous variable, JMP determines how to divide the variable axis and how to create “bins” for grouping observations. These automatic choices can affect the appearance of the distribution, and there are several ways to customize the appearance of a histogram.

      We can alter the number of bars in the histogram, creating new boundaries between groups of observations and shifting observations from one bar to the next.

      1. Move back to the Distribution report tab. Click anywhere in a blank area of the 2015 histogram to deselect the two right bars.

      2. Choose Tools ► Grabber.

      3. Position the hand (the Grabber tool) anywhere over the bars in the 2015 histogram beneath the box plot, and click-drag the tool straight up and down. In doing so, you will change the number and width of the bars, sometimes dramatically changing the shape of the graph.

      Think about this: the apparent shape of the distribution depends on the number of bars we create. By default, the software chooses an initial number of bars, or bins, to categorize the continuous variable. However, that initial choice should not be the final word. As we adjust the number of bins, we should watch closely to see how the shape changes, looking for a rendering that accurately and honestly displays the overall pattern of variation.
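
      To see why the number of bins matters, the Python sketch below counts the same hypothetical values using 2, 4, and 8 equal-width bins over a fixed 50 to 85 year span. The data are made up, and the equal-width binning is an assumption for illustration, not a description of how JMP picks its default bins.

```python
def histogram(values, bins, lo, hi):
    """Count values in `bins` equal-width intervals spanning [lo, hi].

    Assumes every value lies within [lo, hi].
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls in the last bin instead of past it.
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    return counts

# Hypothetical life expectancies (years), illustrative only
data = [52, 55, 58, 61, 63, 64, 66, 71, 72, 73, 74, 75, 76, 77, 81, 82]
for bins in (2, 4, 8):
    print(bins, histogram(data, bins, lo=50, hi=85))
```

      Two bins suggest a simple rise. Four bins show a single peak. Eight bins produce a bumpier picture. The data have not changed, only the bin boundaries.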

      One way to resolve the issue is by using a shadowgram. A shadowgram visually averages a large number of bin widths into a diffuse image with no distinct bars at all. Here is how:

      4. Click the red triangle next to LifeExp in the 2015 histogram.

      5. Choose Histogram Options ► Shadowgram. Figure 3.8 shows the result.

      Figure 3.8: A Shadowgram for a Continuous Variable

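
      The idea of visually averaging many bin widths can be sketched in a few lines of Python. This is a simplified stand-in, not JMP's shadowgram algorithm. The sketch builds a density-scaled histogram (total bar area equal to 1) for each of several bin counts. It then evaluates each histogram on a fine grid and averages the heights. The data, the range of bin counts (3 through 20), and the grid size are all hypothetical choices.

```python
def density_histogram(values, bins, lo, hi):
    """Equal-width histogram scaled so the bar areas sum to 1 (values in [lo, hi])."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts], width

def shadowgram(values, lo, hi, bin_counts, grid_points=700):
    """Average many density histograms, one per bin count, on a fine grid."""
    step = (hi - lo) / grid_points
    xs = [lo + (i + 0.5) * step for i in range(grid_points)]  # grid-cell midpoints
    totals = [0.0] * grid_points
    for bins in bin_counts:
        heights, width = density_histogram(values, bins, lo, hi)
        for i, x in enumerate(xs):
            totals[i] += heights[min(int((x - lo) / width), bins - 1)]
    return xs, [t / len(bin_counts) for t in totals]

# Hypothetical data, illustrative only
data = [52, 55, 58, 61, 63, 64, 66, 71, 72, 73, 74, 75, 76, 77, 81, 82]
xs, heights = shadowgram(data, 50, 85, bin_counts=range(3, 21))
```

      Because each bin boundary appears in only some of the averaged histograms, the sharp bar edges blur together. This leaves the diffuse image described above rather than any single set of bars.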

      You should notice that there are several Histogram
