Practical Data Analysis with JMP, Third Edition. Robert Carver


using different graphing formats. Which ones do you think do the best job of clearly and fully summarizing the number of countries within each region?

      3. For this example, let’s use a bar chart (seventh option from the left). There is considerable research demonstrating that most people find this simple graph type easy to interpret accurately. Then click Done.

      It is always good practice to help a reader by giving a graph an informative descriptive title. The default title “region,” though accurate, is not very helpful. In JMP, it is easy to alter the titles of graphs and other results.

      4. Move your cursor to the title region just above the graph and double-click. You can now customize the title of this chart to make it more informative. Type Observations per Region, replacing the default title.

      5. We have done a bit of work on our project. Let’s save it now as Chap_03.

      North America

      Latin America & Caribbean

      Europe & Central Asia

      Middle East & North Africa

      Sub-Saharan Africa

      South Asia

      East Asia & Pacific

      To change the default sequence of categorical values (whether nominal or ordinal), we return to the Life Expectancy data table.

      6. Select region from the data grid or the columns panel, right-click, and select Column Info.

      7. Click Column Properties and select Value Order.

      8. Select a value name and use the Move Up and Move Down buttons to revise the value order to match what we have chosen. Then, click OK.

      Now return to Graph Builder and look at the bar chart. You will see that customizing the value order within the data table re-orders the X axis. The effect should speak for itself.

      9. Experiment with the other charting options by clicking the red arrow and choosing Show Control Panel and then selecting various graph types.

      With categorical data, your choices are limited. Still, it’s worth a few minutes to become familiar with them. When you are through exploring, restore the graphic to a bar chart and leave it open. We will return to this graph in a few pages.
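      For readers who want to see what the bar chart is tallying, the idea can be sketched outside JMP in a few lines of Python. This is only an illustration under stated assumptions: the `regions` list is a made-up handful of rows, not the Life Expectancy table, and `counts_in_order` is a hypothetical helper name. The `value_order` list repeats the sequence of regions given above.

```python
from collections import Counter

# Hypothetical region labels, one per country row (NOT the book's data).
regions = [
    "Sub-Saharan Africa", "Europe & Central Asia", "South Asia",
    "Europe & Central Asia", "North America", "Sub-Saharan Africa",
    "East Asia & Pacific", "Latin America & Caribbean",
    "Middle East & North Africa", "Sub-Saharan Africa",
]

# The custom value order listed earlier in this section.
value_order = [
    "North America",
    "Latin America & Caribbean",
    "Europe & Central Asia",
    "Middle East & North Africa",
    "Sub-Saharan Africa",
    "South Asia",
    "East Asia & Pacific",
]


def counts_in_order(labels, order):
    """Return (category, count) pairs following the given value order."""
    tally = Counter(labels)
    # Counter returns 0 for categories that never appear in the data.
    return [(category, tally[category]) for category in order]


ordered_counts = counts_in_order(regions, value_order)

# Print a crude text bar chart, one row per region.
for category, count in ordered_counts:
    print(f"{category:<28} {'#' * count} ({count})")
```

      The counts are the same whichever order is used; the value order changes only the sequence in which the bars appear.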

      The standard graphing choices expand considerably when we have quantitative data—particularly for continuous variables or discrete variables with many possible values. We will want to summarize a large collection of values in a way that shows where observations tend to cluster.

      As a way of visualizing the distribution of a continuous variable, the most commonly used graph is a histogram. A histogram is basically a bar chart with values of the variable on one axis and frequency on the other. Let’s illustrate.

      In our data set, we have estimated life expectancy at birth for each country for 13 different years. We just used the Data Filter to isolate the data for 2015, so let’s continue to explore the state of the world in 2015.

      As before, we will first use the Distribution platform to do most of the work here.

      1. Select Analyze ► Distribution. Cast LifeExp into the role of Y, Columns and click OK.

      2. When the distribution window opens, click the red triangle next to Distributions, and select Stack. This will re-orient the output horizontally, making it a bit easier to interpret.

      The histogram (Figure 3.5) is one representation of the distribution of life expectancy around the world in 2015, and it gives us one view of how much life expectancy varies. Above the histogram is a box plot (also known as a box-and-whiskers plot), which will be explained later in this chapter.

      Figure 3.5: A Typical Histogram


      As in the bar charts that we have studied earlier, there are two dimensions in the graph. Here, the horizontal axis displays values of the variable and the vertical axis displays the frequency of each small interval of values. For example, we can see that only a few countries have projected life expectancies of 51 to 54 years, but many have life expectancies between 74 and 78 years.
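      The counting behind a histogram can be sketched in Python as follows. This is a minimal illustration, not how JMP chooses its intervals: the `life_exp` values are invented for the example, and the bin start (50) and width (5 years) are arbitrary assumptions. `histogram_counts` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical life expectancies in years (illustrative, NOT the book's data).
life_exp = [51.2, 53.9, 58.4, 61.0, 64.7, 66.3, 69.8, 71.5, 72.1, 73.4,
            74.0, 74.6, 75.2, 75.9, 76.3, 77.8, 79.1, 81.4, 82.7, 83.5]


def histogram_counts(values, start, width):
    """Count values in intervals [start + k*width, start + (k+1)*width).

    Returns a dict mapping each interval's lower bound to its frequency.
    Intervals that contain no observations are omitted.
    """
    bins = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: bins[k] for k in sorted(bins)}


counts = histogram_counts(life_exp, start=50, width=5)

# One row per interval: the values axis on the left, frequency as a bar.
for lower, n in counts.items():
    print(f"{lower:>3}-{lower + 5:<3} {'#' * n}")
```

      Changing `width` changes how smooth or jagged the picture looks, which is why the same data can yield histograms with different apparent peaks.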

      When we look at a histogram, we want to develop the habit of looking for four things: the shape, the center (or central tendency), the dispersion of the distribution, and unusual observations. The histogram can very often clearly represent all four of these aspects of the distribution.

      Shape: Shape refers to the symmetry of the histogram and to the presence of peaks in the graph. A graph is symmetric if you could find a vertical line in the center defining two sides that are mirror images of one another. In Figure 3.5, we see an asymmetrical graph. There are few observations in the tail on the left, and most observations clump together on the right side. We say this is a left-skewed (or negatively skewed) distribution.

      Many distributions have one or more peaks—data values that occur more often than the other values. Here we have a distinct peak around 75 to 76 years, and others closer to 72 and 83. Some distributions have multiple peaks, and some have no distinctive peaks at all. In short, we might describe the shape of this distribution as “multi-peaked and left-skewed.”
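      The phrase "negatively skewed" can be made concrete with a numerical measure. The sketch below uses the moment coefficient of skewness, a common textbook statistic that this chapter does not itself introduce; it is negative when the long tail is on the left and zero for a perfectly symmetric set of values. The data are invented for illustration and `skewness` is a hypothetical helper name.

```python
import statistics

# Hypothetical life expectancies in years (illustrative, NOT the book's data).
life_exp = [51.2, 53.9, 58.4, 61.0, 64.7, 66.3, 69.8, 71.5, 72.1, 73.4,
            74.0, 74.6, 75.2, 75.9, 76.3, 77.8, 79.1, 81.4, 82.7, 83.5]


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    m2 and m3 are the second and third central moments (dividing by n).
    """
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(f"skewness of toy data:       {skewness(life_exp):.2f}")
print(f"skewness of symmetric data: {skewness([1, 2, 3, 4, 5]):.2f}")
```

      A negative result for the toy data matches what the eye sees in a left-skewed histogram: a few low values pull the tail out to the left.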

      Center (or central tendency): Where do the values congregate on the number line? In other words, what values does this variable typically assume? As you might already know, there are several definitions of center as reflected in the mean, median, and mode statistics. Visually, we might think of the center of a histogram as the point that divides the observations into two equal halves (the median, which is approximately 74 years in this case), as the highest-frequency region (the highest peak near 75), perhaps as a type of visual balancing point (the mean, which is approximately 72), or in some other way. Each of these interpretations has legitimacy, and all respond to the question in slightly different ways.
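      The three notions of center can be compared side by side with Python's standard `statistics` module. The values below are a small invented sample in whole years, chosen only so that the mode is well defined; they are not the 2015 data.

```python
import statistics

# Hypothetical life expectancies in whole years (NOT the book's data).
life_exp = [52, 58, 64, 69, 72, 74, 75, 75, 76, 78, 81, 83]

mean = statistics.mean(life_exp)      # balancing point of the values
median = statistics.median(life_exp)  # half the values lie on each side
mode = statistics.mode(life_exp)      # most frequently occurring value

print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"mode   = {mode}")
```

      In this left-skewed toy sample the mean falls below the median, the same ordering described above for the 2015 histogram, because the few low values pull the balancing point toward the tail.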

      Dispersion (or spread): While the concept of center focuses on the typical, the concept of spread focuses on departures from the typical. The question here is, “how much does this variable vary?” and again there are several reasonable ways to respond. We might think in terms of the lowest and highest observed values (from about 40 to 85), in terms of a vicinity of the center (for example, “life expectancy tends to vary in most countries between about 65 and 85”), or in some other relative sense.
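      Two of these views of spread can be sketched numerically. The range uses the lowest and highest values; for "a vicinity of the center" the sketch assumes the interval between the first and third quartiles, which is one common choice and not necessarily the one the author has in mind. The data are again invented for illustration.

```python
import statistics

# Hypothetical life expectancies in whole years (NOT the book's data).
life_exp = [52, 58, 64, 69, 72, 74, 75, 75, 76, 78, 81, 83]

# Spread as the distance between the extremes.
low, high = min(life_exp), max(life_exp)
value_range = high - low

# Spread as the interval holding roughly the middle half of the values.
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(life_exp, n=4)

print(f"lowest to highest: {low} to {high} (range {value_range})")
print(f"middle half:       about {q1:.1f} to {q3:.1f}")
```

      The range reacts to a single extreme country, while the quartile interval describes where most countries sit; reporting both gives a fuller answer to "how much does this variable vary?"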

      Unusual Observations: We can summarize the variability of a distribution by citing its shape, center, and dispersion, but in some distributions, there may be a small number of observations that deviate substantially from the pattern. In 2015, there was no such grouping, but let’s explore the shifts in the distribution over time and also find some unusual observations.

      3. Re-open the global data filter (Rows ► Data Filter). Click Clear.

      4.
