Practical Data Analysis with JMP, Third Edition. Robert Carver


for us to get a general sense of the variation simply by scanning the table visually. We need some sensible ways to find the patterns among the large number of rows. We will begin our analysis by looking at the nominal variable called Region.

      Statisticians generally distinguish among four types of data:

      Categorical Types    Quantitative Types
      Nominal              Interval
      Ordinal              Ratio

      One reason that it is important to understand the differences among data types is that we analyze them in different ways. In JMP, we differentiate between nominal, ordinal, and continuous data. Nominal and ordinal variables are categorical, distinguishing one observation from another in some qualitative, non-measurable way. Interval and ratio data are both numeric. Interval variables are artificially constructed, like a temperature scale or stock index, with arbitrarily chosen zero points. Most measurement data are considered ratio data because ratios of values are meaningful. For example, a film that lasts 120 minutes is twice as long as one lasting 60 minutes. In contrast, 120 degrees Celsius is not twice as hot as 60 degrees Celsius.
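
      The difference between interval and ratio data can be checked with a little arithmetic. The sketch below uses Python rather than JMP and is only an illustration; the helper name celsius_to_kelvin is an assumption introduced here. It converts the two Celsius readings to the Kelvin scale, whose zero point is not arbitrary, before comparing them.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvins (hypothetical helper for this sketch)."""
    return celsius + 273.15

# Ratio data: a duration of zero minutes is a true zero, so the ratio is meaningful.
duration_ratio = 120 / 60

# Interval data: the Celsius zero is arbitrary, so 120 / 60 does not measure "twice as hot".
# Measured from absolute zero, the two temperatures differ by much less than a factor of 2.
kelvin_ratio = celsius_to_kelvin(120) / celsius_to_kelvin(60)

print(duration_ratio)          # 2.0
print(round(kelvin_ratio, 2))  # 1.18
```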

      In its reporting, the World Bank identifies each country of the world with a continental region. There are seven regions, each with a different number of countries. The variable Region is nominal—it literally names a country’s general location on earth. Let’s get familiar with the different regions and see how many countries are in each. In other words, let’s look at the distribution of Region.

      1. Select Analyze ► Distribution. In the Distribution dialog box (Figure 3.1), select the variable region as the Y, Columns variable. Click OK.

      Figure 3.1: Distribution Dialog Box

      Anytime you want to assign a column to a role in a JMP dialog box, you have three options: you can highlight the column name in the Select Columns list and click the corresponding role button, you can double-click the column name, or you can click-drag the column name into the role box.

      The result appears in Figure 3.2. JMP constructs a simple bar chart listing the seven continental regions and showing a rectangular bar corresponding to the number of times the name of the region occurs in the data table. Though we cannot immediately tell from the graph alone exactly how many countries are in each, North America clearly has the fewest countries and Europe and Central Asia has the most.

      Figure 3.2: Distribution of Region

      Below the graph is a frequency distribution (titled Frequencies), which provides a more specific summary. Here we find the name of each region, and the number of times each regional name appears in our table. For example, “East Asia & Pacific” occurs 432 times. As a proportion of the whole table, 16.7% of the rows (Prob. = 0.16744) represent countries in that region.

      At this point, you might wisely pause and say, “Wait a second. Can there possibly be 432 countries in the East Asia and Pacific region?” And you would be right. Remember that we have stacked data, with 12 rows representing 12 years of data devoted to each country. Therefore, there are 432/12 = 36 countries in the region.
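
      The same sanity check can be written out as a short calculation. The sketch below uses Python rather than JMP, and the variable names are illustrative only; the figures themselves (432 rows, Prob. = 0.16744, and 12 years per country) come from the discussion above.

```python
rows_in_region = 432      # Frequencies count for "East Asia & Pacific"
prob_in_region = 0.16744  # that region's share of all rows, as reported by JMP
years_per_country = 12    # in the stacked table, each country has one row per year

# Each country contributes one row per year, so divide the row count by the years.
countries_in_region = rows_in_region // years_per_country

# Working backward from the proportion gives the approximate size of the whole table.
total_rows = round(rows_in_region / prob_in_region)
total_countries = total_rows // years_per_country

print(countries_in_region)  # 36
print(total_rows)           # 2580
print(total_countries)      # 215
```

      Under these figures the table holds about 2,580 rows, or roughly 215 countries observed in each of 12 years.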

      Even though JMP handles the heavy computational or graphical tasks, always think about the data and its context and ask yourself if the results make sense to you.

      Using the Data Filter to Temporarily Narrow the Focus

      Because we know each country appears repeatedly in this data table, let’s choose just one year’s data to obtain a clearer picture of regional variation. We can specify rows to display in a graph by using the Data Filter. This is a tool that allows us to select rows that satisfy specific conditions such as only displaying data rows from the year 2010.

      This chapter illustrates the use of the Data Filter to temporarily select rows in a data table for all active analyses. This is known as the global Data Filter. Alternatively, when you click the red triangles in most analysis reports, you will find a Script option with a local Data Filter that applies only to the current report. The local Data Filter is illustrated in later chapters, but curious readers should explore it at any time.

      1. To see the effects of the Data Filter, we will instruct JMP to automatically update the graph and recalculate the frequencies. Click the red triangle next to Distributions and choose Redo ► Automatic Recalc.

      2. Select Rows ► Data Filter. In the list of Columns, select year and click the Add button.

      3. The dialog box takes on a new appearance (Figure 3.3). It now displays a list of years contained in the table. Near the top of the dialog box, check Show and Include so that only the rows that we select for 2015 will appear in all graphs and be included in any computations. Other rows will be hidden and excluded.

      Figure 3.3: Choosing 2010 in the Data Filter

      4. Scroll down the list of Year levels and highlight 2015. As noted in the dialog box, this selects 215 rows and temporarily suppresses the others.

      5. Minimize the Data Filter. If you look in the Life Expectancy data table, you will see that most rows now have two icons indicating that they are excluded and hidden. The rows from 2015 are highlighted and will remain so until we clear the Data Filter or take another action that selects other rows.
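
      The effect of these steps on the frequency counts can be imitated outside JMP. The following is a minimal Python sketch, not JMP's own mechanism: the tiny table, the country labels A, B, and C, and the function names filter_year and region_frequencies are all hypothetical and exist only to show how keeping one year's rows changes the counts per region.

```python
from collections import Counter

# Hypothetical miniature of a stacked table: one row per country per year.
rows = [
    {"country": "A", "region": "Europe & Central Asia", "year": 2010},
    {"country": "A", "region": "Europe & Central Asia", "year": 2015},
    {"country": "B", "region": "Europe & Central Asia", "year": 2010},
    {"country": "B", "region": "Europe & Central Asia", "year": 2015},
    {"country": "C", "region": "North America", "year": 2010},
    {"country": "C", "region": "North America", "year": 2015},
]

def filter_year(table, year):
    """Keep only the rows for one year; all other rows are left out of the result."""
    return [row for row in table if row["year"] == year]

def region_frequencies(table):
    """Return {region: (count, proportion)} for the rows in the table."""
    counts = Counter(row["region"] for row in table)
    total = sum(counts.values())
    return {region: (n, n / total) for region, n in counts.items()}

all_years = region_frequencies(rows)
only_2015 = region_frequencies(filter_year(rows, 2015))

print(all_years["Europe & Central Asia"][0])  # 4 rows: 2 countries x 2 years
print(only_2015["Europe & Central Asia"][0])  # 2 rows: one per country
```

      With a single year retained, each count equals the number of countries in the region, which is the clearer picture the filter is meant to provide.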

      In Chapter 1, we met the Graph Builder, and we will use it throughout this book. It is most useful when working with multiple variables, but even with a single nominal variable, it provides a quick way to generate multiple views of the same data. Because interactivity is such an important feature of the tool, this section of the chapter provides few step-by-step directions. You should interact with the tool and think about the extent to which different graphing formats and options communicate the information content of the variable called region.

      1. Select Graph ► Graph Builder. The region column identifies groups of countries. Drag it to the X drop zone.

      Within Graph Builder, you can freely reposition a column from one drop zone to another. Hover the cursor over the column name until the cursor changes to the hand shape, then click-drag it to any other drop zone. What’s more, there is also an Undo button. You can also use the same variable in more than one drop zone. For example, you might also color the bars by region.

      With Region on the X axis, you will see seven clumps of black points above the seven region names. This is not very informative.

      At the top of Graph Builder is a selector bar of icons (see Figure 3.4) representing different graph types. The graphing options available depend on the type of data we have placed on the graph. Hence, some icons are dimmed, but with Region on the X axis, we can opt for any of the highlighted options.

      Figure 3.4: Graphing Options for a Nominal Column

      2.
