Practical Data Analysis with JMP, Third Edition. Robert Carver
Figure 1.1: The JMP Opening Screen
JMP provides an extensive set of tutorials for users that illustrate many of the features of the software. Readers are encouraged to investigate the tutorials on their own. Find the full list of tutorials in the Help menu.
You will also see the JMP Starter window, which is an annotated menu of major functions. It is worth your time to explore the JMP Starter window by navigating through its various choices to get a feel for the wide scope of capabilities that the software offers. As a new user, though, you might find the range of choices to be overwhelming.
In this book, we will tend to close the JMP Starter window and use the menu bar at the top of the screen to make selections. Finally, look at the JMP Home Window. The home window is divided into two panes that can help you keep track of recently used files and currently open windows. You can customize this view, but this book shows the standard two-pane layout.
A Simple Data Table
In this book, we will most often work with data that has already been entered and stored in a file, much as you would type and store a paper in a word-processing file or data in a spreadsheet file. In Chapter 2, you will see how to create a data table on your own.
We will start with the U.N. life expectancy data mentioned earlier. Within the Home Window, do this:
1. Click File ► Open.
2. Navigate your way to the folder of data tables that accompany this book.
3. Select the file called Life Expectancy 2017 and click Open.
The data table appears in Figure 1.2. Notice that there are four regions in this window: three vertically arranged panels on the left and the data grid on the right.
Figure 1.2: The Life Expectancy 2017 Data Table
The three panels provide metadata (descriptive information about the data in the table), which is created when the data table is saved and can be altered later. At this early stage, it is helpful to understand the purpose of each panel.
Beginning at the top left, we find the Table panel, which displays the name of the data table file as well as optional information provided by the creator of the table. You will see a small red triangle pointing downward next to the table name.
Red triangles indicate a context-sensitive menu, and they are an important element in JMP. We will discuss them more in later chapters, but you should expect to make frequent use of these little red triangles.
Just below the red triangle, there is a note describing the data and identifying its source. You can open that note (called a Table variable) just by double-clicking on the word “Credit,” the first line within the Table panel. Figure 1.3 shows the note for this table. A table variable contains metadata about the entire table.
Figure 1.3: Table Variable Dialog Box
The second and third lines of the Table panel include a green arrow. Green arrows indicate that there is a script embedded in the data table. In this case, the script lists the steps to extract this set of data from a much larger data table called WDI, and one can reproduce the subsetting process by running the script. We will use the full WDI data table in future chapters.
Below the Table panel is the Columns panel, shown in Figure 1.4, which lists the column names, JMP modeling types, and other information about the columns.
Figure 1.4: The Columns Panel
There are several important things to notice in the Columns panel. The notation (5/0) at the top of the panel tells us that there are five columns in this data table, and that none of them are selected at the moment. In a JMP data table, we can select one or more columns or rows for special treatment, such as using the label property in the second, third, and fourth columns so that country names, regions, and the year will be displayed within graphs. There is much more to learn about the idea of selection and column properties, and we will return to it later in this chapter.
The panel lists the columns by name. To the left of the names are icons indicating the modeling type. In this example, the first three red icons (these look like bar charts) identify Country Code, Country Name, and Region as nominal data. The “price tag” icons indicate that these variables can act as labels to specifically identify observations that are displayed in a graph.
The green ascending bar icon next to Year indicates that Year is to be analyzed as an ordinal variable. In this data table, all observations are from the same year, 2017, but in the original data set, we have annual observations from 1990 through 2018. Because those years fall in a natural order, Year is treated as an ordinal variable.
Finally, the blue triangle next to life_exp identifies the column as continuous data. Remember, it makes sense to perform calculations with continuous data.
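To make the Columns panel concrete, here is a minimal Python sketch of the metadata it displays. This is not how JMP stores the information. The `Column` class and the `panel_header` and `supports_arithmetic` helpers are hypothetical names introduced only for illustration. The column names, modeling types, and label settings follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for one entry in the Columns panel."""
    name: str
    modeling_type: str      # "nominal", "ordinal", or "continuous"
    is_label: bool = False  # the "price tag" icon
    selected: bool = False


# The five columns of the Life Expectancy 2017 table as described in the text.
# Label flags follow the statement that the second, third, and fourth columns
# carry the label property.
columns = [
    Column("Country Code", "nominal"),
    Column("Country Name", "nominal", is_label=True),
    Column("Region", "nominal", is_label=True),
    Column("Year", "ordinal", is_label=True),
    Column("life_exp", "continuous"),
]


def panel_header(cols):
    """Build the (total/selected) notation shown at the top of the panel."""
    selected = sum(1 for c in cols if c.selected)
    return f"({len(cols)}/{selected})"


def supports_arithmetic(col):
    """Calculations such as averages make sense only for continuous data."""
    return col.modeling_type == "continuous"


print(panel_header(columns))
print([c.name for c in columns if supports_arithmetic(c)])
```

Selecting a column in this sketch (setting `selected=True`) would change the header from (5/0) to (5/1), mirroring what the panel reports.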
At the bottom left, we find the Rows panel (Figure 1.5). Like the other two panels, it provides quick reference information, in this case about the number of rows (215, one for each of 215 countries) and their states.
Figure 1.5: Rows Panel
The top entry indicates that there are 215 observations in this data table. The next four entries refer to four basic row states in a JMP data table. Initially, all rows share the same state, in that none have been selected, excluded, hidden, or labeled. Row states enable us to control whether individual observations appear in graphs, whether they are incorporated into calculations, and whether they are highlighted in various ways.
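One way to picture row states is as a set of independent on/off flags attached to each row. The Python sketch below is only an analogy, not JMP's internal representation. `RowState`, `rows_panel`, `used_in_calculations`, and `shown_in_graphs` are hypothetical names. It also assumes that the excluded state governs calculations and the hidden state governs graphs.

```python
from enum import Flag, auto


class RowState(Flag):
    """Hypothetical flags mirroring the four basic row states."""
    NONE = 0
    SELECTED = auto()
    EXCLUDED = auto()
    HIDDEN = auto()
    LABELED = auto()


# Initially, none of the 215 rows is selected, excluded, hidden, or labeled.
row_states = [RowState.NONE] * 215


def rows_panel(states):
    """Count the rows in each state, as the Rows panel does."""
    summary = {"All rows": len(states)}
    for flag in (RowState.SELECTED, RowState.EXCLUDED,
                 RowState.HIDDEN, RowState.LABELED):
        summary[flag.name.title()] = sum(1 for s in states if flag in s)
    return summary


def used_in_calculations(state):
    """Assumption: an excluded row is left out of calculations."""
    return RowState.EXCLUDED not in state


def shown_in_graphs(state):
    """Assumption: a hidden row does not appear in graphs."""
    return RowState.HIDDEN not in state


print(rows_panel(row_states))
```

Because the flags are independent, a single row can hold several states at once, for example `RowState.EXCLUDED | RowState.HIDDEN`.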
The Data Grid area of the data table is where the data reside. It looks like a familiar spreadsheet, and it contains just the raw data for any analysis. Generally speaking, each column of a table holds either raw data values (for example, numbers, dates, or text) or the results of a formula that applies to the entire column. Unlike a spreadsheet, a JMP data table requires every cell in a column to be consistent in this sense. You will not find some rows of a column representing one type of data and other rows representing a different type.
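The consistency rule can be expressed as a simple check. The Python sketch below is illustrative only: `column_is_consistent` is a hypothetical helper, `None` stands in for a missing cell, and the sample values are made up rather than taken from the data table.

```python
def column_is_consistent(values):
    """Return True if every non-missing cell has the same type."""
    kinds = {type(v) for v in values if v is not None}
    return len(kinds) <= 1


# Made-up values: an all-numeric column passes the check, while a
# spreadsheet-style column that mixes numbers and text does not.
numeric_column = [1.5, 2.0, None, 3.25]
mixed_column = [1.5, "n/a", 3.25]

print(column_is_consistent(numeric_column))
print(column_is_consistent(mixed_column))
```

A spreadsheet would accept both columns. A data table that enforces one type per column would accept only the first.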