Practical Data Analysis with JMP, Third Edition. Robert Carver
3. In the NHANES 2016 data table, one nominal variable appears as continuous data within the Columns panel. Find the misclassified variable and correct it by changing its modeling type to nominal.
4. The NHANES 2016 data table was assembled by scientific researchers. Why don’t we consider this data table to be experimental data?
5. Find the data table called Military. We will use this table in later chapters. This table contains rank, gender, and race information about more than a million U.S. military personnel. Use a technique presented in this chapter to create a data table containing a random sample of 500 individuals from this data table.
6. In Chapter 3, we will work with a data table called NIKKEI225. Open this table and examine the columns and metadata contained within the table. Write a short paragraph describing the contents of the table, explain how the data were collected (experiment, observation, or survey), and define each variable and its modeling type.
7. In later chapters, we will work with a data table called Earthquakes. Open this table and examine the columns and metadata contained within the table. Write a short paragraph describing the contents of the table, explain how the data were collected (experiment, observation, or survey), and define each variable and its modeling type.
8. In later chapters, we will work with a data table called Tobacco Use. Open this table and examine the columns and metadata contained within the table. Write a short paragraph describing the contents of the table, explain how the data were collected (experiment, observation, or survey), and define each variable and its modeling type.
9. Open the Dolphins data table, which we will work with in a later chapter. What are the variable(s), observational units, and data types represented in this table?
10. Open the data table TimeUse, which we will analyze more fully in later chapters. Write a few sentences to explain the contents of the columns named marst, empstat, sleeping, and telff.
11. Open the States data table, which contains statistics about the 50 U.S. states and the District of Columbia. Write a short paragraph describing the contents of the table and, in particular, define the columns called poverty, fed_spend2010, and homicide.
Endnotes
1. In Chapter 21, we will learn how to design experiments. In this chapter, we will concentrate on the nature of experimental data.
2. Visit http://www.cdc.gov/nchs/surveys.htm to find the NHANES and other public-use survey data. Though the topic is beyond the scope of this book, readers engaged in survey research will want to learn how to conduct a database query and import the results into JMP. Interested readers should consult the section on “Importing Data” in Chapter 2 of the JMP User Guide.
3. Visit https://climatecommunication.yale.edu/publications/politics-global-warming-april-2019/ to read the full report.
Chapter 3: Describing a Single Variable
Variable Types and Their Distributions
Distribution of a Categorical Variable
Using Graph Builder to Explore Categorical Data Visually
Distribution of a Quantitative Variable
Using the Distribution Platform for Continuous Data
Exploring Further with the Graph Builder
Summary Statistics for a Single Variable
Overview
Once we have framed some research questions and gathered relevant data, the next phase of an investigation is to examine the variability in the data. The goal of descriptive analysis is to summarize where things stand with each variable. In fact, the term statistics comes from the practice of characterizing the state of political affairs through the reporting of facts and figures. This chapter presents several standard tools that we can use to examine how a variable varies, to describe the pattern of variation that it exhibits, and to look for departures from the overall pattern as well.
The Concept of a Distribution
Data analysis generally focuses on one or more variables—attributes of the individual observations. When we speak of a variable’s distribution, we are referring to a pattern of values. The distribution describes the different values the variable can assume, and how often it assumes each value.
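To make the idea concrete outside of JMP, the following minimal Python sketch tabulates a distribution: it lists each distinct value of a variable along with how often that value occurs, both as a count and as a share of all observations. The region labels and the helper name `distribution` are hypothetical and serve only to illustrate the definition; they are not drawn from any data table in this book.

```python
from collections import Counter

# Hypothetical observations of one categorical variable
# (made-up labels for illustration, not taken from a real data table).
regions = ["North", "South", "South", "East", "North", "South", "East", "South"]


def distribution(observations):
    """Return {value: (count, share)} for every distinct value observed."""
    counts = Counter(observations)
    n = len(observations)
    return {value: (count, count / n) for value, count in counts.items()}


dist = distribution(regions)

# Print one row per distinct value: the value, its count, and its share.
for value, (count, share) in sorted(dist.items()):
    print(f"{value:<6} {count:>3} {share:7.1%}")
```

The resulting table is exactly what the definition describes: the set of values the variable takes on, paired with the frequency of each.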
In our first example, we will continue to consider the variability of life expectancy around the world. The data that we will use come to us from the World Bank. In Chapter 1, we used a small portion of this data set for 2017. Now we will look at more years.
Variable Types and Their Distributions
In Chapter 2, we did our work in a JMP Project. Get in the habit of using a project for each chapter.
1. Select File ► New ► Project.
2. Select File ► Open, select the Life Expectancy data table, and click Open.
Before doing any analysis, make sure that you can answer these questions:
● What population does this data table represent?
● What is the source of the data?
● How many variables are in the table?
● What data type is each variable?
● What does each variable represent?
● How many observations are there?
Take special note of the way this data table has been organized. We have 12 observations for each country, each covering a single year and spaced at 5-year intervals, and they are stacked one upon the other. Not surprisingly, JMP refers to this arrangement as stacked data.
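For readers who want to picture the stacked layout outside of JMP, here is a small Python sketch. Each row holds one country-year observation, so a country's repeated measurements sit on separate rows rather than in separate columns. The country names, years, and values are made-up placeholders, not World Bank figures, and the helper `unstack` is a hypothetical name used only to show the alternative wide arrangement.

```python
# Stacked layout: one row per (country, year) observation.
# All values below are illustrative placeholders, NOT real life-expectancy data.
stacked = [
    ("Country A", 2010, 70.0),
    ("Country A", 2015, 71.0),
    ("Country B", 2010, 60.0),
    ("Country B", 2015, 62.0),
]


def unstack(rows):
    """Convert stacked rows into a wide layout: {country: {year: value}}."""
    wide = {}
    for country, year, value in rows:
        wide.setdefault(country, {})[year] = value
    return wide


wide = unstack(stacked)

# In the wide layout, each country appears once and each year acts like a column.
for country, by_year in wide.items():
    print(country, by_year)
```

In the stacked form the number of rows equals the number of countries times the number of years, while the wide form has one entry per country.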
As in Chapter 1, we will raise some questions about how life expectancy at birth varies in different parts of the world.