Practical Data Analysis with JMP, Third Edition. Robert Carver


Create a summary for each of these variables and explain how tobacco use among men compares to that among women.

      7. Scenario: The States data table contains measures and attributes for the 50 U.S. states and the District of Columbia.

      a. The population of the state as estimated by the United States Census Bureau is in pop2018est. Summarize the data in this column, commenting on the center, shape, and spread of the distribution. Note any outliers.

      b. Construct box plots for owner-occ and poverty. For each plot, explain what the landmarks tell you about the distribution of each variable and comment on noteworthy features of the plot.

      c. The column mean_Income is the mean household income, and med_income_17 is the median household income in the state. Use an appropriate technique from this chapter to summarize the data in these two columns and comment on what you see. Why do you think mean incomes are consistently greater than median incomes?

      d. The column homicide is the rate of homicide deaths per 100,000 persons in the state. Summarize the responses and comment.

      e. The column soc_sec is the number of people receiving Social Security benefits within the state. Use an appropriate technique to summarize the distribution of this variable. Identify the outlying states and suggest a reason for the fact that these states are outliers.

      f. Compare the distributions of unemp2010 and unemp2017 and comment on what you find.

      Endnotes

      Chapter 4: Describing Two Variables at a Time

       Overview

       Two-by-Two: Bivariate Data

       Describing Covariation: Two Categorical Variables

       A Digression: Recoding a Variable and Changing Value Order

       Describing Covariation: One Continuous, One Categorical Variable

       Describing Covariation: Two Continuous Variables

       Scatter Plots for Very Large Data Tables

       More Informative Scatter Plots

       Application

      Some of the most interesting questions in statistical inquiries involve covariation: how does one variable change when another variable changes? After working through the examples in this chapter, you will know some basic approaches to bivariate analysis, that is, the analysis of two variables at a time.

      Chapter 3 covered techniques for summarizing the variation of a single variable: univariate distributions. In many statistical investigations, we are interested in how two variables vary together and, in particular, how one variable varies in response to the other. For example, nutritionists might ask how consumption of carbohydrates affects weight loss, or marketers might ask whether a demographic group responds positively to an advertising strategy. In these cases, it’s not sufficient to look at one univariate distribution or even to look at the variation in each of two key variables separately. We need methods to describe the covariation of bivariate data, which is to say we need methods to summarize the ways in which two variables vary together.

      The organization of this chapter is simple. We have been classifying data as categorical or continuous. If we focus on two variables in a study and conceive of one variable as a response to the other, the factor, there are four possible combinations to consider, shown in Table 4.1: we might have two categorical variables, two continuous variables, a continuous response with a categorical factor, or a categorical response to a continuous factor. The next three sections discuss three of the four possibilities; the fourth, a categorical response to a continuous factor, is deferred to Chapter 19.

      Table 4.1: Chapter Organization—Bivariate Factor-Response Combinations

                              Continuous Factor          Categorical Factor
      Continuous Response     Third section to follow    Second section to follow
      Categorical Response    See Chapter 19             Next section
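
      The layout of Table 4.1 can also be written as a small lookup. The Python sketch below is only an illustration and is not part of JMP; the names TABLE_4_1 and where_covered are hypothetical, and the entries simply restate the cells of the table.

```python
# Table 4.1 restated as a lookup keyed by (response type, factor type).
# The names here are hypothetical and chosen only for this illustration.
TABLE_4_1 = {
    ("continuous", "continuous"): "Third section to follow",
    ("continuous", "categorical"): "Second section to follow",
    ("categorical", "continuous"): "See Chapter 19",
    ("categorical", "categorical"): "Next section",
}


def where_covered(response, factor):
    """Return the Table 4.1 cell for a given response/factor pairing."""
    return TABLE_4_1[(response, factor)]


# A continuous response paired with a categorical factor.
print(where_covered("continuous", "categorical"))
```

      Reading the table this way makes the convention explicit: the row is chosen by the type of the response, and the column by the type of the factor.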

      In this chapter, we will introduce several common methods for three of these ways of pairing bivariate data. The first examples relate to a serious issue in civil (non-military) air travel: the periodic collisions between wildlife and commercial airplanes. According to the U.S. Federal Aviation Administration (FAA), so-called wildlife-aircraft strikes have cost hundreds of lives in the past century and have also caused significant financial losses through damage to aircraft. These collisions present environmental, public safety, and business issues for many interested parties. The FAA maintains a database to monitor the incidence of wildlife-aircraft strikes. For the period from 2010 through April 2019, the database contains nearly 117,000 reports of strikes in North America. The state reporting the largest number of events was California.

      For this chapter, we will use a subset of the database, looking only at bird strikes associated with three California airports: Los Angeles International, Sacramento (the state capital), and San Francisco International. All of the available data is in the data table called FAA Bird Strikes CA. This data table contains 36 columns providing attributes for each of 3,411 bird strikes at or near the three airports.

      1. Open the FAA Bird Strikes CA data table now and scroll through the columns. Our analysis will use several columns, each of which we will explain as we work through the examples.

      The REMARKS column in this data table has an icon that we have not previously seen. This column contains Unstructured Text, which is free-form character data. Specifically, the remarks are the written description of the wildlife strike as provided by the person who filed the original report. Although text analysis is well beyond the scope of this book, interested readers should check out the Text Explorer platform in the Analyze menu or visit www.jmp.com for demonstrations and examples.

      This data table also has a large amount of missing data. Although the FAA attempts to record a full set of variables for each strike, sometimes the data is not known to the individual reporting the incident. In a JMP data table, missing categorical data appears as a blank cell, and missing continuous data is a dot.

      In a bivariate analysis, we think in terms of a pair of observations for each bird strike. JMP will analyze only those incidents that have a complete pair of observations for whichever two columns we select.
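
      To make the idea of a complete pair concrete, here is a minimal sketch in Python rather than JMP. The rows and the column names PHASE and HEIGHT are made up for illustration and are not taken from the FAA data table, and None stands in for a missing cell. The sketch only mimics the behavior described above; it does not show how JMP works internally.

```python
# Hypothetical records of the form (PHASE, HEIGHT).
# None marks a missing value; these rows are invented for illustration.
strikes = [
    ("Approach", 500.0),
    (None, 1200.0),        # categorical value missing
    ("Climb", None),       # continuous value missing
    ("Landing Roll", 0.0),
    (None, None),          # both values missing
]


def complete_pairs(rows):
    """Keep only the rows in which both values of the pair are present."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


pairs = complete_pairs(strikes)
print(len(strikes), "rows,", len(pairs), "complete pairs")
```

      Note that a height of 0.0 is a legitimate observation, not a missing one, which is why the check compares against None instead of testing whether the value is "truthy."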

      At what point in a flight do bird
