Practical Data Analysis with JMP, Third Edition. Robert Carver
c. Create a scatterplot for MortUnder5 and MortInfant. Report the equation of the fitted line and the R-square value, and explain what you have found.
d. Is there any noteworthy pattern in the covariation of Provider and MatLeave90+? Explain what techniques you used, and what you found.
2. Scenario: How do prices of used cars differ, if at all, in different areas of the United States? How do the prices of used cars vary according to the mileage of the cars? Our data table Used Cars contains observational data about the listed prices of three popular compact car models in three different metropolitan areas in the U.S. The cities are Phoenix, AZ; Portland, OR; and Raleigh-Durham-Chapel Hill, NC. The car models are the Chrysler PT Cruiser Touring Edition, the Honda Civic EX, and the Toyota Corolla LE. The cars were all two years old at the time.
a. Create a scatterplot of price versus mileage. Report the equation of the fitted line, the R-square value, and the correlation coefficient, and explain what you have found.
b. Use the Graph Builder to see whether the relationship between price and mileage differs across different car models.
c. Describe the distribution of prices across the three cities in this sample.
d. Within this sample, are the different car models equally favored in the three different metropolitan areas? Discuss your analysis and explain what you have found.
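The three quantities requested in item a are closely linked: for a straight-line fit with a single predictor, the R-square value equals the square of the correlation coefficient, and the sign of the correlation matches the sign of the slope. The Python sketch below illustrates that arithmetic outside of JMP. It is only a rough illustration under stated assumptions: the function name fit_line is hypothetical, and the mileage and price values are invented, not taken from the Used Cars data table.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus r and R-square."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r, r * r  # R-square = r squared (one predictor)


# Invented illustration values (NOT from the Used Cars table):
# mileage in thousands of miles, price in thousands of dollars.
mileage = [18.0, 22.5, 27.0, 31.5, 36.0, 40.5]
price = [15.9, 15.2, 14.8, 13.9, 13.6, 12.7]

slope, intercept, r, r_square = fit_line(mileage, price)
print(f"price = {intercept:.3f} + ({slope:.3f}) * mileage")
print(f"r = {r:.3f}, R-square = {r_square:.3f}")
```

When interpreting your own JMP results, the same reading applies: the slope estimates the change in price per unit of mileage, and R-square reports the share of price variation accounted for by the line.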
3. Scenario: High blood pressure continues to be a leading health problem in the U.S. We have a data table (NHANES 2016) containing survey data from nearly 10,000 people in the U.S. in 2017. For this analysis, we will focus on only the following variables:
- RIAGENDR: Respondent’s gender
- RIDAGEYR: Respondent’s age in years
- RIDRETH1: Respondent’s racial or ethnic background
- BMXWT: Respondent’s weight in kilograms
- BPXPLS: Respondent’s resting pulse rate
- BPXSY1: Respondent’s systolic blood pressure (“top” number in BP)
- BPXD1: Respondent’s diastolic blood pressure (“bottom” number in BP)
a. Create a scatterplot of systolic blood pressure versus age. Within this sample, what tends to happen to blood pressure as people age?
b. Compute and report the correlation between systolic and diastolic blood pressure. What does this correlation tell you?
c. Use a bubble plot to incorporate gender (Color) and pulse rate (Bubble Size) into the graph. Comment on what you see.
d. Compare the distribution of systolic blood pressure in males and females. Report on what you find.
e. Compare the distribution of systolic blood pressure by racial/ethnic background. Comment on any noteworthy differences that you find.
f. Create a scatterplot of systolic blood pressure and pulse rate. One might suspect that higher pulse rate is associated with higher blood pressure. Does the analysis bear out this suspicion?
4. Scenario: Despite well-documented health risks, tobacco is used widely throughout the world. The Tobacco data table provides information about several variables for 133 different nations in 2005, including these:
- TobaccoUse: Prevalence of tobacco use (%) among adults 18 and older (both sexes)
- Female: Prevalence of tobacco use among females, 18 and older
- Male: Prevalence of tobacco use among males, 18 and older
- CVMort: Age-standardized mortality rate for cardiovascular diseases (per 100,000 population in 2002)
- CancerMort: Age-standardized mortality rate for cancers (per 100,000 population in 2002)
a. Compare the prevalence of tobacco use across the regions of the world, and comment on what you see.
b. Create a scatterplot of cardiovascular mortality versus prevalence of tobacco use (both sexes). Within this sample, describe the relationship, if any, between these two variables.
c. Create a scatterplot of cancer mortality versus prevalence of tobacco use (both sexes). Within this sample, describe the relationship, if any, between these two variables.
d. Compute and report the correlation between male and female tobacco use. What does this correlation tell you?
e. Modify your scatterplot from item c above by creating a bubble plot that also incorporates region (color) and cardiovascular mortality (bubble size). Comment on what you find in the graph.
5. Scenario: Since 2003, the U.S. Bureau of Labor Statistics has been conducting the biennial American Time Use Survey. Census workers use telephone interviews and a complex sampling design to measure the amount of time people devote to various ordinary activities. We have some of the survey data in the data table called TimeUse. Our data table contains observations from more than 43,191 different respondents in 2003, 2007, and 2017. For these questions, use the Data Filter to select and include just the 2017 responses.
a. Create a crosstabulation of employment status by sex, and report on what you find.
b. Create a crosstabulation of full versus part-time employment status by gender, and report on what you find.
c. Compare the distribution of time spent sleeping across the employment categories. Report on what you find.
d. Now change the data filter to include all rows. Compare the distribution of time spent on personal email in 2003, 2007, and 2017. Comment on your findings.
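Items c and d ask you to compare the distribution of a continuous variable across the levels of a categorical variable. One common way to do so is to line up a five-number summary for each group. The Python sketch below shows the idea outside of JMP. It is a minimal illustration under stated assumptions: the function name summarize_by_group, the group labels, and the sleep values are all invented and do not come from the TimeUse data table. Quartile conventions also differ among software packages, so these quartiles may not match JMP's exactly.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_group(rows):
    """rows: iterable of (group_label, value) pairs.

    Returns {label: five-number summary plus n}. Each group needs
    at least two values so that quartiles can be computed.
    """
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)

    summary = {}
    for label, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
        summary[label] = {
            "n": len(values),
            "min": min(values),
            "q1": q1,
            "median": median(values),
            "q3": q3,
            "max": max(values),
        }
    return summary


# Invented illustration values (NOT from the TimeUse table):
# (hypothetical employment category, minutes of sleep)
rows = [
    ("Employed", 450), ("Employed", 480), ("Employed", 420),
    ("Employed", 510), ("Not employed", 540), ("Not employed", 495),
    ("Not employed", 570), ("Not employed", 525),
]

for label, stats in summarize_by_group(rows).items():
    print(label, stats)
```

Comparing the medians and the spread between the quartiles across groups gives a compact numerical counterpart to side-by-side box plots.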
6. Scenario: The data table Sleeping Animals contains information about the sizes, life spans, and sleeping habits of various mammal species. The last few columns are ordinal variables classifying the animals according to their comparative risks of being victimized by predators and the degree to which they sleep in the open rather than in enclosed spaces.
a. Create a crosstabulation of predation index by exposure index, and report on what you find.
b. Compare the distribution of hours of sleep across values of the danger index. Report on what you find.
c. Create a scatterplot of total sleep time and life span for these animals. What does the graph tell you?
d. Compute the correlation between total sleep time and life span for these animals. What does the correlation tell you?
7. Scenario: Let’s return to the data table FAA Bird Strikes CA. The FAA includes categorical variables pertaining to the number of birds struck, the size of the birds struck, and the general weather conditions.
a. Create a crosstabulation of number of birds struck versus sky conditions (Sky), and report on what you find.
b. Create a crosstabulation of number of birds struck versus the precipitation conditions (Precip), and report on what you find.
c. Investigate the relationship between the number of birds struck and the speed of the aircraft.
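The crosstabulations requested in items a and b count how often each combination of two categorical values occurs. Row percentages then make it easier to compare rows that have different totals. The Python sketch below shows that bookkeeping outside of JMP. It is a minimal illustration under stated assumptions: the function names crosstab and row_percents are hypothetical, and the category labels and observations are invented, not taken from the FAA Bird Strikes CA data table.

```python
from collections import Counter


def crosstab(pairs):
    """Count (row_value, column_value) pairs into a nested dict of counts."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


def row_percents(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100.0 * k / total for c, k in cells.items()}
    return result


# Invented illustration values (NOT from the FAA Bird Strikes CA table):
# (hypothetical birds-struck category, hypothetical sky condition)
observations = [
    ("1", "Clear"), ("1", "Clear"), ("1", "Overcast"),
    ("2-10", "Clear"), ("2-10", "Overcast"), ("2-10", "Overcast"),
    ("1", "Some cloud"), ("2-10", "Some cloud"),
]

table = crosstab(observations)
for row, cells in table.items():
    print(row, cells)
for row, cells in row_percents(table).items():
    print(row, {c: round(p, 1) for c, p in cells.items()})
```

If the row percentages look similar from one row to the next, the two variables show little association in the sample; large differences between rows suggest a pattern worth reporting.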