Read the book online: Statistics. David W. Scott. Mathematics. LiveLib


The first histogram in Figure 1.3 hides the interesting structure contained in the small dataset. The second histogram and stem‐and‐leaf plot show the two clusters quite clearly. Charting data was uncommon before the 1900s, and a table of the data would typically not reveal this feature. It turned out that Lord Rayleigh had combined various sources of the gas with several purifying agents and extraction methods. The samples originating from “pure air” were “contaminated” with argon. For the discovery of argon, Lord Rayleigh was awarded the Nobel Prize in Physics in 1904.
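The contrast described above, where one histogram hides two clusters and a finer display reveals them, can be sketched in Python with NumPy. The twelve values below are hypothetical, made-up numbers with two clusters; they are not Lord Rayleigh's measurements. The helper `stem_and_leaf` is written only for illustration, and the bin edges were chosen by hand to make the point.

```python
import numpy as np

# Hypothetical measurements forming two clusters (made-up values, NOT Rayleigh's data).
data = np.array([2.10, 2.11, 2.12, 2.12, 2.13, 2.14,
                 2.31, 2.32, 2.32, 2.33, 2.34, 2.36])


def stem_and_leaf(values, scale=100):
    """Return stem-and-leaf lines for `values`.

    Each value is multiplied by `scale` and rounded to an integer; the last
    digit is the leaf and the remaining digits form the stem, so with
    scale=100 the line "21 | 0" stands for 2.10.
    """
    ints = sorted(int(v) for v in np.rint(np.asarray(values) * scale))
    stems = {}
    for v in ints:
        stems.setdefault(v // 10, []).append(v % 10)
    # Emit every stem in range, including empty ones, so gaps stay visible.
    return [f"{s:>4} | " + "".join(str(leaf) for leaf in stems.get(s, []))
            for s in range(min(stems), max(stems) + 1)]


# Wide, awkwardly placed bins: counts simply decrease and the gap is hidden.
coarse, _ = np.histogram(data, bins=[2.05, 2.20, 2.35, 2.50])
# Narrower bins: two empty bins separate the clusters.
fine, _ = np.histogram(data, bins=np.linspace(2.075, 2.375, 7))

print("coarse counts:", coarse.tolist())
print("fine counts:  ", fine.tolist())
print("\n".join(stem_and_leaf(data)))
```

The empty stem `22` in the stem‐and‐leaf display plays the same role as the empty bins in the finer histogram: it makes the gap between the clusters visible.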

1.1.3 Discussion

Finding structure in data is a primary goal of data science. Graphical methods are powerful approaches to discovering unexpected or hidden structure. Some of these methods are better suited to small datasets. In a multivariate statistics course, we will learn how to analyze data with more than one variable. Modern genetic datasets often have more than … variables!

1.2 Exploring Prediction Using Data

The second fundamental task of statistics is prediction. Data for this task are typically ordered pairs, $(x_i, y_i)$, $i = 1, \dots, n$. The goal is to predict the value of the $y$ variable using the corresponding value of the $x$ variable. For example, we might try to predict a son's height ($y$) from the father's height ($x$). Similarly, a bank contemplating a mortgage loan may use an applicant's credit score to predict the probability that the applicant will default on the loan.

The initial step is to plot a scatter diagram of the $n$ data points to determine whether there is a strong relationship between $x$ and $y$. Such a relationship, if it exists, may be linear or nonlinear. If knowledge of $x$ conveys no information about the value of $y$, then the scatter diagram will show no slope or trend, with the $y$ values simply scattered around their average.
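To make "no slope or trend" concrete, the Python sketch below (NumPy assumed) simulates two hypothetical datasets, one in which $y$ depends on $x$ and one in which it does not, and summarizes each scatter diagram by the slope of a least‐squares line from `np.polyfit`. Least squares is used here only as a convenient numerical stand‐in for eyeballing the slope; the sample size, seed, and coefficients are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 200
x = rng.uniform(0, 10, size=n)

# Case 1 (simulated): y depends linearly on x, plus noise.
y_related = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)
# Case 2 (simulated): y is pure noise around its average; x carries no information.
y_unrelated = 5.0 + rng.normal(0, 1, size=n)

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
slope_related, _ = np.polyfit(x, y_related, 1)
slope_unrelated, _ = np.polyfit(x, y_unrelated, 1)

print(f"slope when related:   {slope_related:.3f}")    # should be near 0.5
print(f"slope when unrelated: {slope_unrelated:.3f}")  # should be near 0
```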

1.2.1 Body and Brain Weights of Land Mammals

In the left frame of Figure 1.4, we plot the brain and body weights of 62 land mammals from the MASS library. The relationship, if any, is hard to discern since 59 of the measurements are overplotted near the origin. We might choose to exclude the two elephant measurements and even the human data point as outliers, and then replot.

However, Tukey introduced a ladder of power transformations to re‐express a variable, $x \rightarrow x^{\lambda}$; see Problem 3 for an explanation of why $\log x$ is used in place of $x^{0}$ when $\lambda = 0$.
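A minimal Python sketch of the ladder of powers, assuming strictly positive data. The function name `tukey_ladder` and the particular list of rungs are illustrative assumptions, since texts differ slightly in which powers they list. The second loop shows one standard argument for using $\log x$ at $\lambda = 0$ (not necessarily the one given in Problem 3): the rescaled power $(x^{\lambda} - 1)/\lambda$ converges to $\log x$ as $\lambda \to 0$.

```python
import numpy as np


def tukey_ladder(x, lam):
    """Re-express strictly positive data x by the power `lam`.

    By convention the rung lam == 0 is log(x) rather than x**0 (which would
    be the constant 1). Some authors also negate x**lam for lam < 0 to
    preserve the ordering of the data; this sketch does not.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the ladder of powers assumes strictly positive x")
    if lam == 0:
        return np.log(x)
    return x ** lam


# A commonly quoted set of rungs (an assumption; texts differ slightly).
rungs = [-2, -1, -0.5, 0, 0.5, 1, 2]
for lam in rungs:
    print(lam, tukey_ladder([1.0, 10.0, 100.0], lam))

# The rescaled power (x**lam - 1) / lam approaches log(x) as lam -> 0.
x0 = 10.0
for lam in [0.1, 0.01, 0.001]:
    print(f"lam={lam}: {(x0**lam - 1) / lam:.5f}  vs  log(x0)={np.log(x0):.5f}")
```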

In the right frame of Figure 1.4, we use the log function to dramatic effect. There clearly is a strong relationship that allows highly accurate prediction of the log(brain weight) of a land mammal knowing its log(body weight). (The body weight is easily measured for a living specimen, but not its brain weight.) Moreover, the relationship appears to be linear. In this re‐expressed scatter diagram, the two or three outliers identified in the first plot are no longer outliers.
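One way to see why logs can straighten such a plot: if brain weight $y$ were related to body weight $x$ by a power law $y = c\,x^{b}$ (an assumption made here only for illustration), then taking logs gives $\log y = \log c + b \log x$, a straight line in $\log x$. The Python sketch below (NumPy assumed) simulates such data with multiplicative noise. The constants `c = 0.01` and `b = 0.75` are made up and are not estimates from the MASS mammals data.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

# Hypothetical power law y = c * x**b with multiplicative noise.
# c and b are made-up values, not estimates from the MASS mammals data.
c, b = 0.01, 0.75
n = 60
x = 10 ** rng.uniform(-2, 4, size=n)               # "body weights" over 6 orders of magnitude
y = c * x**b * np.exp(rng.normal(0, 0.3, size=n))  # "brain weights"

# Raw scale: the largest x dwarfs the typical (median) x, so most points
# would be squeezed against the origin in an untransformed scatter diagram.
spread = x.max() / np.median(x)
print(f"max(x) / median(x) = {spread:.0f}")

# Log scale: log10(y) = log10(c) + b * log10(x) + noise, a straight line.
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
print(f"fitted slope     {slope:.3f}  (true b = {b})")
print(f"fitted intercept {intercept:.3f}  (true log10(c) = {np.log10(c):.1f})")
```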

1.2.2 Space Shuttle Flight 25

The 25th launch in the Space Shuttle program was scheduled for 22 January 1986 but was postponed, for various reasons, day after day until 28 January. The temperature had dropped to 28°F overnight, and it was 36°F when the launch was attempted at 11:38 a.m. During the first 90 s, several O‐rings on the solid rocket boosters failed, leading to a catastrophic explosion and the loss of all seven crew members. Scientists knew that previous shuttle flights had occasionally experienced one or two O‐ring failures, but a launch had never been attempted at freezing temperatures. Differing opinions on the safety of the launch were given to the launch director, who eventually decided to proceed. One of the data analyses is reproduced in the first row of Figure 1.5.


Figure 1.4 Scatter diagrams of the raw and log‐transformed body and brain weights of 62 land mammals.


Figure 1.5 Analysis of the number of O‐ring failures for the first 24 Space Shuttle launches; see text.

In the heading of the scatter diagram in the first frame, we see a list of the 7 (of the first 24) shuttle flights that experienced 1 or 2 O‐ring failures. Two failures were observed at the lowest temperature, 53°F, which was well above the 28–36°F range on the day of the disaster. Strangely, two failures had also been observed at the highest temperature, 75°F.

In the second frame, we have jittered the data by adding a little uniform noise. This reveals that two data points were superimposed at …; jittering broke that tie. In the third frame, the data are replotted, but with an expanded $x$‐axis
