Statistics. David W. Scott

      My aim in writing this book is to provide a self‐contained, one‐semester probability and statistics introduction that covers core material without ballooning into a huge tome. Since statistics requires an understanding of distributions and relationships (for example, predicting one variable from another), some introductory knowledge of multivariate calculus and linear algebra will be assumed. Examples will use the […] language, but they can easily be adapted to other systems such as Matlab. Mathematica will be used for symbolic computations. JMP can be used to perform statistical tests in a unified manner.

      The course divides naturally into three sections: (1) classical probability; (2) distribution functions, density functions, and random variables; and (3) statistical inference and hypothesis testing.

      In selecting material to include, I have favored models that follow directly from simple, intuitive assumptions. I have also favored statistical topics that are widely used. In this era of data science, I have occasionally selected new topics that are relevant and easily understood. For example, robustness is relevant because bad data or outliers can adversely affect classical methodology.

      Students who have taken AP Statistics will have an advantage in that they will have seen a large number of cookbook statistical procedures and tests. We will cover only a selection, as the mathematical foundations (or outline thereof) will be of equal interest here. Often we will sacrifice mathematical rigor in favor of an engineering‐level understanding without apology. Motivated students will naturally follow this course with more mathematically rigorous courses in statistics, probability, and stochastic processes. Reading about other statistical tests and methods should be straightforward after mastering the material covered here.

      To keep things simple, I have included only a handful of problems and case studies. There will be a live course website with numerous sample problems and exams. Instructors with special interests can easily insert their own examples and problems in appropriate sections.

      The URL for the additional course material is

      http://www.stat.rice.edu/~scottdw/wiley-dws-2020/

      The directory contains problems, sample exams, and the pdf file all-figs.pdf, which displays all 57 figures, including 45 color diagrams. The author may be reached at [email protected]

       David W. Scott

       Houston, Texas

       September, 2019

      The field of statistics has a rich history that has become tightly integrated into the emerging field of data sciences. Collaboration with computer scientists, numerical analysts, and decision makers characterizes the field. The role of statistics and statisticians is to find actionable information in a noisy collection of data. Every field of academic endeavor encounters this problem: from the electrical engineer trying to find a signal in a noisy channel to an English professor trying to determine the authorship of a contested newly discovered manuscript.

      There are two basic tasks for the statistician. The first is to characterize the distribution of possible outcomes using a batch of representative data. An actuary may be asked to find a dollar loss for car accidents that is not exceeded 99.999% of the time. An economist may be asked to provide useful summaries of a collection of income data. The histogram is our primary tool here, an idea that did not appear until the 17th century; see Graunt (1662), who analyzed death records during the height of the plague outbreak in Europe.
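
      As a minimal sketch of this first task, the Python code below bins a batch of data into equal-width histogram counts and reads off an empirical quantile, that is, the value not exceeded by a chosen fraction of the observations. The function names, the number of bins, and the loss figures are illustrative assumptions rather than material from the book, and Python is used here only for convenience.

```python
import math


def histogram_counts(data, n_bins):
    """Count observations in n_bins equal-width bins spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if width == 0:
            k = 0  # all observations are identical; use the first bin
        else:
            # The maximum is placed in the last bin rather than one past it.
            k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts


def empirical_quantile(data, p):
    """Smallest observation x such that at least a fraction p of the data is <= x."""
    xs = sorted(data)
    k = max(math.ceil(p * len(xs)), 1)
    return xs[k - 1]


# Hypothetical losses in dollars (made-up values, not real data).
losses = [120, 340, 560, 80, 950, 410, 2300, 150, 700, 60]

print(histogram_counts(losses, 4))
print(empirical_quantile(losses, 0.5))
```

      With only ten observations, an extreme quantile such as 99.999% simply returns the sample maximum; estimating it credibly requires far more data or a fitted model.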

      The second task is that of prediction. A bank may wish to understand how credit risk is related to other information that may be available. A mechanical engineer may wish to understand the risk inherent in a new design under extreme conditions. Methods for performing this task underlie many algorithms today, for example, those used for language translation and image recognition.

      The mathematical backbone of all of our statistical methods is probability theory. Thus we study the basics of probability theory and random variables in the first part of this course. Statistical methods and the basics of statistical decision theory form the core of the middle third of this course. Specific tests and data analysis approaches finish our study.

from the two quartiles. Any points outside these whiskers are plotted as potential outliers.
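
      The whisker rule can be sketched in a few lines of Python. The sketch assumes Tukey's usual convention, in which each whisker reaches to the most extreme observation lying within 1.5 interquartile ranges of the nearer quartile; the multiplier 1.5, the quartile interpolation method, the function name, and the sample heights are all assumptions chosen for illustration.

```python
import statistics


def boxplot_summary(data, k=1.5):
    """Quartiles, whisker ends, and potential outliers (assumed k * IQR rule)."""
    xs = sorted(data)
    # "inclusive" interpolates between order statistics of the sample itself.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "quartiles": (q1, median, q3),
        # Whiskers stop at the most extreme points still inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical heights in inches (made-up values, not Pearson's data).
heights = [62, 64, 65, 66, 67, 68, 69, 70, 71, 80]
print(boxplot_summary(heights))
```

      Other software may compute the quartiles slightly differently, so the fences, and occasionally the flagged points, can differ from one system to another.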

      1.1.1 Pearson's Father–Son Height Data

fathers and an adult son. In the left frame in Figure 1.1, we display a box‐and‐whiskers plot of these data. We see that the sons are taller than their fathers by about an inch. There are also more potential outliers among the sons for some reason.

      In the middle frame of Figure 1.1, we show Tukey's stem‐and‐leaf plot of the 1078 differences of the heights of each son and his father. The range of the data is […] and the first seven sorted values rounded to one decimal place are […]. Each data point is decomposed into a stem and a leaf digit. Thus […] has a stem of […] and a leaf of 0. The top line is actually […], although it is too small to see. With so much data, each stem is broken into two lines to provide more detail. Thus the next two lines show a stem of […] but no leaves
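
      The construction described above can be sketched in Python as follows. This is one plausible reading, not the routine used to draw Figure 1.1: it assumes the stem is the integer part and the leaf is the tenths digit of each value rounded to one decimal place, splits every stem into a line for leaves 0 to 4 and a line for leaves 5 to 9, and prints stems that have no leaves. The function name and the data are hypothetical; the values are made up and are not Pearson's.

```python
def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display with two lines per stem."""
    rows = {}
    for v in values:
        tenths = round(v * 10)              # value in units of 0.1
        stem, leaf = divmod(abs(tenths), 10)
        half = 0 if leaf < 5 else 1         # leaves 0-4 or leaves 5-9
        idx = 2 * stem + half               # row position for values >= 0
        if tenths < 0:
            idx = -idx - 1                  # mirror the rows of negative values
        rows.setdefault(idx, []).append(leaf)

    lines = []
    for idx in range(min(rows), max(rows) + 1):
        # Negative rows list larger leaves first so values still increase.
        leaves = sorted(rows.get(idx, []), reverse=(idx < 0))
        magnitude = idx if idx >= 0 else -idx - 1
        label = ("-" if idx < 0 else "") + str(magnitude // 2)
        lines.append(f"{label:>3} | " + "".join(str(d) for d in leaves))
    return lines


# Hypothetical son-minus-father height differences in inches.
differences = [-2.0, -0.7, -0.3, 0.0, 0.4, 0.6, 1.2, 1.8, 1.9]
for line in stem_and_leaf(differences):
    print(line)
```

      In this sketch a value of exactly zero is filed under the nonnegative stem 0, and the stems -0 and 0 are kept apart so that small negative and small positive differences are not mixed on one line.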
