Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP. Bhisham C. Gupta
As another example, many psychiatric studies involve observational data, and such data do not provide the cause of patients' psychiatric problems. An advantage of observational studies is that they are usually more cost-effective than experimental studies; the disadvantage is that the data may not be as informative as experimental data.
1.4 A Set of Historical Data
Historical data are not collected by the experimenter; they are simply made available to him or her.
Many fields of study, such as the various branches of business, use historical data. A financial advisor uses sets of historical data for planning purposes. Many investment services provide financial data on a company-by-company basis.
1.5 A Brief Description of What is Covered in this Book
Data collection is very important since it can greatly influence the final outcome of subsequent data analyses. After collection of the data, it is important to organize, summarize, present the preliminary outcomes, and interpret them. Various types of tables and graphs that summarize the data are presented in Chapter 2. Also in that chapter, we give some methods used to determine certain quantities, called statistics, which are used to summarize some of the key properties of the data.
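The summary statistics of Chapter 2 (such as the mean, median, and standard deviation) are demonstrated in the book with MINITAB, R, and JMP; as a minimal sketch of the same computations, here is a Python version using the standard library, with made-up illustrative data.

```python
import statistics

# Hypothetical sample of eight measurements (illustrative data, not from the book)
data = [66.2, 68.5, 67.1, 70.3, 69.0, 68.8, 67.9, 69.4]

mean = statistics.mean(data)      # arithmetic average of the sample
median = statistics.median(data)  # middle value of the sorted sample
stdev = statistics.stdev(data)    # sample standard deviation (n - 1 divisor)

print(f"mean={mean:.3f}, median={median:.3f}, s={stdev:.3f}")
```

Each of these quantities is a "statistic" in the sense of Chapter 2: a number computed from the sample that summarizes one key property of the data.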
The basic principles of probability are necessary to study various probability distributions. We present the basic principles of elementary probability theory in Chapter 3. Probability distributions are fundamental in the development of the various techniques of statistical inference. The concept of random variables is also discussed in Chapter 3.
Chapters 4 and 5 are devoted to some of the important discrete distributions, continuous distributions, and their moment‐generating functions. In addition, we study in Chapter 5 some special distributions that are used in reliability theory.
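Among the discrete distributions of Chapter 4, the binomial is a standard example; as a hedged sketch (not the book's own code), its probability mass function can be evaluated directly from the counting formula.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin: 252/1024
prob = binomial_pmf(5, 10, 0.5)
print(prob)
```

Summing the pmf over k = 0, ..., n returns 1, as it must for any probability distribution.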
In Chapter 6, we study joint distributions of two or more discrete and continuous random variables and their moment‐generating functions. Included in Chapter 6 is the study of the bivariate normal distribution.
Chapter 7 is devoted to the probability distributions of some sample statistics, such as the sample mean, sample proportion, and sample variance. In this chapter, we also study a fundamental result of probability theory known as the Central Limit Theorem, which can be used to approximate the probability distribution of the sample mean when the sample size is large. We further study the sampling distributions of several sample statistics for the special case in which the population distribution is the so-called normal distribution. In addition, we present the probability distributions of various "order statistics," such as the largest and smallest elements in a sample and the sample median.
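The Central Limit Theorem's claim about the sample mean can be illustrated by simulation; the sketch below (an illustration, not the book's material) draws repeated samples from a Uniform(0, 1) population, whose mean is 0.5 and variance is 1/12, and checks that the sample means cluster around 0.5 with variance close to (1/12)/n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

n, reps = 30, 2000
# Mean of each of 2000 samples of size 30 from a Uniform(0, 1) population
sample_means = [statistics.mean(random.random() for _ in range(n))
                for _ in range(reps)]

# CLT: the sample means are approximately normal with mean 0.5
# and variance sigma^2 / n = (1/12) / 30
print(statistics.mean(sample_means), statistics.variance(sample_means))
```

A histogram of `sample_means` would show the familiar bell shape even though the underlying uniform population is flat.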
Chapter 8 discusses the use of sample data for estimating unknown population parameters of interest, such as the population mean, population variance, and population proportion. It also discusses methods of estimating the difference of two population means, the difference of two population proportions, and the ratio of two population variances (and standard deviations). Two types of estimators are covered, namely point estimators and interval estimators (confidence intervals).
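As a minimal sketch of an interval estimator, the following computes a large-sample 95% confidence interval for a population mean using the z critical value 1.96; the data are illustrative, and for a sample this small Chapter 8 would use a t critical value instead.

```python
import math
import statistics

# Illustrative measurements (not from the book)
data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1, 4.9, 5.0]

xbar = statistics.mean(data)   # point estimate of the population mean
s = statistics.stdev(data)     # sample standard deviation
n = len(data)

# Large-sample 95% interval: xbar +/- 1.96 * s / sqrt(n)
margin = 1.96 * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"({lower:.3f}, {upper:.3f})")
```

The point estimator is the single number `xbar`; the interval estimator is the pair `(lower, upper)`, which quantifies the uncertainty in that point estimate.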
Chapter 9 deals with the important topic of statistical tests of hypotheses and discusses test procedures concerning the population mean, population variance, and population proportion for one and two populations. Methods of testing hypotheses using the confidence intervals studied in Chapter 8 are also presented.
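A simple instance of such a test is the one-sample z test for a population mean with known standard deviation; the numbers below are illustrative, not the book's.

```python
import math

# Test H0: mu = 5.0 against H1: mu != 5.0, with sigma assumed known
xbar, mu0, sigma, n = 5.2, 5.0, 0.5, 25  # illustrative values

z = (xbar - mu0) / (sigma / math.sqrt(n))  # standardized test statistic
reject = abs(z) > 1.96                     # two-sided test at the 5% level
print(z, reject)
```

Here z = 2.0 exceeds the critical value 1.96, so the null hypothesis is rejected at the 5% level; equivalently, a 95% confidence interval for the mean would fail to contain 5.0, which is the link to Chapter 8 mentioned above.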
Chapter 10 gives an introduction to the theory of reliability. Methods of estimation and hypothesis testing using the exponential and Weibull distributions are presented.
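For the exponential model, the maximum-likelihood estimate of the failure rate is the number of failures divided by the total time on test (a standard result); the sketch below uses made-up failure times and is only an illustration of that estimate and the fitted reliability function.

```python
import math

# Failure times in thousands of hours (illustrative data, not from the book)
times = [2.0, 3.0, 5.0, 10.0]

# MLE of the exponential failure rate: lambda_hat = n / total time on test
rate_hat = len(times) / sum(times)
mean_life = 1 / rate_hat  # estimated mean time to failure

# Reliability (survival) function of the fitted model: R(t) = exp(-lambda_hat * t)
r_at_5 = math.exp(-rate_hat * 5.0)
print(rate_hat, mean_life, r_at_5)
```

With these data the estimated rate is 0.2 per thousand hours, so the estimated mean life is 5 thousand hours.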
In Chapter 11, we introduce the topic of data mining, including the concept of big data and the first steps in a data-mining project. Classification, machine learning, and inference versus prediction are also discussed.
In Chapter 12, we introduce the topic of cluster analysis. Clustering concepts and similarity measures are introduced, and hierarchical, nonhierarchical, and model-based clustering methods are discussed in detail.
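As a toy illustration of nonhierarchical clustering, here is a minimal 2-means sketch on one-dimensional data with fixed starting centroids (deterministic, so the result is reproducible); the data and starting values are invented for illustration.

```python
# Six one-dimensional points that visibly form two groups (illustrative data)
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids = [0.0, 6.0]  # fixed initial guesses for the two cluster centers

for _ in range(10):  # a few iterations suffice for this tiny example
    clusters = [[], []]
    for p in points:
        # similarity measure: squared Euclidean distance (here, squared difference)
        nearest = min(range(2), key=lambda j: (p - centroids[j]) ** 2)
        clusters[nearest].append(p)
    # update step: each centroid moves to the mean of its assigned points
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)
```

The assignment step uses the similarity measure; the update step recomputes the centers; alternating the two until nothing changes is the k-means idea in miniature.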
Chapter 13 is concerned with the chi‐square goodness‐of‐fit test, which is used to test whether a set of sample data support the hypothesis that the sampled population follows some specified probability model. In addition, we apply the chi‐square goodness‐of‐fit test for testing hypotheses of independence and homogeneity. These tests involve methods of comparing observed frequencies with those that are expected if a certain hypothesis is true.
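The comparison of observed and expected frequencies can be sketched directly; the counts below are invented to illustrate testing whether a die is fair.

```python
# Hypothetical counts for faces 1-6 from 120 rolls (illustrative data)
observed = [22, 17, 20, 26, 21, 14]
expected = [sum(observed) / 6] * 6  # 20 per face under the fair-die hypothesis

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over the cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)
```

The statistic is compared with a chi-square critical value on 6 - 1 = 5 degrees of freedom (about 11.07 at the 5% level); here chi2 = 4.3, so these counts give no reason to doubt that the die is fair.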
Chapter 14 gives a brief look at tests known as “nonparametric tests,” which are used when the assumption about the underlying distribution having some specified parametric form cannot be made.
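One of the simplest nonparametric procedures is the sign test for a population median, which uses only the signs of the deviations from the hypothesized value; the data below are illustrative.

```python
from math import comb

# Sign test sketch for H0: median = 10 (illustrative data, not from the book)
data = [12.1, 11.4, 10.8, 13.0, 9.6, 12.5, 11.9, 10.2, 12.8, 9.1]
positives = sum(1 for x in data if x > 10)  # observations above the hypothesized median

# Under H0 the number of positives is Binomial(n, 0.5) regardless of the
# population's shape; the two-sided p-value doubles the larger tail probability
n = len(data)
k = max(positives, n - positives)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2**n)
print(positives, p_value)
```

No assumption about the form of the population distribution is needed, which is exactly what makes the test "nonparametric."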
Chapter 15 introduces an important topic of applied statistics: simple linear regression analysis. Linear regression analysis is frequently used by engineers, social scientists, health researchers, and biological scientists. This statistical technique explores the relation between two variables so that one variable can be predicted from the other. In this chapter, we discuss the least squares method for estimating the simple linear regression model, called the fitting of this regression model. Also, we discuss how to perform a residual analysis, which is used to check the adequacy of the regression model, and study certain transformations that are used when the model is not adequate.
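The least squares estimates for the simple linear regression model y = b0 + b1*x have closed-form expressions in terms of the deviation sums of squares and cross-products; the sketch below applies those standard formulas to made-up data pairs.

```python
import statistics

# Illustrative (x, y) pairs, not from the book
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xbar, ybar = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross-products
sxx = sum((xi - xbar) ** 2 for xi in x)                       # sum of squares

b1 = sxy / sxx         # least squares slope estimate
b0 = ybar - b1 * xbar  # least squares intercept estimate
print(b0, b1)          # fitted line: y = 2.2 + 0.6 x
```

The residuals y - (b0 + b1*x) from this fit are what the chapter's residual analysis examines to check the adequacy of the model.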
Chapter 16 extends the results of Chapter 15 to multiple linear regression. Like the simple linear regression model, multiple linear regression analysis is widely used. It provides statistical techniques for exploring the relations among more than two variables, so that one variable can be predicted from the others. In this chapter, we give a discussion of multiple linear regression, including the matrix approach.