Applied Regression Modeling. Iain Pardoe
where the 97.5th percentile comes from the t‐distribution with n − 1 degrees of freedom (n is the sample size).
For a lower or higher level of confidence than 95%, the percentile used in the calculation must be changed as appropriate. For example, for a 90% interval (i.e., with 5% in each tail), the 95th percentile would be needed, whereas for a 99% interval (i.e., with 0.5% in each tail), the 99.5th percentile would be needed. These percentiles can be obtained from the table “Univariate Data” in Notation and Formulas (which is an expanded version of the table in Section 1.4.2). Instructions for using the table can be found in Notation and Formulas.
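The rule for matching a confidence level to a percentile can be written out as a short calculation. The sketch below is only an illustration; the function name `percentile_for_confidence` is made up for this example and does not come from the book or its software files.

```python
def percentile_for_confidence(level_percent):
    """Percentile needed for a two-sided interval at the given confidence level."""
    # The area outside the interval is split equally between the two tails.
    tail_percent = (100.0 - level_percent) / 2.0
    return 100.0 - tail_percent


for level in (90, 95, 99):
    print(f"{level}% interval -> {percentile_for_confidence(level)}th percentile")
```

For the three levels discussed above, this gives the 95th, 97.5th, and 99.5th percentiles, respectively.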
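Thus, in general, we can write a confidence interval for a univariate mean as
sample mean ± t‐percentile × (sample standard deviation / √n),
where n is the sample size and the t‐percentile comes from the t‐distribution with n − 1 degrees of freedom.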
The example above becomes
Computer help #23 in the software information files available from the book website shows how to use statistical software to calculate confidence intervals for the population mean. As further practice, calculate a 90% confidence interval for the population mean for the home prices example (see Problem 1.10)—you should find that it is (
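As a rough illustration of what such software does, the interval calculation described in this section (the sample mean plus or minus a t‐percentile times the sample standard deviation divided by the square root of the sample size) can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `mean_confidence_interval` is hypothetical, the five data values are invented for illustration and are not the home prices data, and the t‐percentile is supplied by hand from a standard t‐table rather than computed.

```python
import math
import statistics


def mean_confidence_interval(sample, t_percentile):
    """Confidence interval for a population mean from a sample and a t-percentile."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    margin = t_percentile * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical values for illustration only (NOT the book's home prices data).
sample = [12.0, 15.0, 11.0, 14.0, 13.0]

# 97.5th percentile of the t-distribution with n - 1 = 4 degrees of freedom,
# read from a standard t-table.
low, high = mean_confidence_interval(sample, t_percentile=2.776)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

To get a 90% interval instead, only the percentile changes: the 95th percentile of the same t‐distribution is passed in place of the 97.5th.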
Now that we have calculated a confidence interval, what exactly does it tell us? Well, for the home prices example, loosely speaking, we can say that “we are 95% confident that the mean single‐family home sale price in this housing market is between
Interpretation of a confidence interval for a univariate mean:
Suppose we have calculated a 95% confidence interval for a univariate mean,
Before moving on to Section 1.6, which describes another way to make statistical inferences about population means—hypothesis testing—let us consider whether we can now forget the normal distribution. The calculations in this section are based on the central limit theorem, which does not require the population to be normal. We have also seen that t‐distributions are more useful than normal distributions for calculating confidence intervals. For large samples, it does not make much difference (note how the percentiles for t‐distributions get closer to the percentiles for the standard normal distribution as the degrees of freedom get larger in Table C.1), but for smaller samples it can make a large difference. So for this type of calculation, we always use a t‐distribution from now on. However, we cannot completely forget about the normal distribution yet; it will come into play again in a different context in later chapters.
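The way t‐percentiles approach standard normal percentiles as the degrees of freedom grow can be checked numerically. The sketch below uses only the Python standard library, so it computes t‐percentiles by integrating the t‐density with Simpson's rule and then searching by bisection; this is a convenience chosen for the example, not the method used to build the book's table, and the helper names `t_pdf`, `t_cdf`, and `t_percentile` are assumptions made here.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_percentile(p, df):
    """Upper-half percentile (0.5 < p < 1) found by bisection on t_cdf."""
    low, high = 0.0, 50.0  # wide enough for the cases printed below
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2


for df in (2, 5, 10, 30, 100, 1000):
    print(f"df = {df:>4}: 97.5th percentile = {t_percentile(0.975, df):.3f}")
print(f"standard normal: 97.5th percentile = {NormalDist().inv_cdf(0.975):.3f}")
```

The printed t‐percentiles shrink toward the normal value as the degrees of freedom increase, which is why the choice between the two distributions matters mainly for small samples.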
When using a t‐distribution, how do we know how many degrees of freedom to use? One way to think about degrees of freedom is in terms of the information provided by the data we are analyzing. Roughly speaking, each data observation provides one degree of freedom (this is where the