Applied Regression Modeling. Iain Pardoe


$\mathrm{E}(Y)$, results from the following:

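$$
\begin{aligned}
\Pr\left(-\text{97.5th percentile} < t_{n-1} < \text{97.5th percentile}\right) &= 0.95,\\
\Pr\left(-\text{97.5th percentile} < \frac{m_Y-\mathrm{E}(Y)}{s_Y/\sqrt{n}} < \text{97.5th percentile}\right) &= 0.95,\\
\Pr\left(m_Y-\text{97.5th percentile}\,(s_Y/\sqrt{n}) < \mathrm{E}(Y) < m_Y+\text{97.5th percentile}\,(s_Y/\sqrt{n})\right) &= 0.95,
\end{aligned}
$$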

      where the 97.5th percentile comes from the t‐distribution with $n-1$ degrees of freedom. In other words, plugging in our observed sample statistics, $m_Y$ and $s_Y$, we can write the 95% confidence interval as $m_Y \pm \text{97.5th percentile}\,(s_Y/\sqrt{n})$. In this expression, $\text{97.5th percentile}\,(s_Y/\sqrt{n})$ is the margin of error.

      For a lower or higher level of confidence than 95%, the percentile used in the calculation must be changed as appropriate. For example, for a 90% interval (i.e., with 5% in each tail), the 95th percentile would be needed, whereas for a 99% interval (i.e., with 0.5% in each tail), the 99.5th percentile would be needed. These percentiles can be obtained from the table “Univariate Data” in Notation and Formulas (which is an expanded version of the table in Section 1.4.2). Instructions for using the table can be found in Notation and Formulas.
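      The rule above (a 90%, 95%, or 99% interval needs the 95th, 97.5th, or 99.5th percentile) can be checked numerically. The following Python sketch is not from the book and assumes only the standard library: it evaluates the t‐distribution density from its closed form, integrates it with Simpson's rule, and inverts the cumulative distribution function by bisection. The names `t_pdf`, `t_cdf`, `t_percentile`, and `percentile_for_level` are hypothetical helpers for illustration; the table in Notation and Formulas or statistical software remains the practical source of these percentiles.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): integrate the density over [0, |x|] with Simpson's rule
    (steps must be even) and use the symmetry of T about 0."""
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_percentile(p: float, df: int) -> float:
    """Return the 100p-th percentile (0.5 < p < 1) by bisection on t_cdf."""
    if not 0.5 < p < 1:
        raise ValueError("p must lie strictly between 0.5 and 1")
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def percentile_for_level(level: float) -> float:
    """Upper percentile (as a proportion) for a two-sided interval, splitting
    1 - level equally between the two tails, e.g. 0.95 -> 0.975."""
    return 1 - (1 - level) / 2


if __name__ == "__main__":
    df = 29  # e.g., a sample of n = 30 with one estimated population mean
    for level in (0.90, 0.95, 0.99):
        p = percentile_for_level(level)
        print(f"{level:.0%} interval: {100 * p:.1f}th percentile = "
              f"{t_percentile(p, df):.3f}")
```

      With 29 degrees of freedom the printed percentiles should agree with a standard t‐table to three decimals (about 1.699, 2.045, and 2.756). Numerical integration is used here only to keep the sketch free of third‐party libraries.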

$$
m_Y \pm \text{t-percentile}\,(s_Y/\sqrt{n}),
$$

      where $m_Y$ is the sample mean, $s_Y$ is the sample standard deviation, $n$ is the sample size, and the t‐percentile comes from a t‐distribution with $n-1$ degrees of freedom. In this expression, $\text{t-percentile}\,(s_Y/\sqrt{n})$ is the margin of error.
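      The formula above translates directly into a few lines of Python. This is a minimal sketch, not the book's software instructions (see Computer help #23 for those): the function name `mean_confidence_interval` is hypothetical, and it takes the t‐percentile as an argument, looked up from a table or software for $n-1$ degrees of freedom, rather than computing it. `statistics.stdev` divides by $n-1$, matching the sample standard deviation $s_Y$.

```python
import math
import statistics


def mean_confidence_interval(data, t_percentile: float) -> tuple:
    """Return (lower, upper) = m_Y -/+ t_percentile * s_Y / sqrt(n).

    t_percentile must come from a t-distribution with n - 1 degrees of
    freedom (e.g., the 97.5th percentile for a 95% interval).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    m_y = statistics.mean(data)    # sample mean
    s_y = statistics.stdev(data)   # sample standard deviation (divisor n - 1)
    margin = t_percentile * s_y / math.sqrt(n)  # margin of error
    return m_y - margin, m_y + margin
```

      For example, with the made‐up data [1, 2, 3, 4, 5] ($n = 5$, so 4 degrees of freedom and a 97.5th percentile of about 2.776), `mean_confidence_interval([1, 2, 3, 4, 5], 2.776)` gives roughly (1.04, 4.96).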

      The example above becomes

equation

      Computer help #23 in the software information files available from the book website shows how to use statistical software to calculate confidence intervals for the population mean. As further practice, calculate a 90% confidence interval for the population mean for the home prices example (see Problem 1.10); you should find that it is ([…], […]).

      Now that we have calculated a confidence interval, what exactly does it tell us? Well, for the home prices example, loosely speaking, we can say that “we are 95% confident that the mean single‐family home sale price in this housing market is between […] and […].” This will get you by among friends (as long as none of your friends happen to be expert statisticians). But to provide a more precise interpretation, we have to revisit the notion of hypothetical repeated samples. If we were to take a large number of random samples of size 30 from our population of sale prices and calculate a 95% confidence interval for each, then about 95% of those confidence intervals would contain the (unknown) population mean. We do not know (nor will we ever know) whether the 95% confidence interval for our particular sample contains the population mean; thus, strictly speaking, we cannot say “the probability that the population mean is in our interval is 0.95.” All we know is that the procedure we used to calculate the 95% confidence interval tends to produce intervals that, under repeated sampling, contain the population mean 95% of the time. Stick with the phrase “95% confident,” avoid the word “probability,” and chances are that no one (not even an expert statistician) will be too offended.
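      The repeated‐sampling interpretation can be illustrated by simulation. This sketch assumes a hypothetical normal population (mean 100, standard deviation 15, values chosen only for illustration; these are not the home prices data), draws many samples of size 30, computes a 95% interval for each, and counts how often the interval contains the true mean. It hard‐codes 2.045, the 97.5th percentile of the t‐distribution with 29 degrees of freedom; the function name `simulate_coverage` is hypothetical.

```python
import random
import statistics


def simulate_coverage(pop_mean: float, pop_sd: float, n: int = 30,
                      reps: int = 10_000, t_pct: float = 2.045,
                      seed: int = 1) -> float:
    """Fraction of 95% t-intervals, over repeated samples of size n from a
    normal population, that contain the population mean.

    t_pct is the 97.5th percentile of the t-distribution with n - 1 degrees
    of freedom (2.045 for n = 30); change it if you change n.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        m_y = statistics.fmean(sample)  # sample mean
        margin = t_pct * statistics.stdev(sample) / n ** 0.5  # margin of error
        if m_y - margin < pop_mean < m_y + margin:
            hits += 1
    return hits / reps
```

      For example, `simulate_coverage(100, 15)` returns a proportion close to 0.95, even though each individual interval either does or does not contain the mean. Because the simulated population is normal, the t‐based interval has exactly 95% coverage in the long run; for a skewed population, coverage with samples of size 30 is only approximately 95%.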

      Interpretation of a confidence interval for a univariate mean:

      Suppose we have calculated a 95% confidence interval for a univariate mean, $\mathrm{E}(Y)$, to be $(a, b)$. Then we can say that we are 95% confident that $\mathrm{E}(Y)$ is between $a$ and $b$.

      When using a t‐distribution, how do we know how many degrees of freedom to use? One way to think about degrees of freedom is in terms of the information provided by the data we are analyzing. Roughly speaking, each data observation provides one degree of freedom (this is where the $n$ in the degrees of freedom formula comes in), but we lose a degree of freedom for each population parameter that we have to estimate. So, in this chapter, when we are estimating the population mean, the degrees of freedom formula is $n-1$. In Chapter 2, when we will be estimating two population parameters (the intercept and the slope of a regression line), the degrees of freedom formula will be $n-2$. For the remainder of the book, the general formula for the degrees of freedom in a multiple linear regression model will be $n-(k+1)$ or $n-k-1$, where $k$ is the number of predictor variables in the model. Note that this general formula actually also works for Chapter 2 (where $k=1$) and even this chapter (where $k=0$, since
