Introduction to Linear Regression Analysis. Douglas C. Montgomery
2.5 PREDICTION OF NEW OBSERVATIONS
An important application of the regression model is prediction of new observations y corresponding to a specified level of the regressor variable x. If x₀ is the value of the regressor variable of interest, then

$$\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 \tag{2.44}$$

is the point estimate of the new value of the response y₀.
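A minimal Python sketch of the point estimate in (2.44), using only the standard library: fit the least-squares line, then evaluate ŷ₀ = β̂₀ + β̂₁x₀ at the chosen x₀. The helper names `fit_line` and `predict` are hypothetical, and the data are made up for illustration (they are not the rocket propellant data).

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the straight line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x0):
    """Point estimate of a new response at x0: y0_hat = b0 + b1*x0."""
    return b0 + b1 * x0


# Illustrative data lying exactly on the line y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(x, y)
y0_hat = predict(b0, b1, 2.5)
print(f"b0 = {b0}, b1 = {b1}, y0_hat at x0 = 2.5: {y0_hat}")
```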
Now consider obtaining an interval estimate of this future observation y₀. The CI on the mean response at x = x₀ [Eq. (2.43)] is inappropriate for this problem because it is an interval estimate on the mean of y (a parameter), not a probability statement about future observations from that distribution. We now develop a prediction interval for the future observation y₀.
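Note that the random variable

$$\psi = y_0 - \hat{y}_0$$

is normally distributed with mean zero and variance

$$\operatorname{Var}(\psi) = \operatorname{Var}(y_0 - \hat{y}_0) = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]$$

because the future observation y₀ is independent of ŷ₀. If we use ŷ₀ to predict y₀, then the standard error of ψ, with σ² estimated by MSRes, is the appropriate statistic on which to base a prediction interval. Thus, the 100(1 − α) percent prediction interval on a future observation at x₀ is

$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le y_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \tag{2.45}$$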
The prediction interval (2.45) is of minimum width at x₀ = x̄ and widens as |x₀ − x̄| increases.
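The interval (2.45) can be sketched in Python with only the standard library. To avoid a third-party dependency, this sketch assumes the caller supplies the t quantile t_{α/2, n−2} (for example from a t table); the function name `prediction_interval` and the data are hypothetical and for illustration only.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Prediction interval (2.45) for a single new observation y0 at x0.

    t_crit is t_{alpha/2, n-2}, the upper alpha/2 quantile of the t
    distribution with n - 2 degrees of freedom, looked up by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # MS_Res = SS_Res / (n - 2) estimates sigma^2
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    y0_hat = b0 + b1 * x0
    # The leading 1 is the variance contribution of the new observation itself
    half_width = t_crit * math.sqrt(ms_res * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0_hat - half_width, y0_hat + half_width


# Made-up data with n = 6, so t_{0.025, 4} = 2.776 gives a 95% interval
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
lo_mid, hi_mid = prediction_interval(x, y, x0=3.5, t_crit=2.776)  # x0 = x_bar
lo_end, hi_end = prediction_interval(x, y, x0=1.0, t_crit=2.776)
print(f"width at x_bar: {hi_mid - lo_mid:.3f}, width at x0 = 1: {hi_end - lo_end:.3f}")
```

Moving x₀ away from x̄ increases the (x₀ − x̄)²/S_xx term, so the interval at x₀ = 1 comes out wider than the one at x₀ = x̄ = 3.5. If SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` could supply `t_crit` instead of a table value.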
Example 2.7 The Rocket Propellant Data
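We find a 95% prediction interval on a future value of propellant shear strength in a motor made from a batch of sustainer propellant that is 10 weeks old. Using (2.45), we find that the prediction interval is

$$2256.32 - 2.101\sqrt{9244.59\left(1 + \frac{1}{20} + \frac{(10 - 13.3625)^2}{1106.56}\right)} \le y_0 \le 2256.32 + 2.101\sqrt{9244.59\left(1 + \frac{1}{20} + \frac{(10 - 13.3625)^2}{1106.56}\right)}$$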
which simplifies to

$$2048.32 \le y_0 \le 2464.32$$
Therefore, a new motor made from a batch of 10-week-old sustainer propellant could reasonably be expected to have a propellant shear strength between 2048.32 and 2464.32 psi.
Figure 2.5 The 95% confidence and prediction intervals for the propellant data.
Figure 2.5 shows the 95% prediction interval calculated from (2.45) for the rocket propellant regression model. Also shown on this graph is the 95% CI on the mean response [that is, on E(y|x)] from Eq. (2.43). This graph illustrates that the prediction interval is wider than the corresponding CI, because it must account for the variability of the new observation y₀ in addition to the uncertainty in the estimated mean.
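The difference between the two intervals comes down to one term. In its usual form the CI on the mean response at x₀ uses MSRes(1/n + (x₀ − x̄)²/S_xx), while the prediction interval adds 1 inside the parentheses for the variance of the new observation. A minimal sketch comparing the two half-widths on made-up data (the function name `half_widths` is hypothetical, and the t quantile 2.776 = t_{0.025, 4} is supplied by hand):

```python
import math


def half_widths(x, y, x0, t_crit):
    """Return (CI half-width on E(y|x0), PI half-width on a new y0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Shared term 1/n + (x0 - x_bar)^2 / S_xx from estimating the mean response
    mean_term = 1 / n + (x0 - x_bar) ** 2 / s_xx
    ci = t_crit * math.sqrt(ms_res * mean_term)
    # The PI adds 1 for the variance of the future observation
    pi = t_crit * math.sqrt(ms_res * (1 + mean_term))
    return ci, pi


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
results = {x0: half_widths(x, y, x0, t_crit=2.776) for x0 in (1.0, 3.5, 6.0)}
for x0, (ci, pi) in results.items():
    print(f"x0 = {x0}: CI half-width {ci:.3f}, PI half-width {pi:.3f}")
```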
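We may generalize (2.45) somewhat to find a 100(1 − α) percent prediction interval on the mean of m future observations on the response at x = x₀. Let ȳ₀ be the mean of m future observations at x = x₀. A point estimator of ȳ₀ is ŷ₀ = β̂₀ + β̂₁x₀. The 100(1 − α) percent prediction interval on ȳ₀ is

$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le \bar{y}_0 \le \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MS_{\mathrm{Res}}\left(\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \tag{2.46}$$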
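A minimal sketch of the interval (2.46) on the mean of m future observations, written so that the 1/m term replaces the leading 1 of the single-observation interval. The function name `mean_prediction_interval` and the data are hypothetical, and the t quantile is passed in by the caller.

```python
import math


def mean_prediction_interval(x, y, x0, m, t_crit):
    """Prediction interval (2.46) for the mean of m future observations at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_hat = b0 + b1 * x0
    # 1/m: variance of the mean of m new observations, divided by sigma^2
    half_width = t_crit * math.sqrt(ms_res * (1 / m + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0_hat - half_width, y0_hat + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
widths = {}
for m in (1, 5, 100):
    lo, hi = mean_prediction_interval(x, y, x0=5.0, m=m, t_crit=2.776)
    widths[m] = hi - lo
    print(f"m = {m}: width {widths[m]:.3f}")
```

With m = 1 the interval is exactly (2.45); as m grows the 1/m term vanishes and the interval shrinks toward the CI on the mean response at x₀.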
2.6 COEFFICIENT OF DETERMINATION
The quantity

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{\mathrm{Res}}}{SS_T}$$
is called the coefficient of determination. Since SST is a measure of the variability in y without considering the effect of the regressor variable x, and SSRes is a measure of the variability in y remaining after x has been considered, R² is often called the proportion of variation explained by the regressor x. Because 0 ≤ SSRes ≤ SST, it follows that 0 ≤ R² ≤ 1. Values of R² close to 1 imply that most of the variability in y is explained by the regression model. For the regression model for the rocket propellant data in Example 2.1, we have

$$R^2 = \frac{SS_R}{SS_T} = \frac{1{,}527{,}334.95}{1{,}693{,}737.60} = 0.9018$$
that is, 90.18% of the variability in strength is accounted for by the regression model.
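A minimal sketch computing R² both ways for a least-squares straight-line fit. Because SS_T = SS_R + SS_Res when the model includes an intercept, 1 − SS_Res/SS_T and SS_R/SS_T agree up to rounding. The function name `r_squared` and the data are hypothetical, for illustration only.

```python
def r_squared(x, y):
    """Return (1 - SS_Res/SS_T, SS_R/SS_T) for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left after x
    ss_r = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by x
    return 1 - ss_res / ss_t, ss_r / ss_t


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r2_res, r2_reg = r_squared(x, y)
print(f"R^2 = {r2_res:.4f} (via SS_Res), {r2_reg:.4f} (via SS_R)")
```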
The statistic R² should be used with caution, since it is always possible to make R² large by adding enough terms to the model. For example, if there are no repeat points (more than one y value at the same x value), a polynomial of degree n − 1 will give a “perfect” fit (R² = 1) to n data points. When there are repeat points, R² can never be exactly equal to 1 because the model cannot explain the variability related to “pure” error.
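The repeat-point argument can be made concrete. No function of x can fit two different y values at the same x, so the smallest SSRes any model in x can reach is the pure-error sum of squares: the spread of each group of repeat responses about its own group mean. A minimal sketch (hypothetical function name, made-up data) computing the resulting ceiling 1 − SS_PE/SS_T on R²:

```python
from collections import defaultdict


def max_r_squared(x, y):
    """Upper bound on R^2 for any model in x, given possible repeat x values."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    y_bar = sum(y) / len(y)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    # Pure-error sum of squares: repeat responses about their own group means
    ss_pe = 0.0
    for ys in groups.values():
        g_bar = sum(ys) / len(ys)
        ss_pe += sum((yi - g_bar) ** 2 for yi in ys)
    return 1 - ss_pe / ss_t


# x = 1.0 and x = 3.0 each appear twice with different responses
x = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0]
y = [2.0, 2.4, 4.1, 5.8, 6.3, 8.0]
bound = max_r_squared(x, y)
print(f"no model in x can exceed R^2 = {bound:.4f}")
```

With no repeat points every group has a single response, the pure-error term is zero, and the bound is 1, consistent with the “perfect” fit available to a polynomial of degree n − 1.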
Although R² cannot decrease if we add a regressor variable to the model, this does not necessarily mean