(2.3) leads to even simpler models. This supports the notion that from a predictive point of view including a few unnecessary predictors (overfitting) is far less damaging than is omitting necessary predictors (underfitting).
A final way of comparing models is from a directly predictive point of view. Since a rough prediction interval is $\hat{y} \pm 2\hat{\sigma}$, a useful model from a predictive point of view is one with small $\hat{\sigma}$, suggesting choosing a model that has small $\hat{\sigma}$ while still being as simple as possible. That is,
1. Increase the number of predictors until $\hat{\sigma}$ levels off. For these data ($S$ in the output refers to $\hat{\sigma}$), this implies choosing three or four predictors.
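As an illustration of this rule, the following sketch fits nested OLS models of increasing size to simulated stand-in data (the home price data are not reproduced here; the function name `sigma_hat`, the coefficients, and the sample sizes are all illustrative) and watches $\hat{\sigma}$ level off once the predictors with real effects are included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 cases, 6 candidate predictors, only the first
# two actually matter (true coefficients 3 and -2, error s.d. 1.5).
n, k = 100, 6
X = rng.normal(size=(n, k))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.5, size=n)

def sigma_hat(X_sub, y):
    """Residual standard error for an OLS fit with an intercept."""
    n, p = X_sub.shape
    Z = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return np.sqrt(resid @ resid / (n - p - 1))

# sigma_hat drops sharply up to two predictors, then levels off:
# adding the four noise predictors barely changes it.
for p in range(1, k + 1):
    print(p, round(sigma_hat(X[:, :p], y), 3))
```

The "leveling off" point is where adding another predictor no longer buys a meaningful reduction in $\hat{\sigma}$, and hence no meaningful narrowing of the rough prediction interval.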
Taken together, all of these rules imply that the appropriate set of models to consider are those with two, three, or four predictors. Typically, the strongest model of each size (which will have the highest $R^2$, highest adjusted $R^2$, lowest $\hat{\sigma}$, lowest $C_p$, and lowest $AIC_c$, so there is no controversy as to which one is strongest) is examined. The output on pages 31–32 provides summaries for the top three models of each size, in case there are reasons to examine a second- or third-best model (if, for example, a predictor in the best model is difficult or expensive to measure), but here we focus on the best model of each size. First, here is output for the best four-predictor model.
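The reason there is no controversy is that, for models of a *fixed* size, $R^2$, adjusted $R^2$, $\hat{\sigma}$, and AIC are all monotone functions of the residual sum of squares, so they rank same-size subsets identically. A minimal best-subsets sketch on simulated stand-in data (not the home price data; `fit_stats` and the simulated coefficients are illustrative) verifies this:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Hypothetical stand-in data: 50 cases, 6 candidate predictors,
# three of which carry signal.
n, k = 50, 6
X = rng.normal(size=(n, k))
y = X[:, 0] + 0.5 * X[:, 1] - 0.7 * X[:, 2] + rng.normal(size=n)

def fit_stats(cols):
    """RSS, R^2, adjusted R^2, and AIC for the OLS fit on the given columns."""
    p = len(cols)
    Z = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - p - 1)) / (tss / (n - 1))
    aic = n * np.log(rss / n) + 2 * (p + 2)  # p slopes + intercept + sigma
    return rss, r2, adj_r2, aic

# Within each model size, every criterion selects the same subset.
for p in range(1, k + 1):
    subsets = list(combinations(range(k), p))
    stats = [fit_stats(c) for c in subsets]
    i_rss = int(np.argmin([s[0] for s in stats]))
    i_r2 = int(np.argmax([s[1] for s in stats]))
    assert i_rss == i_r2
    print(p, subsets[i_rss])
```

The criteria only disagree when *comparing across sizes*, which is exactly where the penalty terms in adjusted $R^2$, $C_p$, and $AIC_c$ come into play.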
The $t$-statistic for number of bedrooms suggests very little evidence that it adds anything useful given the other predictors in the model, so we now consider the best three-predictor model. This happens to be the best four-predictor model with the one statistically insignificant predictor omitted, but this does not have to be the case.
Each of the predictors is statistically significant at standard significance levels, and this model recovers virtually all of the available fit (its $R^2$ is essentially the same as that using all six predictors), so this seems to be a reasonable model choice. The estimated slope coefficients are very similar to those from the model using all predictors (which is not surprising given the low collinearity in the data), so the interpretations of the estimated coefficients on page 17 still hold to a large extent. A plot of the residuals versus the fitted values and a normal plot of the residuals (Figure 2.2) look fine, and similar to those for the model using all six predictors in Figure 1.5; plots of the residuals versus each of the predictors in the model are similar to those in Figure 1.6, so they are not repeated here.
Once a “best” model is chosen, it is tempting to use the usual inference tools (such as $t$-tests and $F$-tests) to try to explain the process being studied. Unfortunately, doing this while ignoring the model selection process can lead to problems. Since the model was chosen to be best (in some sense), it will tend to appear stronger than would be expected just by random chance. Conducting inference based on the chosen model as if it were the only one examined ignores an additional source of variability, that of actually choosing the model (model selection based on a different sample from the same population could very well lead to a different chosen “best” model). This is termed model selection uncertainty. As a result of ignoring model selection uncertainty, confidence intervals can have lower coverage than the nominal value, hypothesis tests can reject the null too often, and prediction intervals can be too narrow for their nominal coverage.
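A small simulation makes this concrete. Below, $y$ is pure noise, so no predictor has a real effect; yet "selecting" the single predictor most correlated with $y$ and then testing it at the nominal .05 level, as if it were the only predictor ever considered, rejects far more often than 5% of the time. This is an illustrative sketch, not from the text: the normal approximation to the $t$ p-value, the sample sizes, and all function names are choices of this example.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

def t_pvalue_single(x, y):
    """Two-sided p-value (normal approximation) for the slope
    in a simple linear regression of y on x with an intercept."""
    n = len(y)
    xc = x - x.mean()
    beta = (xc @ y) / (xc @ xc)
    resid = y - y.mean() - beta * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    z = abs(beta / se)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

n, k, reps = 50, 10, 500
naive_reject = 0
for _ in range(reps):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)  # y is pure noise: no real effects exist
    # "Model selection": keep the predictor most correlated with y ...
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)]
    best = int(np.argmax(corrs))
    # ... then test it while ignoring that it was selected.
    if t_pvalue_single(X[:, best], y) < 0.05:
        naive_reject += 1

# The rejection rate greatly exceeds the nominal 5%.
print(naive_reject / reps)
```

Because the selected predictor's p-value is effectively the minimum of $k$ p-values, the naive test's true size is closer to $1 - 0.95^{k}$ than to .05, which is the mechanism behind the anti-conservative inference described above.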
FIGURE 2.2: Residual plots for the home price data using the best three‐predictor model. (a) Plot of residuals versus fitted values. (b) Normal plot of the residuals.
Identifying and correcting for this uncertainty is a difficult problem and an active area of research; it will be discussed further in Chapter 14. There are, however, a few things practitioners can do. First, it is not appropriate to emphasize the single “best” model too strongly; any model that has criteria values (such as $C_p$ or $AIC_c$) similar to those of the best model should be recognized as one that could easily have been chosen as best based on a different sample from the same population, and any implications of such a model should be viewed as being as valid as those from the best model. Further, one should expect that $p$-values for the predictors included in a chosen model are potentially smaller than they should be, so taking a conservative attitude regarding statistical significance is appropriate. Thus, for the chosen three-predictor model summarized on page 35, number of bathrooms and living area are likely to correspond to real effects, but the reality of the year-built effect is more questionable.
There is a straightforward way to get a sense of the predictive power of a chosen model if enough data are available. This can be evaluated by holding out some data from the analysis (a holdout or validation sample), applying the selected model from the original data to the holdout sample (based on the previously estimated parameters, not estimates based on the new data), and then examining the predictive performance of the model. If, for example, the standard deviation of the errors from this prediction is not very different from the standard error of the estimate in the original regression, chances are that inferences based on the chosen model will not be misleading.
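The holdout check described above can be sketched as follows, again on simulated stand-in data (the split sizes, seed, and variable names are illustrative): fit on the training portion, apply those *same* estimated coefficients to the holdout portion, and compare the holdout prediction error standard deviation to the training standard error of the estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: fit on the first 80 cases, hold out the last 40.
n, k = 120, 3
X = rng.normal(size=(n, k))
y = 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

train, hold = slice(0, 80), slice(80, None)
Z_train = np.column_stack([np.ones(80), X[train]])
beta, *_ = np.linalg.lstsq(Z_train, y[train], rcond=None)

# Standard error of the estimate from the original (training) regression.
resid_train = y[train] - Z_train @ beta
s_train = np.sqrt(resid_train @ resid_train / (80 - k - 1))

# Apply the SAME fitted coefficients to the holdout sample --
# no re-estimation on the new data.
Z_hold = np.column_stack([np.ones(40), X[hold]])
pred_err = y[hold] - Z_hold @ beta
s_hold = np.sqrt(np.mean(pred_err ** 2))

# If s_hold is close to s_train, inference based on the chosen
# model is probably not badly distorted.
print(round(s_train, 3), round(s_hold, 3))
```

A holdout prediction error standard deviation much larger than the in-sample $\hat{\sigma}$ is a warning that the selection process has overfit the original sample.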