Handbook of Regression Analysis With Applications in R. Samprit Chatterjee
Coefficients:
                       Estimate  Std.Error  t val  P(>|t|)    VIF
(Intercept)           1.157e+00   2.54e+00   0.46    0.651
Bush.pct.2000         1.633e-01   3.67e-02   4.46  3.7e-05   1.55  ***
e.Voting              1.272e+01   4.20e+00   3.03    0.004  41.25  **
Bush.2000 X e.Voting -2.297e-01   7.53e-02  -3.05    0.003  38.25  **
Change.turnout       -1.223e-05   1.36e-05  -0.90    0.370   2.44
Median.income        -1.718e-04   5.57e-05  -3.08    0.003   1.65  **
Hispan.pop.pct       -4.892e-02   2.94e-02  -1.66    0.102   1.14
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.233 on 60 degrees of freedom
Multiple R-squared: 0.4585, Adjusted R-squared: 0.4044
F-statistic: 8.468 on 6 and 60 DF, p-value: 1.145e-06
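The summary lines of this output are tied together by simple arithmetic, and a short check can make that relationship concrete. The sketch below uses only the printed multiple R-squared and degrees of freedom; the variable names are illustrative, and because the inputs are rounded, the results should agree with the printed values only up to rounding.

```r
# Values read off the printed summary (rounded as printed)
r2       <- 0.4585   # Multiple R-squared
df_resid <- 60       # residual degrees of freedom
p        <- 6        # number of slope coefficients (numerator df of the F-statistic)

# Implied sample size: residual df + slopes + intercept
n <- df_resid + p + 1

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F-statistic for the hypothesis that all slopes are zero
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# Upper-tail probability of that F-statistic
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)

print(c(n = n, adj_r2 = adj_r2, f_stat = f_stat))
print(p_value)
```

The implied sample size is 67, and the recomputed adjusted R-squared and F-statistic land close to the printed 0.4044 and 8.468; small discrepancies reflect the rounding of the inputs.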
Residual plots given in Figure 2.7 do not indicate any obvious problems, although the potential nonconstant variance noted in Figure 2.4, related to whether or not a county used electronic voting, is still indicated. We will not address that issue here, but correction of nonconstant variance related to subgroups in the data will be discussed in Section 6.3.3.
FIGURE 2.7: Residual plots for the 2004 election data.
2.5 Summary
In this chapter, we have discussed various issues related to model building and model selection. Such methods are important because both underfitting (not including variables that are needed) and overfitting (including variables that are not needed) lead to problems in interpreting the results of regression analyses and making predictions using fitted regression models. Hypothesis tests provide one tool for model building through formal comparisons of models. If one model is a special case of another, defined through a linear restriction, then a partial F-test can be used to compare the two.
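As a minimal sketch of such a formal comparison, the following R code compares a full model with a restricted special case using `anova()`, and reproduces the same F-statistic by hand from the two residual sums of squares. The data are simulated purely for illustration; the variable names, coefficients, and sample size are assumptions, not values from the election example.

```r
set.seed(1)

# Simulated data (hypothetical): only x1 is truly related to y
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

full    <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x1)   # special case: coefficients of x2 and x3 restricted to 0

# Partial F-test comparing the restricted model to the full model
tab <- anova(reduced, full)
print(tab)

# The same statistic computed directly from residual sums of squares
rss_reduced <- sum(resid(reduced)^2)
rss_full    <- sum(resid(full)^2)
d           <- df.residual(reduced) - df.residual(full)   # number of restrictions
f_manual    <- ((rss_reduced - rss_full) / d) / (rss_full / df.residual(full))
print(f_manual)
```

A large p-value in the `anova()` table indicates that the data do not provide evidence against the simpler model; it does not by itself establish that the simpler model is the better choice.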
While useful, hypothesis tests do not provide a complete tool for model building. The problem is that a hypothesis test does not necessarily answer the question that is of primary importance to a data analyst. The
These considerations open up a broader spectrum of tools for model building than just hypothesis tests. Best subsets regression algorithms allow for the quick summarization of hundreds or even thousands of potential regression models. The underlying principle of these summaries is the principle of parsimony, which implies the tradeoff of strength of fit versus simplicity: that a model should only be as complex as it needs to be. Measures such as the adjusted
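The best subsets idea described above can be sketched in base R by fitting every subset of a small set of candidate predictors and summarizing each fit. This is an illustrative brute-force version on simulated data, not the algorithm used by any particular package; the predictor names are hypothetical, and the use of adjusted R-squared and AIC as the summary measures is an assumption made for the example.

```r
set.seed(2)

# Simulated data (hypothetical): y depends on x1 and x2 only
n   <- 80
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 1.5 * dat$x1 - 1 * dat$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3", "x4")

# Enumerate all non-empty subsets of the candidate predictors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each subset and record its size and two fit summaries
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = dat)
  data.frame(
    model  = paste(vars, collapse = " + "),
    size   = length(vars),
    adj_r2 = summary(fit)$adj.r.squared,
    aic    = AIC(fit)
  )
}))

# Show the five models with the smallest AIC
ranked <- results[order(results$aic), ]
print(head(ranked, 5))
```

Models near the top of such a ranking often differ only slightly in the chosen measure, so it is reasonable to treat the list as a set of candidates rather than a single winner. Enumeration grows as 2^k in the number of predictors, which is why dedicated algorithms are used for larger problems.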
A related point, increasingly raised in recent years, concerns replicability, or the lack thereof: the alarming tendency for supposedly established relationships not to reappear as strongly (or at all) when new data are examined. Much of this phenomenon comes from quite valid attempts to find appropriate representations of relationships in a complicated world (including those discussed here and in the next three chapters), but that doesn't alter the simple fact that interacting with data to make models more appropriate tends to make things look stronger than they actually are. Replication and validation of models (and the entire model building process) should be a fundamental part of any exploration of a random process. Examining a problem further and discovering that a previously‐believed relationship does not replicate is not a failure of the scientific process; in fact, it is part of the essence of it.
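A small simulation can illustrate why selection makes relationships look stronger than they are. In the sketch below, the response is generated independently of all 30 candidate predictors, the predictor most correlated with the response is chosen from one sample, and the same model is then refit on a fresh sample. Everything here (sample size, number of predictors, names) is a hypothetical setup chosen for illustration.

```r
set.seed(3)

n <- 50   # observations per sample
k <- 30   # candidate predictors, all pure noise

# Generate a sample in which y is unrelated to every predictor
make_data <- function() {
  d <- as.data.frame(matrix(rnorm(n * k), n, k))   # columns named V1, ..., V30
  d$y <- rnorm(n)
  d
}

original <- make_data()
fresh    <- make_data()

# Select the predictor with the largest absolute correlation with y
cors <- sapply(original[paste0("V", 1:k)], function(x) cor(x, original$y))
best <- names(which.max(abs(cors)))

# R-squared of the selected model on the original sample and on new data
r2_original <- summary(lm(reformulate(best, "y"), data = original))$r.squared
r2_fresh    <- summary(lm(reformulate(best, "y"), data = fresh))$r.squared

print(c(selected = best))
print(c(r2_original = r2_original, r2_fresh = r2_fresh))
```

Because the predictor was chosen for its apparent strength in the original sample, its R-squared there is the maximum over all 30 candidates, while in the fresh sample it is just one arbitrary noise variable. Typically, though not in every run, the apparent relationship shrinks markedly on the new data.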
A valid question regarding the logistics of performing model selection remains: what is the “correct” order in which to perform the different steps of model selection, assumption checking, and so on? Do you omit unusual observations first, and then try to determine the best model? Or do you work on variable selection, and then check diagnostics based on your chosen model?