Handbook of Regression Analysis With Applications in R. Samprit Chatterjee


representation of reality. A model can be used to explore the relationships between variables and make accurate forecasts based on those relationships even if it is not the “truth.” Further, any statistical model is only temporary, representing a provisional version of views about the random process being studied. Models can, and should, change, based on analysis using the current model, selection among several candidate models, the acquisition of new data, new understanding of the underlying random process, and so on. Further, it is often the case that there are several different models that are reasonable representations of reality. Having said this, we will sometimes refer to the “true” model, but this should be understood as referring to the underlying form of the currently hypothesized representation of the regression relationship.

The special case with a single predictor ($p = 1$) corresponds to the simple regression model, and is consistent with the representation in Figure 1.1. The solid line is the true regression line, the expected value of $y$ given the value of $x$. The dotted lines are the random errors $\varepsilon_i$ that account for the lack of a perfect association between the predictor and the target variables.
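To make the roles of the true regression line and the random errors concrete, here is a minimal Python sketch that simulates data from a simple regression model. The parameter values, the sample size, the choice of normally distributed errors, and the function name `simulate_simple_regression` are all illustrative assumptions rather than anything specified in the text.

```python
import random

def simulate_simple_regression(beta0, beta1, sigma, xs, seed=0):
    """Return (y, errors) where y[i] = beta0 + beta1 * xs[i] + errors[i].

    Errors are drawn as independent normal values with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, sigma) for _ in xs]
    y = [beta0 + beta1 * x + e for x, e in zip(xs, errors)]
    return y, errors

# Hypothetical values chosen only for illustration.
beta0, beta1, sigma = 2.0, 0.5, 1.0
xs = [float(i) for i in range(1, 21)]
y, errors = simulate_simple_regression(beta0, beta1, sigma, xs)

# The true regression line is the expected value of y given x.
true_line = [beta0 + beta1 * x for x in xs]

# Each random error is the signed vertical distance from an
# observation to the true regression line.
gaps = [yi - mi for yi, mi in zip(y, true_line)]
```

In practice the true line and the errors are never observed; only `xs` and `y` would be available, which is why the parameters have to be estimated.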

      1.2.2 ESTIMATION USING LEAST SQUARES

      The true regression function represents the expected relationship between the target and the predictor variables, which is unknown. A primary goal of a regression analysis is to estimate this relationship, or equivalently, to estimate the unknown parameters $\beta_0, \beta_1, \ldots, \beta_p$. This requires a data‐based rule, or criterion, that will give a reasonable estimate. The standard approach is least squares regression, where the estimates are chosen to minimize

      $$\sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) \right]^2 .$$

      For any choice of estimated parameters $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$, the estimated expected response value given the observed predictor values equals

      $$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_p x_{pi}$$

      and is called the fitted value. The difference between the observed value $y_i$ and the fitted value $\hat{y}_i$ is called the residual, the set of which is represented by the signed lengths of the dotted lines in Figure 1.2. The least squares regression line minimizes the sum of squares of the lengths of the dotted lines; that is, the ordinary least squares (OLS) estimates minimize the sum of squares of the residuals.
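As a small illustration of fitted values and residuals, the following Python sketch computes the least squares line for a single predictor using the standard closed-form expressions (slope equals the sum of cross-products of deviations divided by the sum of squared deviations of the predictor). The data values and the function names `fit_simple_ols` and `sum_squared_residuals` are made up for this example.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares estimates for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(b0, b1, xs, ys):
    """Least squares criterion evaluated at a candidate line (b0, b1)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Small made-up data set, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(xs, ys)

# Fitted values lie on the estimated line; residuals are the signed
# vertical distances from the observations to that line.
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
sse = sum_squared_residuals(b0, b1, xs, ys)
```

Evaluating `sum_squared_residuals` at any other intercept or slope gives a value larger than `sse`, which is what it means for the fitted line to be the least squares line.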

With more than one predictor ($p > 1$), the true and estimated regression relationships correspond to planes ($p = 2$) or hyperplanes ($p \geq 3$), but otherwise the principles are the same. Figure 1.3 illustrates the case with two predictors. The length of each vertical line corresponds to a residual (solid lines refer to positive residuals, while dashed lines refer to negative residuals), and the (least squares) plane that goes through the observations is chosen to minimize the sum of squares of the residuals.

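      In matrix notation, let $\mathbf{y}$ be the $n \times 1$ vector of observed responses, $X$ the $n \times (p+1)$ matrix whose first column is all ones and whose remaining columns hold the predictor values, $\boldsymbol\beta$ the $(p+1) \times 1$ vector of regression coefficients, and $\boldsymbol\varepsilon$ the $n \times 1$ vector of errors. The regression model (1.1) is then

      $$\mathbf{y} = X\boldsymbol\beta + \boldsymbol\varepsilon. \qquad (1.3)$$

      Minimizing the sum of squared residuals leads to the normal equations $X'X\hat{\boldsymbol\beta} = X'\mathbf{y}$, which implies that the least squares estimates satisfy

      $$\hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y}, \qquad (1.4)$$

      provided that $X'X$ is invertible. The fitted values are then

      $$\hat{\mathbf{y}} = X\hat{\boldsymbol\beta} = X(X'X)^{-1}X'\mathbf{y}. \qquad (1.5)$$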
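The matrix form of least squares can be sketched in plain Python. The sketch assumes the usual setup in which the design matrix $X$ has a leading column of ones followed by the predictor columns, so that the estimates solve the normal equations $(X'X)\hat{\boldsymbol\beta} = X'\mathbf{y}$. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the data are hypothetical, and the linear system is solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting (assumes a is nonsingular)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def ols(predictors, y):
    """Least squares estimates from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(row) for row in predictors]  # leading column of ones
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    beta_hat = solve(XtX, Xty)
    fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
    return beta_hat, fitted

# Made-up data with two predictors, generated without noise from
# y = 1 + 2*x1 - 3*x2, so the fit should recover these coefficients.
predictors = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in predictors]
beta_hat, fitted = ols(predictors, y)
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; statistical software typically relies on more numerically stable decompositions instead.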
