Handbook of Regression Analysis With Applications in R. Samprit Chatterjee


(and other) topics provide the reader with this subject matter information. As noted above, the chapters also include at least one analysis of a data set, a clarification of computer output, and reference to sources where additional material can be found. The chapters in the book are to a large extent self‐contained and can be consulted independently of other chapters.

      The book can also be used as a template for what we view as a reasonable approach to data analysis in general. This is based on the cyclical paradigm of model formulation, model fitting, model evaluation, and model updating leading back to model (re)formulation. Statistical significance of test statistics does not necessarily mean that an adequate model has been obtained. Further analysis needs to be performed before the fitted model can be regarded as an acceptable description of the data, and this book concentrates on this important aspect of regression methodology. Detection of deficiencies of fit is based on both testing and graphical methods, and both approaches are highlighted here.

      SAMPRIT CHATTERJEE

      Brooksville, Maine

      JEFFREY S. SIMONOFF

      New York, New York

      August, 2012

PART ONE The Multiple Linear Regression Model

      1.1 Introduction

      1.2 Concepts and Background Material
          1.2.1 The Linear Regression Model
          1.2.2 Estimation Using Least Squares
          1.2.3 Assumptions

      1.3 Methodology
          1.3.1 Interpreting Regression Coefficients
          1.3.2 Measuring the Strength of the Regression Relationship
          1.3.3 Hypothesis Tests and Confidence Intervals for β
          1.3.4 Fitted Values and Predictions
          1.3.5 Checking Assumptions Using Residual Plots

      1.4 Example: Estimating Home Prices

      1.5 Summary

      This is a book about regression modeling, but when we refer to regression models, what do we mean? The regression framework can be characterized in the following way:

      1 We have one particular variable that we are interested in understanding or modeling, such as sales of a particular product, sale price of a home, or voting preference of a particular voter. This variable is called the target, response, or dependent variable, and is usually represented by $y$.

      2 We have a set of other variables that we think might be useful in predicting or modeling the target variable (the price of the product, the competitor's price, and so on; or the lot size, number of bedrooms, number of bathrooms of the home, and so on; or the gender, age, income, party membership of the voter, and so on). These are called the predicting, or independent, variables, and are usually represented by $x_1$, $x_2$, etc.

      Typically, a regression analysis is used for one (or more) of three purposes:

      1 modeling the relationship between $x$ and $y$;

      2 prediction of the target variable (forecasting);

      3 and testing of hypotheses.

      In this chapter, we introduce the basic multiple linear regression model, and discuss how this model can be used for these three purposes. Specifically, we discuss the interpretations of the estimates of different regression parameters, the assumptions underlying the model, measures of the strength of the relationship between the target and predictor variables, the construction of tests of hypotheses and intervals related to regression parameters, and the checking of assumptions using diagnostic plots.
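      The first two purposes listed above, modeling and prediction, can be sketched with a single predictor. This is a minimal illustration only: it is written in Python rather than R, the function names `fit_simple_regression` and `predict` are hypothetical, and the data are made up and noise-free. It uses the standard closed-form least squares estimates for one predictor. Hypothesis testing, the third purpose, is not shown.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return the least squares (intercept, slope) for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted target value for a new predictor value."""
    return intercept + slope * x_new


# Hypothetical, noise-free data (not taken from the book):
# predictor x = living area, target y = sale price
area = [10, 12, 15, 18, 20]
price = [150, 170, 200, 230, 250]

b0, b1 = fit_simple_regression(area, price)  # modeling the relationship
forecast = predict(b0, b1, 16)               # prediction for a new home
print(b0, b1, forecast)
```

      Because the made-up prices lie exactly on a line, the fitted coefficients reproduce that line; with real data the fit would only approximate the relationship.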

      1.2.1 THE LINEAR REGRESSION MODEL

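      The data consist of $n$ observations, which are sets of observed values $\{x_{1i}, x_{2i}, \ldots, x_{pi}, y_i\}$ that represent a random sample from a larger population. It is assumed that these observations satisfy a linear relationship,

      $$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i,$$

      where the $\beta$ coefficients are unknown parameters, and the $\varepsilon_i$ are random error terms. By a linear model, it is meant that the model is linear in the parameters; a quadratic model,

      $$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$

      paradoxically enough, is a linear model, since $x$ and $x^2$ are just versions of $x_1$ and $x_2$.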
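      To see why a quadratic model counts as linear, it can help to fit one with exactly the machinery used for any linear model. The sketch below is an illustration in Python rather than R, and it assumes least squares estimation via the normal equations (least squares is the subject of Section 1.2.2 in the outline). The helper names `solve_linear_system` and `least_squares`, the coefficient values, and the noise-free data are all hypothetical. The predictor and its square simply become two columns of the design matrix.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def least_squares(design, y):
    """Coefficients solving the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free quadratic data: y = 1 + 2 x + 0.5 x^2
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
true_beta = (1.0, 2.0, 0.5)
y = [true_beta[0] + true_beta[1] * xi + true_beta[2] * xi ** 2 for xi in x]

# Columns: constant, x_1 = x, x_2 = x^2. The model is linear in the betas.
design = [[1.0, xi, xi ** 2] for xi in x]
beta_hat = least_squares(design, y)
print(beta_hat)
```

      Nothing in `least_squares` knows that the third column is the square of the second; the curvature lives entirely in how the design matrix is built, which is what "linear in the parameters" means in practice.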

      It is important to recognize that this, or any statistical model, is not viewed as a true representation of reality; rather, the goal is that the model be a useful
