Handbook of Regression Analysis With Applications in R. Samprit Chatterjee

service of the ultimate goal of analyzing real data using appropriate regression tools. As such, the target audience of the book includes anyone who is faced with regression data [that is, data where there is a response variable that is being modeled as a function of other variable(s)], and whose goal is to learn as much as possible from that data.

      The book can be used as a text for an applied regression course (indeed, much of it is based on handouts that have been given to students in such a course), but that is not its primary purpose; rather, it is aimed much more broadly as a source of practical advice on how to address the problems that come up when dealing with regression data. While a text is usually organized in a way that makes the chapters interdependent, successively building on each other, that is not the case here. Indeed, we encourage readers to dip into different chapters for practical advice on specific topics as needed. The pace of the book is faster than might typically be the case for a text. The coverage, while at an applied level, does not shy away from sophisticated concepts. It is distinct from, for example, Chatterjee and Hadi (2012), while also having less theoretical focus than texts such as Greene (2011), Montgomery et al. (2012), or Sen and Srivastava (1990).

      This, however, is not a cookbook that presents a mechanical approach to doing regression analysis. Data analysis is perhaps an art, and certainly a craft; we believe that the goal of any data analysis book should be to help analysts develop the skills and experience necessary to adjust to the inevitable twists and turns that come up when analyzing real data.

      Each chapter of the book is laid out in a similar way, with most having at least four sections of specific types. First is an introduction, where the general issues that will be discussed in that chapter are presented. A section on concepts and background material follows, where a discussion of the relationship of the chapter's material to the broader study of regression data is the focus. This section also provides any theoretical background for the material that is necessary. Sections on methodology follow, where the specific tools used in the chapter are discussed. This is where relevant algorithmic details are likely to appear. Finally, each chapter includes at least one analysis of real data using the methods discussed in the chapter (as well as appropriate material from earlier chapters), including both methodological and graphical analyses.

      The book begins with discussion of the multiple regression model. Many regression textbooks start with discussion of simple regression before moving on to multiple regression. This is quite reasonable from a pedagogical point of view, since simple regression has the great advantage of being easy to understand graphically, but from a practical point of view simple regression is rarely the primary tool in analysis of real data. For that reason, we start with multiple regression, and note the simplifications that come from the special case of a single predictor. Chapter 1 describes the basics of the multiple regression model, including the assumptions being made, and both estimation and inference tools, while also giving an introduction to the use of residual plots to check assumptions.

      Since it is unlikely that the first model examined will ultimately be the final preferred model, Chapter 2 focuses on the very important areas of model building and model selection. This includes addressing the issue of collinearity, as well as the use of both hypothesis tests and information measures to help choose among candidate models.

      Chapters 3 through 5 study common violations of regression assumptions, and methods available to address those model violations. Chapter 3 focuses on unusual observations (outliers and leverage points), while Chapter 4 describes how transformations (especially the log transformation) can often address both nonlinearity and nonconstant variance violations. Chapter 5 is an introduction to time series regression, and the problems caused by autocorrelation. Time series analysis is a vast area of statistical methodology, so our goal in this chapter is only to provide a good practical introduction to that area in the context of regression analysis.

      Chapters 8 through 10 examine the situation where the nature of the response variable is such that Gaussian‐based least squares regression is no longer appropriate. Chapter 8 focuses on logistic regression, designed for binary response data and based on the binomial random variable. While there are many parallels between logistic regression analysis and least squares regression analysis, there are also issues that come up in logistic regression that require special care. Chapter 9 uses the multinomial random variable to generalize the models of Chapter 8 to allow for multiple categories in the response variable, outlining models designed for response variables that either do or do not have ordered categories. Chapter 10 focuses on response data in the form of counts, where distributions like the Poisson and negative binomial play a central role. The connection between all these models through the generalized linear model framework is also exploited in this chapter.

      The final chapter focuses on situations where linearity does not hold, and a nonlinear relationship is necessary. Although these models are based on least squares, from both an algorithmic and inferential point of view there are strong connections with the models of Chapters 8 through 10, which we highlight.

      This Handbook can be used in several different ways. First, a reader may use the book to find information on a specific topic. An analyst might want additional information on, for example, logistic regression or autocorrelation. The chapters
