Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
Figure 1.2 “Model fit” as an overlap of data with theory.
We must also ensure that our theories are not narratives fit too conveniently to data. If you have ever witnessed a sporting event where the deciding point occurred by the lucky bounce of a puck in hockey or the breezy push of a tennis ball in midair, only to hear post‐match commentators laud the winning team or individual as suddenly so much better than the losing team, then you know what “convenient narratives” are all about. We must be careful not to exaggerate how well our given theory fits data simply because a few data points went “our way.” George Box once said that all models are wrong but some are useful. In any scientific endeavor, guard against falling in love with your theory or otherwise exaggerating it far beyond what the data suggest. Otherwise, it is no longer a legitimate theory, but rather simply your brand, more a product of subjective bias and “career‐building” than anything scientific. After 20 years of advocating a theory, is the researcher you are speaking to really prepared to “accept” evidence that contradicts his or her theory? Such researchers have a great deal at stake in that theory; their whole career may have been built upon it. Are they really willing to accept its “defeat”? Indeed, I believe one reason why economic predictions, for instance, are often looked upon with suspicion is that economists, like psychologists (and theoretical physicists, for that matter), are far too quick to advance theories as though they were near facts. “Sexy theories” sound great and may be marketable to uncritical consumers and media (make an outlandish claim on cable, you'll be a hero!), but to good scientists, theories are always only as good as the data that exist to support them. Science is exciting, to be sure, but should not be overly speculative. If you are looking for fireworks, then you would do best to choose a field other than science.
1.2 WHAT IS A “MODEL”?
The word “model” is perhaps the most popular word featured in textbooks, tutorials, and lectures having anything to do with the application of quantitative methods. Defining just what a model is in statistics can be a bit challenging. We discuss the concept by referring to Everitt's definition:
A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data.
(Everitt, 2002, p. 247)
Models are, essentially, and perhaps somewhat crudely, equations. They are equations fit to data that attempt to account for how the data came about or were generated in the first place. For example, if every hour a student studied for an exam corresponded to exactly a 1‐point increase in that student's grade, the model that would best explain how these data were generated would be a linear model. Even if the relationship between hours studied and student grade was not perfect, a straight line might still be the “best” summary. Models are often used to account for messy or imperfect data.
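The idea of a line as the “best” summary of imperfect data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hours and grades below are invented, the helper `fit_line` is a hypothetical name, and ordinary least squares is used as one common way of choosing the line, not necessarily the method developed later in the book.

```python
# Hypothetical data (invented for illustration): hours studied and exam grades.
hours = [1, 2, 3, 4, 5, 6]
grades = [62, 61, 65, 64, 67, 66]


def fit_line(x, y):
    """Return ordinary least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Messy data: the line does not pass through every point, so residuals are nonzero.
intercept, slope = fit_line(hours, grades)
residuals = [g - (intercept + slope * h) for h, g in zip(hours, grades)]

# Perfect data: each extra hour adds exactly 1 point, so the line fits without error.
perfect_grades = [60 + h for h in hours]
perfect_intercept, perfect_slope = fit_line(hours, perfect_grades)

print(f"messy data:   grade = {intercept:.2f} + {slope:.2f} * hours")
print(f"perfect data: grade = {perfect_intercept:.2f} + {perfect_slope:.2f} * hours")
```

In the messy case the fitted line still summarizes the overall trend even though no single student falls exactly on it, which is the sense in which a model accounts for imperfect data.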
Figure 1.3 Hebbian Yerkes–Dodson performance–arousal curve.
Source: Diamond et al. (2007). Licensed under CC BY 3.0.
Another example of a model is the classic Hebbian version of the Yerkes–Dodson curve expressing the relationship between performance and arousal, depicted in Figure 1.3.
The curve is an inverted “U” shape (an approximate parabola) that provides a useful model relating these two attributes (i.e., performance and arousal). If one exhibits very low arousal, performance will be minimal. If one exhibits a very high degree of arousal, performance will likely also suffer. However, if one exhibits a moderate range of arousal, performance will likely be optimal. The model in this case, as in most cases, does not account for all the data one might collect. The extent to which it accounts for most of the data is the extent to which the model may be, in general, deemed “useful.” The usefulness of a model is also enhanced if it can make accurate predictions of future behavior.
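As a rough sketch of what an inverted‐“U” model looks like as an equation, the function below uses a downward‐opening parabola. The function name and the values of `optimum`, `peak`, and `curvature` are hypothetical, chosen only for illustration; they are not estimates from Diamond et al. (2007) or any other data.

```python
def predicted_performance(arousal, optimum=5.0, peak=100.0, curvature=4.0):
    """Inverted-U model: performance is highest at `optimum` and falls off on either side.

    All default parameter values are hypothetical and for illustration only.
    """
    return peak - curvature * (arousal - optimum) ** 2


low = predicted_performance(1.0)       # very low arousal
moderate = predicted_performance(5.0)  # moderate arousal (at the assumed optimum)
high = predicted_performance(9.0)      # very high arousal

print(f"low={low}, moderate={moderate}, high={high}")
```

Under these assumed parameters the model predicts the best performance at moderate arousal and equally reduced performance at the two extremes, mirroring the shape of the curve described above.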
As another example of a model, consider the number of O‐ring incidents on NASA's space shuttle (the fleet is officially, and sadly, retired now) as a function of temperature (Figure 1.4). At very low or high temperatures, the number of incidents appears to be elevated. A quadratic function seems to model the relationship adequately. Does it account for all points? No. But it nonetheless provides a fairly good summary of the available data. Some have argued that had NASA had such a model (i.e., essentially the line joining the points) available before Challenger was launched on January 28, 1986, the launch might have been delayed and the shuttle and crew saved from disaster.² We feature these data in our chapter on logistic regression.
Figure 1.4 Number of O‐ring incidents on boosters as a function of temperature.
Why did George Box say that all models are wrong but some are useful? The reason is that even if we obtain a perfectly fitting model, there is nothing to say that this is the only model that will account for the observed data. Some, such as Fox (1997), even encourage divorcing statistical modeling from the idea of accounting for deterministic processes. In discussing the determinants of one's income, for instance, Fox remarks:
I believe that a statistical model cannot, and is not literally meant to, capture the social process by which incomes are “determined” … No regression model, not even one including a residual, can reproduce this process … The unfortunate tendency to reify statistical models – to forget that they are descriptive summaries, not literal accounts of social processes – can only serve to discredit quantitative data analysis in the social sciences. (p. 5)
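Box's point that a perfectly fitting model need not be the only one can be made concrete with a small sketch. The three data points and both model functions below are invented for illustration: each model reproduces the observed points exactly, yet the two disagree as soon as we move beyond the observed range.

```python
# Hypothetical observations (invented): hours studied and exam grade.
observed = [(0, 60), (1, 61), (2, 62)]


def linear_model(x):
    """Grade rises exactly 1 point per hour."""
    return 60 + x


def cubic_model(x):
    """Same as the line at x = 0, 1, 2, because the added term vanishes there."""
    return 60 + x + x * (x - 1) * (x - 2)


# Both models fit every observed point without error.
linear_fits = all(linear_model(x) == y for x, y in observed)
cubic_fits = all(cubic_model(x) == y for x, y in observed)

# Yet they make different predictions for an unobserved value.
linear_at_3 = linear_model(3)
cubic_at_3 = cubic_model(3)

print(linear_fits, cubic_fits, linear_at_3, cubic_at_3)
```

A perfect fit to the data in hand therefore cannot, by itself, tell us which of the two equations generated the data.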
Indeed, psychological theory, for instance, has advanced numerous models of behavior just as biological theory has advanced numerous theories of human functioning. Two or more competing models may each explain observed data quite well. Sometimes, and unfortunately, the model we adopt may have more to do with our sociological (and even political) preferences than anything to do with whether one is more “correct” than the other. Science (and mathematics, for that matter) is a human activity, and often theories that are deemed valid or true have much to do with the spirit of the times (the so‐called Zeitgeist) and what the scientific community will actually accept and tolerate as being true.³ Of course, this is not true in all circumstances, but you should