Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
Figure 1.5 The “pilot criterion” must be met for any pilot to be permitted to fly your plane. However, of those skilled enough to fly, your pilot may still lie at the lower end of the curve. That is, your pilot may be absolutely good, but relatively poor in terms of skill.
1.9 EXPERIMENTAL VERSUS STATISTICAL CONTROL
Perhaps most pervasive in the social science literature is the implicit belief that methods such as regression and analysis of covariance allow one to “control” variables that would otherwise not be controllable in a nonexperimental design. As is emphasized throughout this book, statistical methods, whatever the kind, do not provide a means of controlling variables, or “holding variables constant” as it were, in any real sense. To get these kinds of effects, you usually need a strong and rigorous experimental design.
It is true, however, that statistical methods do afford a way, in some sense, of presuming (or guessing) what might have been had controls been put into place. For instance, if we analyze the correlation between weight and height, it may make sense to hold a factor such as age “constant.” That is, we may wish to partial out age. However, partialling out the variability due to age in the bivariate correlation is not equivalent to actually controlling for age. The truth of the matter is that our statistical control tells us nothing about what would actually be the case had we been able to truly control age, or any other factor. As will be elaborated on in Chapter 8 on multiple regression, statistical control is not a sufficient “proxy” for experimental control. Students and researchers must keep this distinction in mind before they throw variables into a statistical model and employ words like “control” (or other power and action words) when interpreting effects. If you want to truly control variables, to actually hold them constant, you usually have to do experiments. Estimating parameters in a statistical model, confident that you have “controlled” for covariates, is simply not enough.
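To make concrete what “partialling out” age amounts to computationally, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a procedure taken from this book: the data are simulated, the coefficients and sample size are invented, and the helper names `pearson` and `partial_corr` are hypothetical. The sketch uses the standard first‐order partial correlation formula.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y with z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated data (all numbers are made up for illustration): height and
# weight each depend on age plus independent noise, and on nothing else.
random.seed(0)
age = [random.uniform(5, 18) for _ in range(500)]
height = [80 + 5.5 * a + random.gauss(0, 6) for a in age]
weight = [5 + 3.0 * a + random.gauss(0, 5) for a in age]

r_raw = pearson(height, weight)
r_partial = partial_corr(height, weight, age)

print(f"height-weight correlation:       {r_raw:.3f}")
print(f"same, with age partialled out:   {r_partial:.3f}")
```

Notice that the adjustment is pure arithmetic on three observed correlations. Nobody's age was fixed, assigned, or manipulated at any point, which is why the resulting number cannot stand in for what an experiment that truly held age constant would have shown.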
1.10 STATISTICAL VERSUS PHYSICAL EFFECTS
In the establishment of evidence, whether experimental or nonexperimental, it is helpful to distinguish between statistical and physical effects. To illustrate, consider a medical scientist who wishes to test the hypothesis that the more medication applied to a wound, the faster the wound heals. The statistical question of interest is: does the amount of medication predict the rate at which a wound heals? A useful statistical model might be a linear regression in which amount of medication is the predictor and rate of healing is the response. Of course, one does not “need” a regression analysis to “know” whether something is occurring. The investigator can simply observe whether the wound heals or not, and whether applying more or less medication speeds up or slows down the healing process. The statistical tool in this case is simply used to model the relationship, not to determine whether or not it exists. The variable in question is a physical, biological, “real” phenomenon. It exists independent of the statistical model, simply because we can see it. The estimation of a statistical model is not necessarily the same as the hypothesized underlying physical process it is seeking to represent.
In some areas of social science, however, the very observance of an effect cannot be realized without recourse to the statistics used to model the relationship. For instance, if I correlate self‐esteem with intelligence, am I modeling a relationship that I know exists separate from the statistical model, or is the statistical model the only recourse I have to say that the relationship exists in the first place? Because of mediating and moderating relationships in social statistics, an additional variable or two could drastically modify existing coefficients in a model, to the point where predictors that had an effect before such inclusion no longer do afterward. As we will emphasize in our chapters on regression:
When you change the model, you change parameter estimates, you change effects. You are never, ever, testing individual effects in the model. You are always testing the model, and hence the interpretation of parameter estimates must be within the context of the model.
This is one of the general problems of purely correlational research with nonphysical or “nonorganic” variables. It may be more an exercise in variance partitioning than in analyzing “true” substantive effects, since the effects in question may simply be statistical artifacts. They may have little other basis. Granted, even when working with physical or biological variables this can be a problem, but it does not rear its head nearly as much. To reiterate, when we model a physical relationship, we have recourse to that physical relationship independent of the statistical model, because we have evidence that the physical relationship exists independent of the model. If we lost our modeling software, we could still “see” the phenomenon. In many models of social phenomena, however, the addition of one or two covariates to the model can make the relationship of most interest “disappear,” and because of the nature of measured variables, we may no longer have physical recourse to justify the original relationship at all, external to the statistical model. This is why social models can be very “neurotic,” frustrating, and context‐dependent. Self‐esteem may predict achievement in one model, but not in another. Many areas of psychological, political, and economic research, for instance, implicitly operate on such grounds. The existence of the phenomenon is literally “built” on the existence of the statistical model, and the phenomenon often does not exist separate from it, or at least not in an easily observed manner such as the healing of a wound. Social scientists working in such areas, if nothing else, must be aware of this. An estimated statistical model may or may not correspond to the actual physical effects it is seeking to account for.
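The way a coefficient can “disappear” once a covariate enters the model can be sketched with a small simulation in Python. Everything here is an assumption made for illustration: the data are simulated, the variable called `covariate` is a hypothetical third variable not named in the text, the generating coefficients are invented, and the helper functions are hypothetical. The sketch fits ordinary least squares twice, once with self‐esteem alone and once with the covariate added.

```python
import random


def centered_sum(u, v):
    """Sum of cross-products of u and v after centering each at its mean."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope_one_predictor(x, y):
    """Least-squares slope for the model y ~ x."""
    return centered_sum(x, y) / centered_sum(x, x)


def slopes_two_predictors(x, z, y):
    """Least-squares slopes (b_x, b_z) for the model y ~ x + z."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


# Simulated data (invented for illustration): the hypothetical covariate
# drives both self-esteem and achievement; self-esteem has no direct
# contribution to achievement in the generating process.
random.seed(1)
n = 1000
covariate = [random.gauss(0, 1) for _ in range(n)]
self_esteem = [0.8 * c + random.gauss(0, 0.6) for c in covariate]
achievement = [1.0 * c + random.gauss(0, 1.0) for c in covariate]

b_alone = slope_one_predictor(self_esteem, achievement)
b_adjusted, b_cov = slopes_two_predictors(self_esteem, covariate, achievement)

print(f"self-esteem slope, model 1 (alone):          {b_alone:.3f}")
print(f"self-esteem slope, model 2 (with covariate): {b_adjusted:.3f}")
print(f"covariate slope, model 2:                    {b_cov:.3f}")
```

The same data yield a sizable self‐esteem coefficient in the first model and one near zero in the second. Neither estimate can be checked by direct observation in the way a healing wound can, so each is interpretable only within the model that produced it.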
1.11 UNDERSTANDING WHAT “APPLIED STATISTICS” MEANS
In this day and age of extraordinary computing power, the likes of which will probably seem laughable even a decade from the date of publication of this book, with a few clicks of the mouse and a software manual, one can obtain a principal components analysis, factor analysis, discriminant analysis, multiple regression, and a host of other relatively theoretically advanced statistical techniques in a matter of seconds. The advance of computers, and especially of easy‐to‐use software programs, has made performing statistical analyses seemingly quite easy, because even a novice can obtain output from a statistical procedure relatively quickly. One consequence of this, however, is that a misunderstanding seems to have arisen in some circles that “applied statistics” somehow equates with “statistics without mathematics” or, even worse, “statistics via software.”
The word “applied” in applied statistics should not be understood to necessarily imply the use of computers. What “applied” should mean is that the focus of the writing is on how to use statistics in the context of scientific investigation, oftentimes with demonstrations using real or hypothetical data. Whether that data is analyzed “by hand” or through the use of software does not make one approach more applied than the other. Analysis via computer is simply more computational than the by‐hand approach. Indeed, there is a whole field of study known as computational statistics that features a variety of software