small, medium, and large effects respectively (Cohen, 1988). However, relying on effect size guidelines to indicate the absolute size of an experimental or nonexperimental effect should be done only in the complete absence of any other information about the research area. In the end, it is the researcher, armed with knowledge of the history of the phenomenon under study, who must evaluate whether an effect is small or large. For instance, referring to the achievement example discussed earlier, Cohen's d would be equal to:

\[ d = \frac{\bar{y}_1 - \bar{y}_2}{s} = \frac{1}{10} = 0.10 \]

      The effect size of 0.1 is small according to Cohen's guidelines, but, more importantly, it is also small substantively, since a difference in means of 1 point is, by all accounts, likely trivial. In this case, both Cohen's guidelines and the actual substantive evaluation of the size of effect coincide. However, this is not always the case. In physical or biological experiments, for instance, one can easily imagine examples for which an effect size of even 0.8 might be considered “small” relative to the research area under investigation, since the degree of control the investigator can impose over his or her subjects is much greater. In such cases, it may very well be that Cohen's d values in the neighborhood of two or three would be required for an effect to be considered “large.” The point is that only in the complete absence of information regarding an area of investigation is it appropriate to use “rules of thumb” to evaluate the size of effect. Cohen's d, or effect size measures in general, should always be used in conjunction with statements of statistical significance, since they tell the researcher what she actually wants to know: the estimated separation between the sample data (often in the form of a sample mean) and the null hypothesis under investigation. Oftentimes meta‐analysis, which is a study of the overall measure of effect for a given phenomenon, can be helpful in comparing new research findings to the “status quo” in a given field. For a thorough, user‐friendly overview of the methodology, consult Shelby and Vaske (2008).
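      As a minimal sketch of the computation, the following uses hypothetical group scores (not the achievement data discussed in the text) and standardizes the mean difference by a pooled standard deviation, one common choice of standardizer for Cohen's d:

```python
import numpy as np

# Hypothetical scores for two groups (illustrative values only, not the
# achievement data discussed in the text).
group1 = np.array([88, 92, 85, 90, 95], dtype=float)
group2 = np.array([87, 91, 84, 89, 94], dtype=float)

n1, n2 = len(group1), len(group2)
mean_diff = group1.mean() - group2.mean()

# Pooled standard deviation across the two groups.
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
d = mean_diff / np.sqrt(pooled_var)

print(f"mean difference = {mean_diff:.2f}, Cohen's d = {d:.2f}")
```

      Whether the resulting d is “small” or “large” should then be judged against what is known about the research area, not against the rules of thumb alone.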

      2.28.7 What Does Cohen's d Actually Tell Us?

      Writing out a formula and plugging in numbers, unfortunately, does not necessarily give us a feeling for what the formula actually means. This is especially true with regard to Cohen's d. We now discuss the statistic in a bit more detail, pointing out why it is usually interpreted as the standardized difference between means.

      Imagine you have two independent samples of laboratory rats. To one sample, you provide normal feeding and observe their weight over the next 30 days. The other sample you also feed normally, but you additionally give them regular doses of a weight‐loss drug. You are interested in learning whether your weight‐loss drug works or not. Suppose that after 30 days, on average, a mean difference of 0.2 pounds is observed between groups. How big is a difference of 0.2 pounds for these groups? If the typical difference in weight among rats in the population were very large, say, 0.8 pounds, then a mean difference of 0.2 pounds is not that impressive. After all, if rats' weights vary considerably from one rat to the next, then finding a mean difference of 0.2 between groups cannot be that exciting. However, if the typical weight difference between rats were equal to 0.1 pounds, then all of a sudden, a mean difference of 0.2 pounds seems more impressive, because that size of difference is atypical relative to the population. What is “typical”? This is exactly what the standard deviation reveals. Hence, when we are computing Cohen's d, we are in actuality producing a ratio of one deviation relative to another, similar to how, when we compute a z‐score, we are comparing the deviation y − μ with the standard deviation σ. The extent to which observed differences are large relative to “average” differences will be the extent to which d will be large in magnitude.
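      To make the ratio interpretation concrete, here is a brief sketch using the illustrative numbers from the rat example, treating the “typical” difference as the standard deviation; the same 0.2‐pound mean difference yields a very different d depending on how much rats typically vary:

```python
# Same observed mean difference, two different amounts of "typical" variation.
mean_diff = 0.2   # pounds, observed difference between treated and control groups

sd_large = 0.8    # rats vary a great deal from one rat to the next
sd_small = 0.1    # rats vary very little from one rat to the next

d_when_variation_large = mean_diff / sd_large   # 0.25: unremarkable relative to typical variation
d_when_variation_small = mean_diff / sd_small   # 2.00: large relative to typical variation

print(d_when_variation_large, d_when_variation_small)
```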

      2.28.8 Why and Where the Significance Test Still Makes Sense

      At this point, the conscientious reader may very well be asking the following question: If the significance test is so misleading and subject to misunderstanding and misinterpretation, how does it even make sense as a test of anything? It would appear to be a nonsensical test and should forever be forgotten. The fact is that the significance test does make sense; it is just that the sense it makes is not necessarily scientific. Rather, it is statistical. To a pure theoretical statistician or mathematician, a decreasing p‐value as a function of an increasing sample size makes perfect sense—as we snoop a larger part of the population, the random error we expect typically decreases, because with each increase in sample size we are obtaining a better estimate of the true population parameter. Hence, that we achieve statistical significance with a sample size of 500 but not 100, for instance, is well within statistical “good sense.” That is, the p‐value is functioning as it should, and likewise yielding the correct statistical information.

      However, statistical truth does not equate to scientific truth (Bolles, 1962). Statistical conclusions should never be automatically equated with scientific ones. They are different and distinct things. When we arrive at a statistical conclusion (e.g., deciding to reject the null hypothesis), we can never assume that it represents something that is necessarily scientifically meaningful. Rather, the statistical conclusion should be used as a potential indicator that something scientifically interesting may have occurred, the evidence for which must be determined by other means, which include effect sizes, researcher judgment, and putting the obtained result into its proper interpretive context.

       To understand advanced statistical procedures, it is necessary to have a firm grasp on the foundations of introductory statistics. Advanced procedures are typically extensions of first principles.

       Densities are theoretical probability distributions. The normal univariate density is an example.

       The standard normal distribution has a mean μ of 0 and a variance σ² of 1.

        z‐scores are useful for comparing raw scores emanating from different distributions. Standardization transforms raw scores to a common scale, allowing for comparison between scores.

       Binomial distributions are useful in modeling experiments in which the outcome can be conceptualized as a “success” or “failure.” The outcome of the experiment must be binary in nature for the binomial distribution to apply.

       The normal distribution can be used to approximate the binomial distribution. In this regard, we say that the limiting form of the binomial distribution is the normal distribution.

       The bivariate normal density expresses the probability of the joint occurrence of two variables.

       The multivariate normal density expresses the probability of the joint occurrence of three or more variables.

       The mean, variance, skewness, and kurtosis are all moments of a distribution.

       The mean (arithmetic), the first moment of a distribution, either of a mathematical variable or a random variable, can be regarded as the center of gravity of the distribution such that the sum of deviations from the mean for any distribution is equal to zero.

       The variance, the second moment of a distribution, can be computed for either a mathematical variable or a random variable. It expresses the degree to which scores, on average, deviate from the mean in squared units.

       The sample variance with n in the denominator is biased. To correct for the bias, n − 1 is used in the denominator, yielding an unbiased estimator of the population variance (see the sketch following this list).
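      As a brief, hedged sketch of several of the points above (z‐score standardization, the biased versus unbiased sample variance, and the normal approximation to the binomial), using purely hypothetical values:

```python
import numpy as np
from scipy import stats

# z-scores: place raw scores from different distributions on a common scale.
# (All values below are hypothetical and purely illustrative.)
math_score, math_mean, math_sd = 75.0, 70.0, 5.0
verbal_score, verbal_mean, verbal_sd = 80.0, 78.0, 8.0
z_math = (math_score - math_mean) / math_sd          # 1.00
z_verbal = (verbal_score - verbal_mean) / verbal_sd  # 0.25

# Sample variance: n in the denominator is biased; n - 1 corrects the bias.
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
var_biased = y.var(ddof=0)    # divides by n
var_unbiased = y.var(ddof=1)  # divides by n - 1

# Normal approximation to the binomial: for n trials with success probability p,
# the binomial is approximately N(np, np(1 - p)) for reasonably large n.
n, p = 100, 0.5
exact = stats.binom.cdf(55, n, p)
approx = stats.norm.cdf(55.5, loc=n * p, scale=np.sqrt(n * p * (1 - p)))  # continuity correction

print(z_math, z_verbal)
print(var_biased, var_unbiased)
print(exact, approx)
```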
