Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

with s², she computes a one‐sample t‐test rather than a z‐test. Her computation of the ensuing t is:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{101 - 100}{10/\sqrt{100}} = \frac{1}{1} = 1.0 $$

      On degrees of freedom equal to n − 1 = 100 − 1 = 99, a two‐tailed test requires a critical t of ±1.984 for the result to be statistically significant at a significance level of 0.05. Hence, the obtained value of t = 1 is not statistically significant. That the result is not statistically significant is hardly surprising, since the sample mean of the psychologist's school is only 101, a single point higher than the national average of 100. The computation of t is thus telling a story consistent with our intuition: there is no reason to believe that the school's performance is higher than the national average in the population from which these sample data were drawn.

      Now, consider what would have happened had the psychologist collected a larger sample, suppose n = 500. Using our new sample size, and still assuming an estimated population standard deviation s equal to 10 and a distance between means equal to 1, we repeat the computation for t:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{101 - 100}{10/\sqrt{500}} = \frac{1}{0.45} = 2.22 $$

      What happened? The obtained value of t increased from 1 to 2.22 simply as a result of collecting a larger sample, nothing more. The actual distance between means remained the same (101 − 100 = 1). The degrees of freedom for the test have changed and are now equal to 499 (i.e., n − 1 = 500 − 1 = 499). Since our obtained t of 2.22 exceeds the critical t, our statistic is deemed statistically significant at p < 0.05. What is important to realize is that we did not change the difference between the sample mean x̄ and the population mean μ0; it remained extremely small, at only a single mean achievement point (i.e., 101 − 100 = 1). Even with the same distance between means, the obtained t of 2.22, being statistically significant at p < 0.05, now means we will reject the null hypothesis and infer the alternative hypothesis that μ ≠ μ0. And because scientists have historically considered the infamous statement "p < 0.05" to be automatically and necessarily equivalent to something meaningful or important, the obvious danger is that the rejection of the null hypothesis at p < 0.05 is considered by some (or even most) a "positive" result, when in reality the difference in this case is nothing short of trivial.
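      A minimal R sketch makes the sample‐size effect concrete. The helper name t_from_summary is ours, not the text's; the summary values are the chapter's own (x̄ = 101, μ0 = 100, s = 10):

```r
# One-sample t from summary statistics: t = (xbar - mu0) / (s / sqrt(n))
t_from_summary <- function(xbar, mu0, s, n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-tailed p
  c(t = t_stat, df = n - 1, p = p_val)
}

t_from_summary(xbar = 101, mu0 = 100, s = 10, n = 100)  # t = 1.00, p ~ 0.32
t_from_summary(xbar = 101, mu0 = 100, s = 10, n = 500)  # t = 2.236 (2.22 above is rounded), p ~ 0.026
```

      Only n differs between the two calls; the mean difference and standard deviation are identical, yet the verdict flips from "not significant" to "significant."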

      The problem is not that the significance test is useless and should therefore be banned. The problem is that too few are aware that the statement "p < 0.05," in itself, may have little scientific (as opposed to statistical) meaning in a given research context and, at worst, may be entirely misleading if automatically assigned any degree of scientific importance by the interpreter.

      2.28.4 Other Test Statistics

      The factors that influence the size of a p‐value are, of course, not only relevant to z‐ and t‐tests, but are at work in essentially every test of statistical significance we might conduct. For instance, as we will see in the following chapter, the size of the F‐ratio in traditional one‐way ANOVA is subject to the same influences. Taken as the ratio of MS between to MS error, the three determining influences on the size of p are: (1) the size of MS between, which reflects the extent to which means differ from group to group; (2) the size of MS error, which is in part a reflection of the within‐group variability; and (3) sample size (when computing MS error, we divide the sum of squares for error by its degrees of freedom, which are determined in large part by sample size). Hence, a large F‐statistic does not necessarily imply that MS between is large in absolute terms, any more than a large t necessarily implies a large distance x̄ − μ0. A small p‐value associated with a computed F could be the result of small within‐group variation and/or a large sample size. It does not necessarily mean that group‐to‐group mean differences are substantial, which was presumably the goal of the study or experiment. That is, the goal was not simply to obtain small within‐group variation; the goal was to demonstrate mean differences between groups.
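      The same dependence can be sketched in R for a one‐way ANOVA computed from summary values alone. The group means and within‐group standard deviation below are hypothetical (they do not come from the text); only the per‐group sample size changes between the two calls:

```r
# F-ratio from fixed group means and within-group SD, varying only n per group
f_from_summary <- function(means, sd_within, n) {
  k      <- length(means)
  msb    <- n * sum((means - mean(means))^2) / (k - 1)  # MS between
  msw    <- sd_within^2                                 # MS within (error)
  f_stat <- msb / msw
  p      <- pf(f_stat, df1 = k - 1, df2 = k * (n - 1), lower.tail = FALSE)
  c(F = f_stat, p = p)
}

means <- c(50, 51, 52)   # hypothetical, nearly identical group means
f_from_summary(means, sd_within = 10, n = 10)   # F = 0.1, p ~ 0.90
f_from_summary(means, sd_within = 10, n = 500)  # F = 5.0, p ~ 0.007
```

      The group means never move; F grows, and p shrinks, with n alone.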

      These ideas for significance tests apply in even the most advanced of modeling techniques, such as structural equation modeling (see Chapter 15). The typical measure of model fit is the chi‐square statistic, χ2, which as reported by many (e.g., see Bollen, 1989; Hoelter, 1983) suffers the same interpretational problems as t and F regarding how its magnitude can be largely a function of sample size. That is, one can achieve a small or large χ2 simply because one has used a small or large sample. If a researcher is not aware of this fact, he or she may decide that a model is well‐fitting or poor‐fitting based on a small or large chi‐square value, without awareness of its connection with n. This is in part why other measures, as we will see, have been proposed for interpreting the fit of SEM models (e.g., see Browne and Cudeck, 1993).
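      Under maximum likelihood estimation, the SEM fit statistic is commonly computed as χ² = (n − 1)F_min, where F_min is the minimized value of the fit function, so for a fixed degree of misfit the statistic is driven by n. A minimal sketch, assuming a hypothetical fixed F_min = 0.05 and 10 model degrees of freedom:

```r
# SEM chi-square as (n - 1) * F_min for a fixed minimized fit function value
chisq_fit <- function(n, f_min = 0.05, df = 10) {
  stat <- (n - 1) * f_min
  c(chisq = stat, p = pchisq(stat, df = df, lower.tail = FALSE))
}

chisq_fit(n = 100)   # chisq = 4.95,  p ~ 0.90  -> model "fits"
chisq_fit(n = 1000)  # chisq = 49.95, p ~ 3e-7  -> same misfit, model "rejected"
```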

      2.28.5 The Solution

      The solution to episodes of misunderstanding the significance test is not to drop or ban it, contrary to what some have recommended (e.g., Hunter, 1997). Rather, the solution is to supplement it with a measure that accounts for the actual distance between means and serves to convey the magnitude of the actual scientific finding, as opposed to statistical finding, should there be one. Measures of effect size, interpreted in conjunction with significance tests, help to communicate whether something has “happened” or “not happened” in the given study or experiment. The reader interested in effect sizes can turn to a multitude of sources (Cortina and Nouri, 1999; Rosenthal, Rosnow, and Rubin, 2000). For our purposes, it suffices to review the principle of an effect size measure rather than catalog the wealth of possibilities for effect sizes available. Perhaps the easiest and most straightforward way of conceptualizing an effect size is to consider a measure of standardized statistical distance, or Cohen's d, already featured in our computations of power.

      2.28.6 Statistical Distance: Cohen's d

      For a one‐sample z‐test, Cohen's d (Cohen, 1988) is defined as the absolute distance between the observed sample mean and the population mean under the null hypothesis, divided by the population standard deviation:

$$ d = \frac{|\bar{x} - \mu_0|}{\sigma} $$

      In the above, since x̄ is serving as the estimate of μ, the numerator can also be given as μ − μ0. However, using x̄ instead of μ above is a reminder of where this mean comes from: our sample data. We wish to compare that sample mean to the population mean μ0 under the null hypothesis.

      As an example, where x̄ = 20, μ0 = 18, and σ = 2, Cohen's d is computed as:

$$ d = \frac{|\bar{x} - \mu_0|}{\sigma} = \frac{|20 - 18|}{2} = 1.0 $$
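      A one‐line R version of the same computation (the function name cohens_d is ours, not the text's):

```r
# Cohen's d for a one-sample test: absolute standardized distance from mu0
cohens_d <- function(xbar, mu0, sigma) abs(xbar - mu0) / sigma

cohens_d(xbar = 20, mu0 = 18, sigma = 2)  # d = 1.0
```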
