Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis


is critical for the practitioner or researcher to appreciate if he or she is to assess and appraise statistical evidence in an intelligent and thoughtful manner. It is not an exaggeration to say that if one does not understand the make‐up of a p‐value and the factors that directly influence its size, one cannot properly evaluate statistical evidence, nor should one even make the attempt to do so. Though these arguments are not new and have been put forth by even the very best of methodologists (e.g., see Cohen, 1990; Meehl, 1978), there is evidence to suggest that many practitioners and researchers do not understand the factors that determine the size of a p‐value (Gigerenzer, 2004). To emphasize once again—understanding the determinants of a p‐value and what makes p‐values distinct from effect sizes is not simply “fashionable.” Rather, it is absolutely mandatory for any attempt to properly evaluate statistical evidence in a research report. Does the paper you're reading provide evidence of a successful treatment for cancer? If you do not understand the distinctions between p‐values and effect sizes, you will be unable to properly assess the evidence. It is that important. As we will see, stating a result as “statistically significant” does not in itself tell you whether the treatment works or does not work, and in some cases, tells you very little at all from a scientific vantage point.

      2.28.1 Null Hypothesis Significance Testing (NHST): A Legacy of Criticism

      Criticisms targeted against null hypothesis significance testing have inundated the literature since at least 1938, when Berkson brought to light how statistical significance can easily be achieved by simple manipulations of sample size:

      I believe that an observant statistician who has had any considerable experience with applying the chi‐square test repeatedly will agree with my statement that, as a matter of observation, when the numbers in the data are quite large, the P's tend to come out small. (p. 526)
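Berkson's observation is easy to reproduce numerically. The sketch below is a hypothetical illustration (the sample values are made up, and a one‐sample z‐test is used rather than Berkson's chi‐square): it holds a trivially small mean difference and σ fixed while n grows, and the two‐tailed p‐value shrinks toward significance on sample size alone.

```python
from math import sqrt, erf

def p_two_tailed(z):
    """Two-tailed p-value for a standard normal test statistic.
    Uses Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), the standard normal CDF."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical inputs: a trivially small mean difference of 0.5
xbar, mu0, sigma = 18.5, 18.0, 20.0

p_by_n = {}
for n in (100, 1_000, 10_000, 100_000):
    z = (xbar - mu0) / (sigma / sqrt(n))  # zM grows in proportion to sqrt(n)
    p_by_n[n] = p_two_tailed(z)
    print(f"n = {n:>6}: zM = {z:5.2f}, p = {p_by_n[n]:.4f}")
```

At n = 100 the result is nowhere near significance (p ≈ 0.80), yet by n = 10,000 the identical mean difference yields p < 0.05, which is precisely the pattern Berkson describes.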

      Since Berkson, the very best and most renowned of methodologists have remarked that the significance test is subject to gross misunderstanding and misinterpretation (e.g., see Bakan, 1966; Carver, 1993; Cohen, 1990; Estes, 1997; Loftus, 1991; Meehl, 1978; Oakes, 1986; Shrout, 1997; Wilson, Miller, and Lower, 1967). And though it can be difficult to assess or evaluate whether the situation has improved, there is evidence to suggest that it has not. Few describe the problem better than Gigerenzer in his article Mindless statistics (Gigerenzer, 2004), in which he discusses both the roots and truths of hypothesis testing, as well as how its “statistical rituals” and practices have become far more of a sociological phenomenon than anything related to good science and statistics.

      Recall the familiar one‐sample z‐test for a mean discussed earlier:

      zM = (x̄ − μ0)/(σ/√n)

      where the purpose of the test was to compare an obtained sample mean x̄ to a population mean μ0 under the null hypothesis that μ = μ0. Recall that sigma, σ, is the standard deviation of the population from which the sample was presumably drawn. In practice, this value is rarely if ever known for certain, which is why in most cases an estimate of it is obtained in the form of a sample standard deviation s. What determines the size of zM, and therefore, the smallness of p? There are three inputs that determine the size of p, which we have already featured in our earlier discussion of statistical power. These three factors are the distance x̄ − μ0, σ, and n. We consider each of these once more, then provide simple arithmetic demonstrations to emphasize how changing any one of these necessarily results in an arithmetical change in zM, and consequently, a change in the observed p‐value.

      As a first case, consider the distance x̄ − μ0. Given constant values of σ and n, the greater the distance between x̄ and μ0, the larger zM will be. That is, as the numerator x̄ − μ0 grows larger, the resulting zM also gets larger in size, which as a consequence, decreases p in size. As a simple example, assume for a given research problem that σ is equal to 20 and n is equal to 100. This means that the standard error is equal to 20/√100, which is equal to 20/10 = 2. Suppose the obtained sample mean x̄ were equal to 20, and the mean under the null hypothesis, μ0, were equal to 18. The numerator of zM would thus be 20 – 18 = 2. When 2 is divided by the standard error of 2, we obtain a value for zM of 1.0, which is not statistically significant at p < 0.05.
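The arithmetic in this first scenario can be checked directly. A minimal Python sketch, using the error function for the standard normal CDF and no inputs beyond the numbers given in the text:

```python
from math import sqrt, erf

def p_two_tailed(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

sigma, n = 20.0, 100
se = sigma / sqrt(n)        # standard error: 20/10 = 2
z_m = (20.0 - 18.0) / se    # (xbar - mu0) / se = 2/2 = 1.0
p = p_two_tailed(z_m)       # roughly 0.317, not significant at 0.05
```

A two‐tailed p of about 0.32 falls well short of the conventional 0.05 cutoff, matching the conclusion in the text.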

      Now, consider the scenario where the standard error of the mean remains the same at 2, but that instead of the sample mean x̄ being equal to 20, it is equal to 30. The difference between the sample mean and the population mean is thus 30 – 18 = 12. This difference represents a greater distance between means, and presumably, would be indicative of a more “successful” experiment or study. Dividing 12 by the standard error of 2 yields a zM value of 6.0, which is highly statistically significant at p < 0.05 (whether for a one‐ or two‐tailed test).
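Repeating the same computation with the larger sample mean of 30, all other inputs unchanged, confirms the jump in zM and the collapse of p. This is a sketch of the text's arithmetic, not a definitive implementation:

```python
from math import sqrt, erf

def p_two_tailed(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

sigma, n = 20.0, 100
se = sigma / sqrt(n)        # still 20/10 = 2
z_m = (30.0 - 18.0) / se    # 12/2 = 6.0
p = p_two_tailed(z_m)       # far below 0.05 (on the order of 1e-9)
```

Note that only the numerator changed between the two scenarios; the standard error, and hence the yardstick against which the distance is judged, is identical.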

      Having the value of zM increase as a result of the distance between x̄ and μ0 increasing is of course what we would expect from a test statistic if that test statistic is to be used in any sense to evaluate the strength of the scientific evidence against the null. That is, if our obtained sample mean x̄ turns out to be very different than the population mean under the null hypothesis, μ0, we would hope that our test statistic would measure this effect, and allow us to reject the null hypothesis at some preset significance level (in our example, 0.05). If interpreting test statistics were always as easy as this, there would be no misunderstandings about the meaning of statistical significance and the misguided decisions to automatically attribute “worth” to the statement “p < 0.05.” However, as we discuss in the following cases, there are other ways to make zM big or small that do not depend so intimately on the distance between x̄ and μ0, and this is where interpretations of the significance test usually run awry.
