Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis

…recent rapid development of practical methods, fundamental problems have been ignored and fundamental paradoxes left unresolved.

      (Fisher, 1922a, p. 310)

      Our statistics review includes topics that would customarily be seen in a first course in statistics at the undergraduate level, but depending on the given course and what was emphasized by the instructor, our treatment here may be at a slightly deeper level. We review these principles with demonstrations in R and SPSS where appropriate. Should any of the following material come across as entirely “new,” then a review of any introductory statistics text is recommended. For instance, Kirk (2008), Moore, McCabe, and Craig (2014), and Box, Hunter, and Hunter (1978) are relatively nontechnical sources, whereas DeGroot and Schervish (2002) and Wackerly, Mendenhall III, and Scheaffer (2002), along with Evans and Rosenthal (2010), are deeper and more technically dense. Casella and Berger (2002), Hogg and Craig (1995), and Shao (2003) are higher‐level, theoretically oriented texts targeted mainly at mathematical and theoretical statisticians. Other sources include Panik (2005), Berry and Lindgren (1996), and Rice (2006). For a lighter narrative on the role of statistics in social science, consult Abelson (1995).

      Because of its importance in the interpretation of evidence, we close the chapter with an easy but powerful demonstration of what makes a p‐value small or large in the context of statistical significance testing and the testing of null hypotheses. It is imperative that, as a research scientist, you be knowledgeable about this material before attempting to evaluate any research findings that employ statistical inference.

      When we speak of density as it relates to distributions in statistics, we are referring generally to theoretical distributions having a total area of 1.0 under their curves. There are numerous probability distributions or density functions. Empirical distributions, on the other hand, rarely go by the name of densities. They are, in contrast, distributions of real, observed data. In some contexts, the identifier normal distribution may be given without specifying whether one is referring to a density or to an empirical distribution; which is meant is usually evident from the context. We survey only a few of the more popular densities and distributions in the discussion that follows.
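      To make the distinction concrete, the following is a minimal R sketch (ours, not the author's) that overlays the theoretical standard normal density on a histogram of simulated data; the sample size of 1,000 and the seed are arbitrary choices for illustration.

      > set.seed(123)                         # arbitrary seed for reproducibility
      > x <- rnorm(1000)                      # simulated "empirical" data from a standard normal
      > hist(x, freq = FALSE,                 # freq = FALSE puts the histogram on the density scale
      +      main = "Empirical distribution with theoretical density overlaid")
      > curve(dnorm(x), add = TRUE, lwd = 2)  # overlay the theoretical normal density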

      The univariate normal density is given by:

f(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}

      where,

       μ is the population mean for the given density,

       σ² is the population variance,

       π is a constant equal to approximately 3.14,

       e is a constant equal to approximately 2.71,

       x_i is a given value of the independent variable, assumed to be a real number.
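      As a quick numerical check of the formula (a sketch of ours, not from the text), we can code the density directly in R and compare it with the built‐in dnorm() function; the values μ = 10, σ² = 4, and x_i = 12 below are arbitrary:

      > mu <- 10; sigma2 <- 4                    # arbitrary mean and variance
      > f <- function(x) (1 / sqrt(2 * pi * sigma2)) * exp(-(x - mu)^2 / (2 * sigma2))
      > f(12)                                    # density evaluated "by hand"
      [1] 0.1209854
      > dnorm(12, mean = mu, sd = sqrt(sigma2))  # R's built-in normal density agrees
      [1] 0.1209854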

      When μ is 0 and σ² is 1, which implies that the standard deviation σ is also equal to 1 (i.e., σ = √σ² = √1 = 1), the normal distribution is given a special name. It is called the standard normal distribution and can be written more compactly as:

f(x_i) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{x_i^2}{2}}
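      In R, dnorm() uses the standard normal by default (mean = 0, sd = 1), so the density at 0, for example, equals 1/√(2π):

      > dnorm(0)          # defaults to mean = 0, sd = 1
      [1] 0.3989423
      > 1 / sqrt(2 * pi)  # the standard normal density at zero, computed directly
      [1] 0.3989423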

      Scores in research often come in their own units, with distributions having means and variances different from 0 and 1. We can transform a score coming from a given distribution with mean μ and standard deviation σ by the familiar z‐score:

z = \frac{x_i - \mu}{\sigma}

      A z‐score is expressed in units of the standard normal distribution. For example, a z‐score of +1 denotes that the given raw score lies one standard deviation above the mean, while a z‐score of −1 means that the given raw score lies one standard deviation below the mean. In some settings (such as school psychology), T‐scores, which have a mean of 50 and a standard deviation of 10, are also useful. In most contexts, however, z‐scores dominate.
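      As a brief illustration (with hypothetical numbers of our own), the transformation requires only the arithmetic above, and R's scale() function standardizes an entire vector using its sample mean and standard deviation:

      > (75 - 70) / 5                  # raw score of 75, mean 70, standard deviation 5
      [1] 1
      > x <- c(62, 65, 68, 70, 75)     # hypothetical raw scores
      > round(as.numeric(scale(x)), 2) # z-scores for the whole vector
      [1] -1.21 -0.61  0.00  0.40  1.41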

[Figure: The standard normal distribution, with the area from −1 to +1 standard deviations from the mean shaded.]

      Suppose that in Mary's class, the mean was equal to 65% also with a standard deviation of 2. Mary's z‐score is thus:

z = \frac{70 - 65}{2} = \frac{5}{2} = 2.5

      As we can see, relative to their particular distributions, Mary greatly outperformed John. Assuming each distribution is approximately normal, the density under the curve for a normal distribution with mean 0 and standard deviation of 1 at a score of 2.5 is:

      > dnorm(2.5, 0, 1)
      [1] 0.017528

      where dnorm is the density under the curve at 2.5. This is the value of f(x) at the score of 2.5. What then is the probability of scoring 2.5 or greater? To get the cumulative probability up to 2.5, we compute:

      > pnorm(2.5, 0, 1)
      [1] 0.9937903
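      The probability of scoring 2.5 or greater is then the complement of this cumulative probability, which R can also return directly (either call below gives the same result):

      > 1 - pnorm(2.5, 0, 1)
      [1] 0.006209665
      > pnorm(2.5, 0, 1, lower.tail = FALSE)   # equivalent upper-tail call
      [1] 0.006209665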
