Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis


developed the coefficient in 1904, is a correlation coefficient suitable for data on two variables that are expressed in terms of ranks rather than actual measurements on a continuous scale. Mathematically, the Spearman correlation coefficient is equivalent to a Pearson r when the data are ranked. There are, however, important differences between these two coefficients. Spearman's rs can be defined as:

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{x_i} - R_{y_i})^2}{n(n^2 - 1)}$$

      where $R_{x_i}$ and $R_{y_i}$ are the ranks on $x_i$ and $y_i$ for the ith individual in the data, $(R_{x_i} - R_{y_i})^2$ are squared rank deviations, and n is the number of pairs of ranks (Kirk, 2008). When we compute rs on the Galton data, we obtain:

      > cor.test(parent, child, method = "spearman")

              Spearman's rank correlation rho

      data:  parent and child
      S = 76569964, p-value < 2.2e-16
      alternative hypothesis: true rho is not equal to 0
      sample estimates:
            rho
      0.4251345

      We see that the rs of 0.425 is slightly less than the Pearson r of 0.459.

      From the table, we can see that Bill very much favors Star Wars (rating of 10.0) and likes Batman least (rating of 2.1). Mary's favorite movie is Scarface (rating of 9.7), while her least favorite movie is Batman (rating of 7.6). We will refer to these subjective scores in a moment. For now, we focus only on the ranks. For instance, Bill's ranking of Scarface is third, while Mary's ranking of Star Wars is third.

Movie                 Bill        Mary
Batman                5 (2.1)     5 (7.6)
Star Wars             1 (10.0)    3 (9.0)
Scarface              3 (8.4)     1 (9.7)
Back to the Future    4 (7.6)     4 (8.5)
Halloween             2 (9.5)     2 (9.6)

      Actual scores on the favorability measure are in parentheses.

      > bill <- c(5, 1, 3, 4, 2)
      > mary <- c(5, 3, 1, 4, 2)

      Because the data are already in the form of ranks, both Pearson r and Spearman rho will agree:

      > cor(bill, mary)
      [1] 0.6
      > cor(bill, mary, method = "spearman")
      [1] 0.6
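      The same value can be recovered from the rank-difference formula directly. The following is a minimal sketch, assuming there are no tied ranks (the shortcut formula is exact only in that case); the object names d, n, and rs.manual are illustrative and do not come from the text.

```r
# Ranks from the movie table (same vectors as above)
bill <- c(5, 1, 3, 4, 2)
mary <- c(5, 3, 1, 4, 2)

d <- bill - mary     # rank difference for each movie
n <- length(bill)    # number of pairs of ranks

# Rank-difference form of Spearman's coefficient (no ties assumed)
rs.manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

rs.manual                              # hand computation
cor(bill, mary, method = "spearman")   # built-in computation for comparison
```

      Here the squared differences sum to 8, so the hand computation is 1 - 48/120, which matches the value returned by cor().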

      Note that by default, R returns the Pearson correlation coefficient. One has to specify method = "spearman" to get rs. Consider now what happens when we correlate, instead of rankings, the actual subjective favorability scores corresponding to the respective ranks. When we plot the favorability data, we obtain:

      > bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
      > mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)
      > plot(mary.sub, bill.sub)

      [Figure: plot of mary.sub versus bill.sub.]

      Note that although the relationship is not perfectly linear, each increase in Bill's subjective score is nonetheless associated with an increase in Mary's subjective score. When we compute Pearson's r on these data, we obtain:

      > cor(bill.sub, mary.sub)
      [1] 0.9551578

      However, when we compute rs, we get:

      > cor(bill.sub, mary.sub, method = "spearman")
      [1] 1
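      One way to see why rs reaches 1 here is to rank the scores first and then apply Pearson's r to the ranks. This is only a sketch using base R's rank(); the names bill.rank and mary.rank are illustrative. Note that rank() assigns 1 to the smallest score, the reverse of the table's convention, which does not affect the correlation because both variables are reversed together.

```r
bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)

# Convert the raw favorability scores to ranks (1 = smallest score)
bill.rank <- rank(bill.sub)
mary.rank <- rank(mary.sub)

cor(bill.rank, mary.rank)                    # Pearson r computed on the ranks
cor(bill.sub, mary.sub, method = "spearman") # Spearman on the raw scores
```

      Because the two sets of scores increase together, the two rank vectors are identical, so the correlation of the ranks is perfect even though the raw scores do not fall on a straight line.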

      The density for Student's t is given by (Shao, 2003):

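$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$

      where $\nu$ denotes the degrees of freedom and $\Gamma$ is the gamma function.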
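      As a rough check on the density, it can be written out with the gamma function and compared against R's built-in dt(). This is a sketch only: the function name t.density, the argument name v for the degrees of freedom, and the evaluation points are all chosen here for illustration.

```r
# Student's t density written out with the gamma function;
# v is the degrees of freedom (a symbol chosen for this sketch)
t.density <- function(t, v) {
  gamma((v + 1) / 2) / (sqrt(v * pi) * gamma(v / 2)) *
    (1 + t^2 / v)^(-(v + 1) / 2)
}

x <- c(-2, 0, 1.5)      # arbitrary evaluation points
t.density(x, v = 10)    # hand-written density
dt(x, df = 10)          # built-in density for comparison
```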

      The fact that t converges to z for large degrees of freedom but is quite distinct from z for small degrees of freedom is one reason why t distributions are often used for small sample problems. When sample size is large, and so consequently are degrees of freedom, whether one treats a random variable as t or z will make little difference in terms of computed p‐values and decisions on respective null hypotheses. This is a direct consequence of the convergence of the two distributions for large degrees of freedom. For a historical overview of how t‐distributions came to be, consult Zabell (2008).
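      The convergence can be illustrated numerically. The sketch below compares two-sided 5% critical values of t against the corresponding z value; the degrees of freedom (3, 10, 50, and 1000) and the names dof, t.crit, and z.crit are illustrative choices, not values taken from the text's analyses.

```r
dof <- c(3, 10, 50, 1000)    # illustrative degrees of freedom

t.crit <- qt(0.975, dof)     # upper 2.5% point of each t distribution
z.crit <- qnorm(0.975)       # upper 2.5% point of the standard normal

# The gap between the t and z critical values shrinks as df grows
round(data.frame(dof, t.crit, gap = t.crit - z.crit), 4)
```

      For small degrees of freedom the t critical value is noticeably larger than the z value, whereas for large degrees of freedom the two are nearly indistinguishable, which is why the choice between t and z matters little in large samples.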

      2.20.1 t‐Tests for One Sample

      When we perform hypothesis testing using the z distribution, we assume we have knowledge of the population variance σ². Having direct knowledge of σ² is the ideal circumstance. When we know σ², we can compute the standard error of the mean directly as

$$\sigma_M = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$

      [Figure: Student's t versus normal densities for 3 (left), 10 (middle), and 50 (right) degrees of freedom.] As degrees of freedom increase, the limiting form of the t distribution is