Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
The observed mean difference is 2.60, computed by subtracting the mean hypothesized under the null hypothesis (100) from the sample mean (102.6), i.e., 102.6 − 100 = 2.60.
The 95% confidence interval of the difference is interpreted to mean that, with 95% confidence, the interval with lower bound −4.8810 and upper bound 10.0810 will capture the true parameter, which in this case is the population mean difference. We can see that 0 lies within the limits of the confidence interval, which again confirms why we were unable to reject the null hypothesis at the 0.05 level of significance. Had zero lain outside the confidence interval limits, this would have been grounds to reject the null at a significance level of 0.05 (and consequently, we would also have obtained a p‐value of less than 0.05 for our significance test). Recall that the true mean (i.e., the parameter) is not the random component. Rather, the sample is the random component, and the interval is computed from it. It is important to emphasize this distinction when interpreting the confidence interval.
We can easily run the same t‐test in R. We first create the data vector and then carry out the one‐sample t‐test, whose results mirror those obtained in SPSS:
> iq <- c(105, 98, 110, 105, 95)
> t.test(iq, mu = 100)

        One Sample t-test

data:  iq
t = 0.965, df = 4, p-value = 0.3892
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
  95.11904 110.08096
sample estimates:
mean of x
    102.6
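The quantities in this output can also be reproduced by hand, which makes the link between the t statistic and the confidence interval of the difference explicit. The following is a minimal sketch using only base R; the variable names (mu0, mean_diff, se, ci_diff) are chosen here for illustration and do not come from the text.

```r
# Reproduce the one-sample t-test quantities by hand.
iq  <- c(105, 98, 110, 105, 95)   # data from the text
mu0 <- 100                        # mean hypothesized under the null

n         <- length(iq)
mean_diff <- mean(iq) - mu0       # observed mean difference
se        <- sd(iq) / sqrt(n)     # estimated standard error of the mean
t_stat    <- mean_diff / se       # t on n - 1 degrees of freedom
p_value   <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# 95% confidence interval for the mean difference
t_crit  <- qt(0.975, df = n - 1)
ci_diff <- mean_diff + c(-1, 1) * t_crit * se

print(round(c(mean_diff = mean_diff, t = t_stat, p = p_value), 4))
print(round(ci_diff, 4))
```

The interval in ci_diff reproduces the bounds −4.8810 and 10.0810 reported for the difference. Adding mu0 back to both limits gives the interval for the mean itself, 95.119 to 110.081, which is the one t.test prints.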
2.20.2 t‐Tests for Two Samples
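Just as the t‐test for one sample is a generalization of the z‐test for one sample, for which we use $s^2$ in place of $\sigma^2$, the t‐test for two independent samples is a generalization of the z‐test for two independent samples. Recall the z‐test for two independent samples:

$$z = \frac{(\bar{y}_1 - \bar{y}_2) - E(\bar{y}_1 - \bar{y}_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\bar{y}_1$ and $\bar{y}_2$ are the two sample means, $E(\bar{y}_1 - \bar{y}_2)$ is the difference in means expected under the null hypothesis, and the denominator is the standard error of the difference in means.

When we do not know the population variances $\sigma_1^2$ and $\sigma_2^2$, we estimate them with the sample variances $s_1^2$ and $s_2^2$. The resulting ratio is then distributed as t rather than z:

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - E(\bar{y}_1 - \bar{y}_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad (2.6)$$

on degrees of freedom $v = n_1 - 1 + n_2 - 1 = n_1 + n_2 - 2$.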
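The formulation of t in (2.6) assumes that $n_1 = n_2$. If sample sizes are unequal, then pooling variances is recommended. To pool, we weight the sample variances by their respective sample sizes (more precisely, by their degrees of freedom, $n_1 - 1$ and $n_2 - 1$) and obtain the following estimated standard error of the difference in means:

$$\hat{\sigma}_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

which can also be written as

$$\hat{\sigma}_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{n_1 + n_2}{n_1 n_2}\right)}$$

Notice that the pooled estimate of the variance, $\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, is a weighted average of the two sample variances, each weighted by its degrees of freedom. When $n_1 = n_2$, it reduces to the simple average of $s_1^2$ and $s_2^2$, and the pooled standard error equals $\sqrt{s_1^2/n_1 + s_2^2/n_2}$.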
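To see how pooling behaves numerically, the following sketch computes the pooled standard error in two algebraically equivalent ways and compares it with the unpooled standard error, sqrt(s1^2/n1 + s2^2/n2), which is assumed here to be the denominator of the t ratio the text labels (2.6). The functions pooled_se and unpooled_se and the data vectors are hypothetical, made up for illustration only.

```r
# Pooled standard error of a difference in means, written two equivalent ways.
pooled_se <- function(y1, y2) {
  n1 <- length(y1)
  n2 <- length(y2)
  # pooled variance: sample variances weighted by their degrees of freedom
  sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
  c(form1 = sqrt(sp2 * (1 / n1 + 1 / n2)),
    form2 = sqrt(sp2 * (n1 + n2) / (n1 * n2)))
}

# Unpooled standard error (assumed denominator of the t ratio in (2.6))
unpooled_se <- function(y1, y2) {
  sqrt(var(y1) / length(y1) + var(y2) / length(y2))
}

# Hypothetical data, made up for illustration only
a      <- c(12, 15, 11, 18, 14)   # n = 5
b      <- c(20, 17, 23, 19, 25)   # n = 5
b_long <- c(b, 22, 16, 24)        # n = 8, so sample sizes are unequal

print(pooled_se(a, b))            # both written forms agree
print(unpooled_se(a, b))          # equal n: same value as the pooled version
print(pooled_se(a, b_long))       # unequal n: the pooled ...
print(unpooled_se(a, b_long))     # ... and unpooled values now differ
```

With equal sample sizes the pooled and unpooled standard errors coincide, so pooling only changes the result when the groups differ in size.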
2.20.3 Two‐Sample t‐Tests in R
Consider the following hypothetical data on pass‐fail grades (“0” is fail, “1” is pass) and study time for a seminar course with 10 attendees:
grade   studytime
0       30
0       25
0       59
0       42
0       31
1       140
1       90
1       95
1       170
1       120
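One way to hold these data in R is as a data frame that is then split by grade. The sketch below assumes the object names seminar and by_grade, which are introduced here for illustration and are not part of the text.

```r
# Enter the table as a data frame, then split study time by grade.
seminar <- data.frame(
  grade     = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  studytime = c(30, 25, 59, 42, 31, 140, 90, 95, 170, 120)
)

# split() returns a list with one vector per grade, named "0" and "1"
by_grade <- split(seminar$studytime, seminar$grade)
print(by_grade[["0"]])   # study times for attendees who failed
print(by_grade[["1"]])   # study times for attendees who passed

# Group sizes, means, and variances
print(sapply(by_grade, function(x) c(n = length(x), mean = mean(x), var = var(x))))
```

The two elements of by_grade are the vectors a two‐sample comparison needs. The t.test function also accepts a formula, as in t.test(studytime ~ grade, data = seminar), which avoids the split.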
To conduct the two‐sample t‐test, we generate the relevant vectors in R then carry out the test:
> grade.0 <- c(30, 25, 59, 42, 31)
> grade.1 <- c(140, 90, 95, 170, 120)
> t.test(grade.0, grade.1)

        Welch Two Sample t-test

data:  grade.0 and grade.1
t = -5.3515, df = 5.309, p-value = 0.002549
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -126.00773  -45.19227
sample estimates:
mean of x mean of y
     37.4     123.0
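The fractional degrees of freedom in this output come from the Welch adjustment. As a rough sketch of where the numbers originate, the statistic, degrees of freedom, p‐value, and interval can be recomputed by hand with the unpooled standard error and the Welch‐Satterthwaite approximation. That approximation is the standard formula for this adjustment and is not given in the text; the variable names are illustrative.

```r
grade.0 <- c(30, 25, 59, 42, 31)
grade.1 <- c(140, 90, 95, 170, 120)

n0 <- length(grade.0)
n1 <- length(grade.1)

# Squared standard error contributed by each group: s^2 / n
v0 <- var(grade.0) / n0
v1 <- var(grade.1) / n1

mean_diff <- mean(grade.0) - mean(grade.1)
se_diff   <- sqrt(v0 + v1)          # unpooled standard error
t_stat    <- mean_diff / se_diff    # Welch t statistic

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

p_value <- 2 * pt(-abs(t_stat), df = df_welch)   # two-sided p-value
ci      <- mean_diff + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, p = p_value), 4))
print(round(ci, 3))
```

These values agree with the t.test output above: t of about −5.35 on roughly 5.31 degrees of freedom, with an interval from about −126.0 to −45.2.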
Using the Welch adjustment for unequal variances (Welch, 1947), which R applies by default, we conclude that there is a statistically significant difference between the means (p = 0.003). With 95% confidence, we can say the true mean difference lies between the lower limit of approximately −126.0 and the upper limit of approximately −45.2. As a quick test of the assumption of equal variances (and to confirm in a sense whether the Welch adjustment was necessary), we can use var.test, which produces a ratio of variances and evaluates the null hypothesis that this ratio is equal to 1 (i.e., if the variances are equal, the numerator of the ratio will be the same as the denominator):
> var.test(grade.0, grade.1)

        F test to compare two variances

data:  grade.0 and grade.1
F = 0.1683, num df = 4, denom df = 4, p-value = 0.1126
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.01752408 1.61654325
sample estimates:
ratio of variances
         0.1683105
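The F statistic here is simply the ratio of the two sample variances. The sketch below recomputes it and its two‐sided p‐value by hand, then runs the pooled‐variance version of the t‐test through the var.equal argument of t.test for comparison. The variable names are illustrative, and taking twice the smaller tail area is assumed to be how the two‐sided p‐value is formed.

```r
grade.0 <- c(30, 25, 59, 42, 31)
grade.1 <- c(140, 90, 95, 170, 120)

# F statistic: ratio of the two sample variances
f_ratio <- var(grade.0) / var(grade.1)
df_num  <- length(grade.0) - 1
df_den  <- length(grade.1) - 1

# Two-sided p-value: twice the smaller tail area of the F distribution
p_lower <- pf(f_ratio, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

print(round(c(F = f_ratio, p = p_value), 4))

# Pooled-variance (Student) t-test, for comparison with the Welch version
pooled <- t.test(grade.0, grade.1, var.equal = TRUE)
print(pooled)
```

With p = 0.1126, the null hypothesis of equal variances is not rejected at the 0.05 level. Because the two groups are the same size, the pooled test gives the same t value as the Welch test. Only the degrees of freedom (8 rather than about 5.3), and with them the p‐value and interval, change.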