Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
Table 2.1 Hypothetical frequencies of exposure by condition.

| | Condition present (1) | Condition absent (0) | Total |
|---|---|---|---|
| Exposure yes (1) | 20 | 10 | 30 |
| Exposure no (2) | 5 | 15 | 20 |
| Total | 25 | 25 | 50 |
where $O_i$ and $E_i$ represent the observed and expected frequencies, respectively, and the sum runs over all cells of the table formed by $r$ rows and $c$ columns.
As a simple example, consider the hypothetical data in Table 2.1, which cross-classify people by whether they were exposed to something adverse and by whether a condition is present or absent. A clinical psychologist, for instance, might define exposure as combat exposure and the condition as posttraumatic stress disorder (if you are not a psychologist, see if you can come up with an example from your own field).
The null hypothesis is that condition and exposure are independent: the 50 counts making up the table are distributed across the cells only as the row and column totals would predict, so there is no association between condition and exposure. We can easily test this hypothesis in SPSS by entering each cell of the table as one row of the data file and weighting cases by that cell's frequency (freq):
| exposure | condition | freq |
|---|---|---|
| 1.00 | 0.00 | 10.00 |
| 1.00 | 1.00 | 20.00 |
| 2.00 | 0.00 | 15.00 |
| 2.00 | 1.00 | 5.00 |
```
WEIGHT BY freq.
CROSSTABS
  /TABLES=condition BY exposure
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT
  /COUNT ROUND CELL.
```
The output follows; the crosstabulation first confirms that we set up the data file correctly:
Exposure * Condition Crosstabulation (Count)

| | | Condition 1.00 | Condition 0.00 | Total |
|---|---|---|---|---|
| Exposure | 1.00 | 20 | 10 | 30 |
| | 2.00 | 5 | 15 | 20 |
| Total | | 25 | 25 | 50 |
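For readers following along in R, a rough analogue of SPSS's `WEIGHT BY freq` (a sketch, not part of the original example) is to keep the same one-row-per-cell layout in a data frame and let base R's `xtabs()` sum `freq` within each exposure-by-condition cell. The data frame name `cells` is a hypothetical choice.

```r
# One row per cell, mirroring the SPSS data file above
# (the name `cells` is hypothetical)
cells <- data.frame(
  exposure  = c(1, 1, 2, 2),
  condition = c(0, 1, 0, 1),
  freq      = c(10, 20, 15, 5)
)

# xtabs() sums the left-hand-side variable within each cell,
# which plays the role of SPSS's WEIGHT BY freq
tab <- xtabs(freq ~ exposure + condition, data = cells)
print(tab)

# addmargins() appends the row and column totals shown in the SPSS output
print(addmargins(tab))
```

Here the condition levels are listed in ascending order (0 before 1), the reverse of the SPSS display; the cell counts and totals are the same.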
In the Chi-square Tests output, we focus on the Pearson chi-square statistic of 8.333 on a single degree of freedom. It is statistically significant (p = 0.004), and hence we can reject the null hypothesis of no association between condition and exposure group.
Chi-square Tests

| | Value | df | Asymp. Sig. (two-sided) | Exact Sig. (two-sided) | Exact Sig. (one-sided) |
|---|---|---|---|---|---|
| Pearson chi-square | 8.333 (a) | 1 | 0.004 | | |
| Continuity correction (b) | 6.750 | 1 | 0.009 | | |
| Likelihood ratio | 8.630 | 1 | 0.003 | | |
| Fisher's exact test | | | | 0.009 | 0.004 |
| Linear-by-linear association | 8.167 | 1 | 0.004 | | |
| No. of valid cases | 50 | | | | |

(a) 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.00.
(b) Computed only for a 2 × 2 table.
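As a hand check on the SPSS output (not part of the original text), the main statistics can be worked directly from the counts in Table 2.1. The notation $R_i$ and $C_j$ for row and column totals is introduced here for convenience. Under independence, each expected count is the row total times the column total divided by $N$:

$$E_{ij} = \frac{R_i \, C_j}{N}, \qquad E_{11} = E_{12} = \frac{30 \times 25}{50} = 15, \qquad E_{21} = E_{22} = \frac{20 \times 25}{50} = 10$$

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} = \frac{(20-15)^2}{15} + \frac{(10-15)^2}{15} + \frac{(5-10)^2}{10} + \frac{(15-10)^2}{10} \approx 1.667 + 1.667 + 2.5 + 2.5 = 8.333, \qquad df = (r-1)(c-1) = 1$$

Yates' continuity correction shrinks every $|O_i - E_i| = 5$ to $4.5$:

$$\chi^2_{\text{Yates}} = \sum_{i} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i} = 2 \cdot \frac{4.5^2}{15} + 2 \cdot \frac{4.5^2}{10} = 2.70 + 4.05 = 6.75$$

The likelihood-ratio statistic uses the same expected counts:

$$G^2 = 2 \sum_{i} O_i \ln\frac{O_i}{E_i} = 2\left[20\ln\frac{20}{15} + 10\ln\frac{10}{15} + 5\ln\frac{5}{10} + 15\ln\frac{15}{10}\right] \approx 8.630$$

Assuming SPSS defines the linear-by-linear association as $(N-1)r^2$, where $r$ is the correlation between the row and column codes (for a 2 × 2 table, $r^2 = \chi^2/N$):

$$M^2 = (N-1)\,r^2 = 49 \times \frac{8.333}{50} \approx 8.167$$

All four values agree with the corresponding rows of the Chi-square Tests table.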
In R, we can just as easily perform the chi-square test on these data. We first build the matrix of cell counts, calling it `diag.table`, and then pass it to `chisq.test()`:
```r
> diag.table <- matrix(c(20, 5, 10, 15), nrow = 2)
> diag.table
     [,1] [,2]
[1,]   20   10
[2,]    5   15
> chisq.test(diag.table, correct = F)

	Pearson's Chi-squared test

data:  diag.table
X-squared = 8.3333, df = 1, p-value = 0.003892
```
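The remaining rows of the SPSS Chi-square Tests table can also be recovered in base R. The sketch below is not from the text: it calls `chisq.test()` with and without Yates' correction, computes the likelihood-ratio statistic by hand from the expected counts, calls `fisher.test()`, and obtains the linear-by-linear association as $(N-1)r^2$ from case-level codes, assuming that is the definition SPSS uses. The object names `results`, `G2`, and `M2` are hypothetical.

```r
diag.table <- matrix(c(20, 5, 10, 15), nrow = 2)

pearson <- chisq.test(diag.table, correct = FALSE)  # Pearson chi-square
yates   <- chisq.test(diag.table)                   # Yates' correction is the default
E       <- pearson$expected                         # row total * column total / N

# Likelihood-ratio chi-square: G^2 = 2 * sum(O * log(O / E)), df = 1
G2   <- 2 * sum(diag.table * log(diag.table / E))
p.G2 <- pchisq(G2, df = 1, lower.tail = FALSE)

# Fisher's exact test (two-sided by default)
fisher <- fisher.test(diag.table)

# Linear-by-linear association, assuming SPSS's (N - 1) * r^2 definition:
# expand the table to one record per case and correlate the numeric codes
N         <- sum(diag.table)
counts    <- as.vector(diag.table)                 # column-major: 20, 5, 10, 15
exposure  <- rep(c(1, 2, 1, 2), times = counts)    # row codes
condition <- rep(c(1, 1, 0, 0), times = counts)    # column codes
M2   <- (N - 1) * cor(exposure, condition)^2
p.M2 <- pchisq(M2, df = 1, lower.tail = FALSE)

results <- data.frame(
  statistic = c(pearson$statistic, yates$statistic, G2, NA, M2),
  p.value   = c(pearson$p.value, yates$p.value, p.G2, fisher$p.value, p.M2),
  row.names = c("Pearson", "Continuity correction", "Likelihood ratio",
                "Fisher's exact (two-sided)", "Linear-by-linear")
)
print(round(results, 3))
```

The rounded statistics should line up with the SPSS table; any difference in the last digit of a p-value would come from rounding.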
We see that the result in R agrees with what we obtained in SPSS. Note that specifying `correct = F` (i.e., `correct = FALSE`) turned off what is known as Yates' correction for continuity. The correction subtracts 0.5 from each positive difference O − E and adds 0.5 to each negative difference O − E (that is, it shrinks every |O − E| by 0.5) in an attempt to make the continuous chi-square distribution better approximate the discrete, multinomial sampling distribution of the statistic (i.e., in a crude sense, to help make discrete probabilities more continuous). To apply Yates' correction, we can either specify `correct = T` or simply run `chisq.test(diag.table)`, which incorporates the correction by default. With the correction implemented,