Now, here is the trick to understanding advanced modeling, including a primary feature of mixed effects modeling. We expect the covariance between treatments to be nonzero, analogous to what we expected in the simple matched-pairs design. It seems, then, that a reasonable assumption to make for the data in Table 2.9 is that the covariances between treatments are equal, or at minimum follow some hypothesized correlational structure. In multilevel and hierarchical models, attempts are made to account for the correlation between treatment levels instead of assuming these correlations equal 0, as is the case in classical between-subjects designs. In Chapter 6, we elaborate on these ideas when we discuss randomized block and repeated measures models.
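To make the notion of a hypothesized covariance structure concrete, here is a minimal sketch in Python (our own illustration, not from the text; all values are hypothetical) of a compound symmetry structure, in which every treatment has the same variance and every pair of treatments shares one common covariance:

```python
import numpy as np

def compound_symmetry(p, variance, covariance):
    """Return a p x p covariance matrix with a common variance on the
    diagonal and a common covariance between every pair of treatments."""
    S = np.full((p, p), covariance, dtype=float)
    np.fill_diagonal(S, variance)
    return S

# Hypothetical structure for three treatment levels
print(compound_symmetry(3, variance=10.0, covariance=4.0))
```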
2.25 COMPOSITE VARIABLES: LINEAR COMBINATIONS
In many statistical techniques, especially multivariate ones, statistical analyses take place not on individual variables, but rather on linear combinations of variables. A linear combination in linear algebra can be denoted simply as

ℓ = a′y = a1y1 + a2y2 + ⋯ + apyp

where a′ = (a1, a2, …, ap). These values are scalars, and serve to weight the respective values of y1 through yp, which are the variables.
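As a quick numerical sketch (the data matrix and weights below are hypothetical, chosen only for illustration), computing the linear combination score for each observation is a matrix–vector product:

```python
import numpy as np

# Hypothetical data: n = 5 observations on p = 3 variables
Y = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 2.0],
              [1.0, 3.0, 3.0],
              [4.0, 6.0, 2.0],
              [2.0, 4.0, 4.0]])

a = np.array([0.5, 0.3, 0.2])  # scalar weights a1, a2, a3

ell = Y @ a  # one linear combination score l_i = a'y_i per observation
print(ell)
```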
Just as we did for “ordinary” variables, we can compute a number of central tendency and dispersion statistics on linear combinations. For instance, we can compute the mean of a linear combination ℓi as

ℓ̄ = a′ȳ

where ȳ is the vector of sample means of y1 through yp.
We can also compute the sample variance of a linear combination:

s²ℓ = a′Sa

for ℓi = a′y, i = 1, 2, …, n, and where S is the sample covariance matrix. Though the form a′Sa for the variance may be difficult to decipher at this point, it will become clearer when we consider techniques such as principal components later in the book.
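Continuing the sketch, we can verify numerically that a′Sa agrees with the ordinary sample variance of the combination scores (again with made-up data):

```python
import numpy as np

Y = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 2.0],
              [1.0, 3.0, 3.0],
              [4.0, 6.0, 2.0],
              [2.0, 4.0, 4.0]])
a = np.array([0.5, 0.3, 0.2])

S = np.cov(Y, rowvar=False)         # sample covariance matrix (n - 1 denominator)
var_quadratic = a @ S @ a           # a'Sa
var_direct = np.var(Y @ a, ddof=1)  # variance of the scores themselves
print(var_quadratic, var_direct)    # the two agree
```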
For two linear combinations,

ℓ1 = a′y

and

ℓ2 = b′y

we can obtain the sample covariance between such linear combinations as follows:

sℓ1ℓ2 = a′Sb

The correlation of these linear combinations (Rencher and Christensen, 2012, p. 76) is simply the standardized version of this covariance:

rℓ1ℓ2 = a′Sb/√[(a′Sa)(b′Sb)]
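A second hypothetical weight vector b lets us compute both the covariance a′Sb and its standardized version directly:

```python
import numpy as np

Y = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 2.0],
              [1.0, 3.0, 3.0],
              [4.0, 6.0, 2.0],
              [2.0, 4.0, 4.0]])
a = np.array([0.5, 0.3, 0.2])
b = np.array([1.0, -1.0, 0.5])

S = np.cov(Y, rowvar=False)
cov_l1l2 = a @ S @ b                                    # a'Sb
r_l1l2 = cov_l1l2 / np.sqrt((a @ S @ a) * (b @ S @ b))  # standardized version
print(cov_l1l2, r_l1l2)
```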
As we will see later in the book, if we can assume multivariate normality of a distribution, that is, Y ∼ N[μ, Σ], then linear combinations of Y are also normally distributed, and a host of other useful statistical properties follow (see Timm, 2002, pp. 86–88). In multivariate methods especially, we regularly need to make assumptions about such linear combinations, and it helps to know that so long as we can assume multivariate normality, we have some idea of how such linear combinations will be distributed.
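As a quick empirical illustration (a sketch with arbitrary μ and Σ of our choosing), a linear combination of multivariate normal data has mean a′μ and variance a′Σa, which we can check by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([0.5, 0.3, 0.2])

Y = rng.multivariate_normal(mu, Sigma, size=100_000)
ell = Y @ a  # scores from a single linear combination; normal in theory
print(ell.mean(), a @ mu)               # sample mean is close to a'mu
print(ell.var(ddof=1), a @ Sigma @ a)   # sample variance is close to a'Sigma a
```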
2.26 MODELS IN MATRIX FORM
Throughout the book, our general approach is to first present models in their simplest possible form using only scalars. We then gently introduce the reader to the corresponding matrix counterparts and extensions. Matrices are required for such models in order to accommodate numerous variables and dimensions. Matrix algebra is the vehicle by which multivariate analysis is communicated, though most of the concepts of statistics can be communicated using simpler scalar algebra. Knowing matrix algebra for its own sake will not necessarily equate to understanding statistical concepts. Indeed, hiding behind the mathematics of statistics are philosophically “sticky” issues that mathematics or statistics cannot, on their own at least, claim to solve. These are often the problems confronted by researchers and scientists in their empirical pursuits and attempts to draw conclusions from data. For instance, what is the nature of a “correct” model? Do latent variables exist, or are they only a consequence of generating linear combinations? The nature of a latent variable is not necessarily contingent on the linear algebra that seeks to define it. Such questions are largely philosophical, and if they interest you, you are strongly encouraged to familiarize yourself with the philosophy of statistics and mathematics (you may not always find answers to your questions, but you will come to appreciate their complexity, as they are beyond our current study here). For a gentle introduction to the philosophy of statistics, see Lindley (2001).
As an example of how matrices will be used to develop more complete and general models, consider the multivariate general linear model in matrix form:

Y = XB + E (2.7)

where Y is an n × m matrix of n observations on m response variables, X is the model or “design” matrix whose columns contain k regressors, including the intercept term, B is a matrix of regression coefficients, and E is a matrix of errors. Many statistical models can be incorporated into the framework of (2.7).
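As a sketch of how a model of the form (2.7) can be estimated (this is a generic ordinary least squares illustration with simulated data, not a procedure from the text), the coefficient matrix B can be recovered with a least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 2, 3  # 50 observations, 2 regressors plus an intercept, 3 responses

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept
B_true = rng.normal(size=(k + 1, m))                        # coefficients used to simulate Y
Y = X @ B_true + rng.normal(scale=0.5, size=(n, m))         # Y = XB + E

# Least squares estimate of B; lstsq is preferred over forming (X'X)^(-1) explicitly
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B_hat)  # close to B_true
```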
As a relatively easy application of this general model, consider the simple linear regression model (featured in Chapter 7) in matrix form:

y = Xβ + ε

where y1 to yn are observed measurements on some dependent variable, X is the model matrix containing a constant of 1 in the first column to represent the common intercept term (i.e., “common” implying there is one intercept that represents all observations in our data), x1 to xn are observed values on a predictor