Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis
2.15 AKAIKE'S INFORMATION CRITERION
A measure of model fit commonly used in comparing models that uses the log‐likelihood is Akaike's information criterion, or AIC (Sakamoto, Ishiguro, and Kitagawa, 1986). This is one statistic of the kind generally referred to as penalized likelihood statistics (another is the Bayesian information criterion, or BIC). AIC is defined as:

$$
\text{AIC} = -2L_m + 2m
$$
where Lm is the maximized log‐likelihood and m is the number of parameters in the given model. Lower values of AIC indicate a better‐fitting model than do larger values. Recall that, in general, the more parameters a model contains, the better it will fit the data. For example, a model that has a unique parameter for each data point would fit perfectly. This is the so‐called saturated model. AIC jointly considers the goodness of fit and the number of parameters required to obtain that fit, essentially “penalizing” additional parameters unless they contribute enough to model fit. Adding one or more parameters to a model may cause −2Lm to decrease (which is a good thing substantively), but if the parameters are not worthwhile, this decrease will be offset by the increase in 2m.
The Bayesian information criterion, or BIC (Schwarz, 1978), is defined as −2Lm + m log(N), where m, as before, is the number of parameters in the model and N is the total number of observations used to fit the model. Lower values of BIC are also desirable when comparing models. Because log(N) exceeds 2 whenever N is at least 8 (taking log as the natural logarithm), BIC typically penalizes model complexity more heavily than AIC. For a comparison of AIC and BIC, see Burnham and Anderson (2011).
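As a minimal sketch of these two definitions, the following R code computes AIC and BIC by hand for two regression models and checks the results against R's built‐in AIC() and BIC() functions. The mtcars data set (shipped with R) and the predictors wt and qsec are illustrative assumptions, not examples from the text. Here m is taken from the "df" attribute of logLik(), which for a linear model counts the regression coefficients plus the error variance.

```r
# Illustrative models only: mtcars and these predictors are assumptions,
# chosen because the data set ships with base R.
fit1 <- lm(mpg ~ wt, data = mtcars)          # smaller model
fit2 <- lm(mpg ~ wt + qsec, data = mtcars)   # one additional parameter

ic_by_hand <- function(fit) {
  ll <- logLik(fit)        # maximized log-likelihood object
  Lm <- as.numeric(ll)     # Lm in the formulas above
  m  <- attr(ll, "df")     # number of estimated parameters
  N  <- nobs(fit)          # number of observations used in the fit
  c(AIC = -2 * Lm + 2 * m,
    BIC = -2 * Lm + m * log(N))
}

hand1 <- ic_by_hand(fit1)
hand2 <- ic_by_hand(fit2)

# The hand computations should agree with the built-in functions.
stopifnot(isTRUE(all.equal(hand1[["AIC"]], AIC(fit1))))
stopifnot(isTRUE(all.equal(hand2[["BIC"]], BIC(fit2))))

# One row per model; the model with the lower value is preferred.
print(rbind(fit1 = hand1, fit2 = hand2))
```

With N = 32 observations, log(N) is larger than 2, so each model's BIC exceeds its AIC by (log(N) − 2) times its number of parameters.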
2.16 COVARIANCE AND CORRELATION
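The covariance of two random variables xi and yi is given by:

$$
\text{cov}(x_i, y_i) = E[(x_i - \mu_x)(y_i - \mu_y)]
$$

where E[(xi − μx)(yi − μy)] is equal to E(xiyi) − μxμy since

$$
\begin{aligned}
E[(x_i - \mu_x)(y_i - \mu_y)] &= E(x_i y_i - x_i \mu_y - \mu_x y_i + \mu_x \mu_y) \\
&= E(x_i y_i) - \mu_y E(x_i) - \mu_x E(y_i) + \mu_x \mu_y \\
&= E(x_i y_i) - \mu_x \mu_y - \mu_x \mu_y + \mu_x \mu_y \\
&= E(x_i y_i) - \mu_x \mu_y
\end{aligned}
$$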
The concept of covariance is at the heart of virtually all statistical methods. Whether one is running analysis of variance, regression, principal component analysis, etc., covariance concepts are central to all of these methodologies and even more broadly to science in general.
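The sample covariance is a measure of relationship between two variables and is defined as:

$$
\text{cov}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \tag{2.5}
$$

The numerator of the covariance, Σ(xi − x̄)(yi − ȳ), is the sum of cross‐products of the deviations of each variable from its own mean.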
The covariance of (2.5) is a perfectly reasonable one to calculate for a sample if there is no intention of using that covariance as an estimator of the population covariance. However, if one wishes to use it as an unbiased estimator, then, similar to how we needed to subtract 1 from the denominator of the variance, we lose 1 degree of freedom when computing the covariance:

$$
\text{cov}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
$$
It is easy to understand more of what the covariance actually measures if we consider the trivial case of computing the covariance of a variable with itself. In such a case for variable xi, we would have

$$
\text{cov}_{xx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{n - 1}
$$
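But what is this covariance? If we rewrite the numerator as Σ(xi − x̄)² instead of Σ(xi − x̄)(xi − x̄), it becomes clear that the covariance of a variable with itself is simply the variance of that variable.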
We compute the covariance between parent height and child height in Galton's data:
> attach(Galton)
> cov(parent, child)
[1] 2.064614
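The two denominators discussed above, and the fact that the covariance of a variable with itself is its variance, can be checked on a small sample. The vectors x and y below are made‐up values for illustration, not data from the text. The sketch relies only on base R's cov() and var(), both of which use the n − 1 denominator.

```r
# Made-up illustrative data (not from the text).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Sum of cross-products of deviations from the means (the numerator).
cross_products <- sum((x - mean(x)) * (y - mean(y)))

cov_n  <- cross_products / n         # descriptive version, denominator n
cov_n1 <- cross_products / (n - 1)   # unbiased estimator, denominator n - 1

cov_n        # 1.2
cov_n1       # 1.5
cov(x, y)    # 1.5, since cov() divides by n - 1

# The covariance of a variable with itself is its variance.
cov(x, x)    # 2.5
var(x)       # 2.5
```

The gap between the two versions shrinks as n grows, since they differ only by the factor n / (n − 1).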
We have mentioned that the covariance is a measure of linear relationship. However, sample covariances from data set to data set are not comparable unless one knows more of what went into each specific computation. There are actually three things that can be said to be the “ingredients” of the covariance. The first thing it contains is a measure of the cross‐product, which represents the degree to which variables are linearly related. This is the part in our computation of the covariance that we are especially interested in. However, other than concluding a negative, zero, or positive relationship, the size of the covariance does not by itself tell us the degree to which two variables are linearly related.
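One way to see why the raw size of a covariance is hard to interpret is to change the units of one variable. In the sketch below, the vectors are made‐up values and the centimetre‐to‐millimetre rescaling is a hypothetical choice for illustration. The linear relationship is unchanged by the rescaling, yet the covariance is multiplied by 10, while the correlation, which divides the covariance by both standard deviations, stays the same.

```r
# Made-up illustrative data (not from the text).
height_cm <- c(150, 160, 165, 172, 181)
weight_kg <- c(52, 58, 63, 70, 80)

height_mm <- height_cm * 10   # same heights, expressed in a different unit

cov_cm <- cov(height_cm, weight_kg)
cov_mm <- cov(height_mm, weight_kg)

# The covariance grows by the rescaling factor (10, up to rounding error).
cov_mm / cov_cm

# Dividing by both standard deviations removes the dependence on scale.
r_cm <- cov_cm / (sd(height_cm) * sd(weight_kg))
r_mm <- cov_mm / (sd(height_mm) * sd(weight_kg))
c(r_cm = r_cm, r_mm = r_mm)   # identical, and equal to cor(height_cm, weight_kg)
```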
The reason for this is that the size of covariance will also be impacted by the degree to which there is variability in xi and the degree to which there is variability in yi. If either or both variables contain sizeable deviations of the sort