Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen
where $\bar{x}_k$ is the sample mean of the $k$th variable.
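The sample covariance matrix $\mathbf{S}$ is the matrix of sample variances and covariances of the $p$ variables:

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$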
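The off-diagonal elements of $\mathbf{S}$ are the sample covariances of each pair of variables. For $j \neq k$,

$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k) \tag{2.5}$$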
The diagonal elements of $\mathbf{S}$, $s_{jj}$, $j = 1, \ldots, p$, are the sample variances of the individual variables. When $k = j$, the sample covariance in (2.5) reduces to $s_j^2$, the sample variance of the $j$th variable, so the notations $s_{jj}$ and $s_j^2$ both represent the sample variance of $x_j$. It also follows from (2.5) that $s_{jk} = s_{kj}$, so the sample covariance matrix $\mathbf{S}$ is symmetric. In terms of the observation vectors $\mathbf{x}_i$, the sample covariance matrix can also be written as

$$\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T$$
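As a rough illustration of the two ways of writing the sample covariance matrix, the sketch below builds it once entry by entry from the pairwise covariance formula and once as a scaled sum of outer products of the centered observation vectors, then checks that the results agree and are symmetric. It assumes the usual $n - 1$ divisor. The function names and the small data set are made up for illustration and are not taken from Table 2.1.

```python
import math

def mean_vector(rows):
    """Column means of an n x p data set given as a list of rows."""
    n, p = len(rows), len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(p)]

def cov_elementwise(rows):
    """Build S entry by entry from the pairwise covariance formula."""
    n, p = len(rows), len(rows[0])
    xbar = mean_vector(rows)
    return [
        [sum((r[j] - xbar[j]) * (r[k] - xbar[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]

def cov_outer(rows):
    """Build S as the scaled sum of outer products of centered observations."""
    n, p = len(rows), len(rows[0])
    xbar = mean_vector(rows)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - xbar[j] for j in range(p)]  # centered observation vector
        for j in range(p):
            for k in range(p):
                S[j][k] += d[j] * d[k] / (n - 1)
    return S

# Hypothetical data: 4 observations of 3 variables (assumed values).
data = [
    [2.0, 1.0, 4.0],
    [3.0, 5.0, 6.0],
    [6.0, 3.0, 5.0],
    [5.0, 7.0, 9.0],
]

S1 = cov_elementwise(data)
S2 = cov_outer(data)

p = len(S1)
same = all(math.isclose(S1[j][k], S2[j][k]) for j in range(p) for k in range(p))
symmetric = all(math.isclose(S1[j][k], S1[k][j]) for j in range(p) for k in range(p))
print(same, symmetric)
```

The diagonal entries of either result are the sample variances of the three columns, and swapping the two indices leaves every entry unchanged.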
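Similarly, we define the sample correlation matrix as

$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}$$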
The $(j, k)$th element of $\mathbf{R}$ is the sample correlation of the $j$th and $k$th variables:

$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}}\sqrt{s_{kk}}} = \frac{s_{jk}}{s_j s_k}$$
The sample correlation between a variable and itself is equal to 1, so the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix $\mathbf{R}$ is also symmetric, since $r_{jk} = r_{kj}$.
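The relationship between the two matrices can be sketched in a few lines: each correlation is the corresponding covariance divided by the product of the two standard deviations. The function name `correlation_from_covariance` and the covariance matrix below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def correlation_from_covariance(S):
    """Convert a covariance matrix S into a correlation matrix R."""
    p = len(S)
    sd = [math.sqrt(S[j][j]) for j in range(p)]  # standard deviations s_j
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]

# Hypothetical covariance matrix (assumed values, not from the book's data).
S = [
    [100.0, 6.0, 3.0],
    [6.0,   9.0, 1.5],
    [3.0,   1.5, 1.0],
]

R = correlation_from_covariance(S)
for row in R:
    print([round(v, 3) for v in row])
```

In this made-up matrix the covariance of the second and third variables (1.5) is smaller than that of the first and third (3.0), yet the corresponding correlation is larger, because the first variable has a much larger variance.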
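Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that $\bar{x}_1 = 2479.5$ and $\bar{x}_2 = 170.35$. Similarly, we can obtain $\bar{x}_3 = 65.41$. So the mean vector of $\mathbf{x} = (x_1\ x_2\ x_3)^T$ is given by

$$\bar{\mathbf{x}} = \begin{pmatrix} 2479.5 \\ 170.35 \\ 65.41 \end{pmatrix}$$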
In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of $x_1$ and $x_2$. Similarly, we can obtain the sample variance of $x_3$ and its sample covariance and correlation with the other two variables as
Note that while $s_{23}$ is much smaller than $s_{13}$, $r_{23}$ is greater than $r_{13}$, which indicates that the linear association between $x_2$ and $x_3$ is stronger than that between $x_1$ and $x_3$. This shows that the magnitude of a covariance alone does not characterize how strong the relationship between two variables is, because it depends on the scales of the variables. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of $\mathbf{x} = (x_1\ x_2\ x_3)^T$ can be written as
2.2.3 Linear Combination of Variables
We are often interested in some linear combinations of the variables $x_1, x_2, \ldots, x_p$. For example, for the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect that 60% of the mileage for a car is on highway and 40% is on local roads, then the average MPG for a car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let $c_1, c_2, \ldots, c_p$ be constants and consider the linear combination of the variables $x_1, x_2, \ldots, x_p$ given by

$$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p$$
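As a small sketch of the weighted-MPG example, the code below forms the linear combination with coefficients 0.4 for city.mpg and 0.6 for highway.mpg, one value per observation. The helper name `linear_combination` and the three (city, highway) pairs are assumptions made for illustration. They are not rows of the auto_spec data set.

```python
def linear_combination(rows, coeffs):
    """Return c_1*x_1 + ... + c_p*x_p for each observation (row)."""
    return [sum(c * x for c, x in zip(coeffs, row)) for row in rows]

# Hypothetical (city.mpg, highway.mpg) pairs for three cars (assumed values).
mpg = [
    (20.0, 30.0),
    (25.0, 35.0),
    (18.0, 24.0),
]

# 40% local roads (city.mpg), 60% highway (highway.mpg).
coeffs = (0.4, 0.6)

z = linear_combination(mpg, coeffs)
print(z)
```

Because the two weights are nonnegative and sum to 1, each resulting value lies between the city and highway figures for that car.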
For each observation of the data set, the corresponding value of the variable z