Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen
where s1 and s2 are the sample standard deviations of x1 and x2, respectively. The sample correlation ranges between −1 and 1, with values close to 1, −1, and 0 indicating a strong positive linear association, a strong negative linear association, and no linear association, respectively.
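As a quick numerical check of these three cases, the R sketch below uses small made-up vectors (they are not taken from the auto.spec data): an exactly increasing linear relation, an exactly decreasing one, and a symmetric quadratic relation in which y depends on x but has no linear association with it.

```r
# Illustrative vectors only; these are not from the auto.spec data set.
x <- -5:5

y_pos  <- 2 * x + 3      # exact increasing linear relation
y_neg  <- -0.5 * x + 1   # exact decreasing linear relation
y_quad <- x^2            # depends on x, but not linearly

r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_quad <- cor(x, y_quad)

print(c(positive = r_pos, negative = r_neg, quadratic = r_quad))
```

The quadratic case is a reminder that a correlation near 0 rules out only a linear association, not dependence in general.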
Example 2.2 To illustrate the calculation of summary statistics, we take a random sample of 10 observations, as shown in Table 2.1, from the auto.spec data set on the variables curb.weight, length, and width. We use x1, x2, and x3 to represent curb.weight, length, and width, respectively.
Table 2.1 A random sample of 10 observations from the auto.spec data set.
x1 (curb.weight) | x2 (length) | x3 (width) |
3515 | 190.9 | 70.3 |
2300 | 168.7 | 64.0 |
2800 | 168.9 | 65.0 |
2122 | 166.3 | 64.4 |
2293 | 169.1 | 66.0 |
2765 | 176.8 | 64.8 |
2275 | 171.7 | 65.5 |
1890 | 159.1 | 64.2 |
2926 | 173.2 | 66.3 |
1909 | 158.8 | 63.6 |
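To obtain the sample covariance for the variables curb.weight and length in the data set in Table 2.1, we first calculate the sample means x̄1 and x̄2:

$$\bar{x}_1 = \frac{3515 + 2300 + \cdots + 1909}{10} = 2479.5, \qquad \bar{x}_2 = \frac{190.9 + 168.7 + \cdots + 158.8}{10} = 170.35.$$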
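By (2.2), the sample covariance of the two variables can be obtained as

$$s_{12} = \frac{1}{9}\left[(3515 - 2479.5)(190.9 - 170.35) + \cdots + (1909 - 2479.5)(158.8 - 170.35)\right] = \frac{38851.05}{9} \approx 4316.8.$$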
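The value s12 = 4316.8 by itself cannot tell us whether the linear relationship between the two variables is strong or weak, because its magnitude depends on the units of measurement. The correlation provides this information. To evaluate the sample correlation, we first need the sample variances of x1 and x2. By (2.1), we have

$$s_1^2 = \frac{1}{9}\left[(3515 - 2479.5)^2 + \cdots + (1909 - 2479.5)^2\right] \approx 262829.2, \qquad s_2^2 = \frac{1}{9}\left[(190.9 - 170.35)^2 + \cdots + (158.8 - 170.35)^2\right] \approx 84.07.$$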
By (2.4), we have

$$r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{4316.8}{\sqrt{262829.2 \times 84.07}} \approx 0.918,$$

which is close to 1 and corresponds to a strong positive linear association between the curb weight and length of cars.
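The hand calculation in Example 2.2 can be reproduced in R from the values in Table 2.1. The sketch below assumes the divisor n − 1 in the sample variance and covariance (the convention used by R's var() and cov(), and the one that reproduces the value 4316.8 quoted above). The variable names are illustrative.

```r
# Table 2.1: random sample of 10 observations from auto.spec
x1 <- c(3515, 2300, 2800, 2122, 2293, 2765, 2275, 1890, 2926, 1909)             # curb.weight
x2 <- c(190.9, 168.7, 168.9, 166.3, 169.1, 176.8, 171.7, 159.1, 173.2, 158.8)   # length
n  <- length(x1)

# Sample means
xbar1 <- sum(x1) / n
xbar2 <- sum(x2) / n

# Sample covariance and sample variances, all with divisor n - 1 (assumed)
s12   <- sum((x1 - xbar1) * (x2 - xbar2)) / (n - 1)
s1_sq <- sum((x1 - xbar1)^2) / (n - 1)
s2_sq <- sum((x2 - xbar2)^2) / (n - 1)

# Sample correlation: covariance divided by the product of standard deviations
r12 <- s12 / sqrt(s1_sq * s2_sq)

print(round(c(xbar1 = xbar1, xbar2 = xbar2, s12 = s12,
              s1_sq = s1_sq, s2_sq = s2_sq, r12 = r12), 4))

# Cross-check the manual results against the built-in functions
stopifnot(isTRUE(all.equal(s12, cov(x1, x2))),
          isTRUE(all.equal(s1_sq, var(x1))),
          isTRUE(all.equal(r12, cor(x1, x2))))
```

Because the correlation divides out the two standard deviations, rescaling either variable (for example, changing the units of length) changes s12 but leaves r12 unchanged.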
Example 2.3 In R, the sample mean, variance, covariance, and correlation can be found using the functions mean(), var(), cov(), and cor(), respectively. For example, the following R code can be used to find the sample mean and sample variance of curb.weight, and the sample covariance and correlation between curb.weight and length, in the auto.spec data set.
    mean(auto.spec.df$curb.weight)
    var(auto.spec.df$curb.weight)
    with(auto.spec.df, cov(curb.weight, length))
    with(auto.spec.df, cor(curb.weight, length))

    > mean(auto.spec.df$curb.weight)
    [1] 2555.566
    > var(auto.spec.df$curb.weight)
    [1] 271107.9
    > with(auto.spec.df, cov(curb.weight, length))
    [1] 5638.336
    > with(auto.spec.df, cor(curb.weight, length))
    [1] 0.8777285
Note that the results above differ somewhat from those in Example 2.2 because this example uses the entire auto.spec data set rather than the small random subset used in Example 2.2.
2.2.2 Sample Mean Vector and Sample Covariance Matrix
A multivariate data set consists of n observations collected from n items or units, and each observation contains measurements on p variables, x1, x2,…, xp. The measurement vector for the ith observation is denoted by

$$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T}, \qquad i = 1, 2, \ldots, n.$$
The sample mean vector is the vector of sample means for the p variables, which is defined