Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen


$$\bar{\mathbf{x}} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i,$$

      where $\bar{x}_k$ is the sample mean of $x_k$, i.e., $\bar{x}_k = \frac{1}{n} \sum_{i=1}^{n} x_{ik}$, $k = 1, \ldots, p$.

      The sample covariance matrix S is the matrix of sample variances and covariances of the p variables:

$$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}$$

      The diagonal elements of $\mathbf{S}$ are the sample variances, and the off-diagonal elements are the sample covariances of each pair of variables. For $j \neq k$,

$$s_{jk} = s_{kj} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).$$

      In matrix form, the sample covariance matrix can be written as

$$\mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T. \qquad (2.6)$$
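
      As a minimal sketch of how the sample mean vector and Equation (2.6) translate into computation, the Python snippet below builds both from a small data matrix and cross-checks the result against NumPy's `np.cov`. The data values and the variable names (`X`, `x_bar`, `S`) are hypothetical and chosen only for illustration; they are not taken from Table 2.1.

```python
import numpy as np

# Hypothetical data matrix: n = 5 observations (rows), p = 3 variables (columns).
X = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 0.5],
    [6.0, 15.0, 2.5],
    [8.0, 21.0, 2.0],
    [10.0, 20.0, 4.0],
])
n, p = X.shape

# Sample mean vector: the average of the n observation vectors.
x_bar = X.sum(axis=0) / n

# Eq. (2.6): sum of outer products of the centered observations, divided by n - 1.
S = np.zeros((p, p))
for x_i in X:
    d = (x_i - x_bar).reshape(p, 1)  # centered observation as a column vector
    S += d @ d.T
S /= n - 1

# Cross-check against NumPy's estimator (rowvar=False: columns are variables).
assert np.allclose(S, np.cov(X, rowvar=False))
```

      The explicit loop mirrors the summation in the formula; in practice the same matrix can be obtained in one step from the centered data matrix, or directly from `np.cov`.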

      Similarly, we define the sample correlation matrix as

$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$$

      The (j, k)th element of R is the sample correlation of the jth and kth variables:

$$r_{jk} = \frac{s_{jk}}{s_j s_k}.$$

      The sample correlation between a variable and itself is equal to 1, so the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix $\mathbf{R}$ is symmetric since $r_{jk} = r_{kj}$.
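
      The elementwise definition can also be collected into a single matrix identity. This is a standard result rather than something stated in this passage, and the symbol $\mathbf{D}$ is introduced here only for illustration. With $s_j = \sqrt{s_{jj}}$ denoting the sample standard deviation of the $j$th variable,

$$\mathbf{D} = \operatorname{diag}(s_1, s_2, \ldots, s_p), \qquad \mathbf{R} = \mathbf{D}^{-1} \mathbf{S} \mathbf{D}^{-1},$$

      since the $(j, k)$th element of $\mathbf{D}^{-1} \mathbf{S} \mathbf{D}^{-1}$ is $s_{jk} / (s_j s_k)$. The symmetry of $\mathbf{R}$ then follows directly from the symmetry of $\mathbf{S}$.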

      Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that $\bar{x}_1 = 2479.5$ and $\bar{x}_2 = 170.35$. Similarly, we can obtain $\bar{x}_3 = 65.41$. So the mean vector of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ is given by

$$\bar{\mathbf{x}} = (2479.5 \;\; 170.35 \;\; 65.41)^T.$$

      In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of $x_1$ and $x_2$. Similarly, we can obtain the sample variance of $x_3$ and its sample covariance and correlation with the other two variables as

$$s_3^2 = 3.71, \quad s_{13} = 820.8, \quad s_{23} = 15.56, \quad r_{13} = 0.832, \quad r_{23} = 0.881.$$

      Note that while $s_{23}$ is much smaller than $s_{13}$, $r_{23}$ is greater than $r_{13}$, which indicates that the linear association between $x_2$ and $x_3$ is stronger than that between $x_1$ and $x_3$. This shows that the magnitude of a covariance, which depends on the scales of the variables, does not by itself indicate how strong the relationship between two variables is. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of $\mathbf{x} = (x_1 \; x_2 \; x_3)^T$ can be written as

$$\mathbf{S} = \begin{pmatrix} 262829.2 & 4316.8 & 820.8 \\ 4316.8 & 84.07 & 15.56 \\ 820.8 & 15.56 & 3.71 \end{pmatrix}, \qquad \mathbf{R} = \begin{pmatrix} 1 & 0.918 & 0.832 \\ 0.918 & 1 & 0.881 \\ 0.832 & 0.881 & 1 \end{pmatrix}.$$
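
      The relationship between the two matrices in this example can be checked numerically. The sketch below, written in Python with NumPy (a choice made here for illustration), starts from the covariance matrix printed above and divides each entry by the product of the corresponding standard deviations. Because the printed entries of $\mathbf{S}$ are themselves rounded, the recomputed correlations should be expected to match the printed $\mathbf{R}$ only to within about 0.001.

```python
import numpy as np

# Sample covariance matrix S as printed in Example 2.4.
S = np.array([
    [262829.2, 4316.8, 820.8],
    [4316.8, 84.07, 15.56],
    [820.8, 15.56, 3.71],
])

# Sample standard deviations s_j: square roots of the diagonal of S.
s = np.sqrt(np.diag(S))

# r_jk = s_jk / (s_j * s_k), applied to every entry at once.
R = S / np.outer(s, s)

print(np.round(R, 3))
```

      Using `np.outer(s, s)` builds the matrix of products $s_j s_k$ in one step, so the division reproduces the elementwise definition of $r_{jk}$ without explicit loops.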

      We are often interested in some linear combinations of the variables $x_1, x_2, \ldots, x_p$. For example, for the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect that 60% of the mileage for a car is on highway and 40% is on local roads, then the average MPG for a car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let $c_1, c_2, \ldots, c_p$ be constants and consider the linear combination of the variables $x_1, x_2, \ldots, x_p$ given by

$$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p.$$
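
      As a small illustration of evaluating such a linear combination on data, the Python sketch below applies the 40%/60% weights from the MPG example to a few rows of made-up values. The numbers are hypothetical and are not taken from the auto_spec data set; the names `X`, `c`, and `z` are likewise chosen here for illustration.

```python
import numpy as np

# Hypothetical (city.mpg, highway.mpg) values for four cars; not from auto_spec.
X = np.array([
    [21.0, 27.0],
    [19.0, 26.0],
    [24.0, 30.0],
    [17.0, 22.0],
])

# Weights from the text: 40% local (city) driving, 60% highway driving.
c = np.array([0.4, 0.6])

# z_i = c_1 * x_i1 + c_2 * x_i2 for every observation i, as one matrix-vector product.
z = X @ c
```

      Each entry of `z` is the weighted MPG estimate for one car, so the whole column of values is obtained from a single product of the data matrix with the coefficient vector.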

      For each observation of the data set, the corresponding value of the variable z
