Read the book online - Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen. Mathematics. LiveLib


Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen


$\bar{\mathbf{x}} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i,$

where x̄k is the sample mean of the kth variable, k = 1,…,p.

The sample covariance matrix S is the matrix of sample variances and covariances of the p variables:

$\mathbf{S} = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}.$

The off-diagonal elements of S are the sample covariances of each pair of variables. For j ≠ k,

$s_{jk} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{n - 1}.$ (2.5)

The diagonal elements of S, sjj, j = 1,…,p, are the sample variances of the p variables. When k = j, the sample covariance in (2.5) reduces to sj², the sample variance of the jth variable, so the notations sjj and sj² both represent the sample variance of xj. It is also clear from (2.5) that sjk = skj, so the sample covariance matrix S is symmetric. The sample covariance matrix S can also be written in terms of the observation vectors xi as

$\mathbf{S} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T.$ (2.6)
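As a rough illustration, the following Python sketch computes the sample mean vector and then the sample covariance matrix in two ways, element by element as in (2.5) and as a sum of outer products as in (2.6). The data values and the function names (`mean_vector`, `cov_elementwise`, `cov_outer`) are hypothetical, chosen only for this example.

```python
# Sketch only: the data matrix X below is hypothetical (n = 4 observations, p = 3 variables).

def mean_vector(X):
    """Sample mean vector: the average of the n observation vectors."""
    n, p = len(X), len(X[0])
    return [sum(row[k] for row in X) / n for k in range(p)]

def cov_elementwise(X):
    """Equation (2.5): s_jk = sum_i (x_ij - xbar_j)(x_ik - xbar_k) / (n - 1)."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    return [[sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in X) / (n - 1)
             for k in range(p)]
            for j in range(p)]

def cov_outer(X):
    """Equation (2.6): S = sum_i (x_i - xbar)(x_i - xbar)^T / (n - 1)."""
    n, p = len(X), len(X[0])
    xbar = mean_vector(X)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[k] - xbar[k] for k in range(p)]  # centered observation vector
        for j in range(p):
            for k in range(p):
                S[j][k] += d[j] * d[k] / (n - 1)  # add this observation's outer product
    return S

X = [[2.0, 1.0, 5.0],
     [4.0, 3.0, 4.0],
     [6.0, 2.0, 9.0],
     [8.0, 6.0, 6.0]]

xbar = mean_vector(X)
S_elem = cov_elementwise(X)
S_outer = cov_outer(X)
print(xbar)
print(S_elem)
```

Up to floating-point rounding, the two functions should return the same symmetric matrix, with the sample variances on the diagonal.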

Similarly, we define the sample correlation matrix as

$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}.$

The (j, k)th element of R is the sample correlation of the jth and kth variables:

$r_{jk} = \frac{s_{jk}}{s_j s_k}.$

The sample correlation between a variable and itself is equal to 1. So the diagonal elements of a sample correlation matrix are all equal to 1. The sample correlation matrix R is obviously symmetric since rjk = rkj.
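Both properties can be checked directly from the definition of the sample correlation, using sjj = sj² and the symmetry of the sample covariance:

$r_{jj} = \frac{s_{jj}}{s_j s_j} = \frac{s_j^2}{s_j^2} = 1, \qquad r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{s_{kj}}{s_k s_j} = r_{kj}.$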

Example 2.4 Consider the data set in Table 2.1. In Example 2.2, we found that x̄₁ = 2479.5 and x̄₂ = 170.35. Similarly, we can obtain x̄₃ = 65.41. So the mean vector of x = (x₁ x₂ x₃)ᵀ is given by

$\bar{\mathbf{x}} = (2479.5 \;\; 170.35 \;\; 65.41)^T.$

In Example 2.2, we calculated the sample variances, sample covariance, and sample correlation of x₁ and x₂. Similarly, we can obtain the sample variance of x₃ and its sample covariance and correlation with the other two variables as

$s_3^2 = 3.71, \quad s_{13} = 820.8, \quad s_{23} = 15.56, \quad r_{13} = 0.832, \quad r_{23} = 0.881.$

Note that while s₂₃ is much smaller than s₁₃, r₂₃ is greater than r₁₃, which indicates that the linear association between x₂ and x₃ is stronger than that between x₁ and x₃. This shows that the magnitude of a covariance by itself does not indicate how strong the relationship between two variables is, because the covariance depends on the scales of the variables. Combining all the sample variance, covariance, and correlation information, the sample covariance matrix and sample correlation matrix of x = (x₁ x₂ x₃)ᵀ can be written as

$\mathbf{S} = \begin{pmatrix} 262829.2 & 4316.8 & 820.8 \\ 4316.8 & 84.07 & 15.56 \\ 820.8 & 15.56 & 3.71 \end{pmatrix}, \quad \mathbf{R} = \begin{pmatrix} 1 & 0.918 & 0.832 \\ 0.918 & 1 & 0.881 \\ 0.832 & 0.881 & 1 \end{pmatrix}.$
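As a quick check, the correlation matrix of Example 2.4 can be recovered from the covariance matrix by dividing each sjk by the product of the standard deviations sj and sk. The sketch below uses the rounded entries of S printed above, so the results should agree with R only up to rounding in the third decimal place; the function name `corr_from_cov` is a hypothetical helper, not something from the book.

```python
import math

# Sample covariance matrix S from Example 2.4 (entries as printed, i.e. already rounded).
S = [[262829.2, 4316.8, 820.8],
     [4316.8, 84.07, 15.56],
     [820.8, 15.56, 3.71]]

def corr_from_cov(S):
    """Return R with r_jk = s_jk / (s_j * s_k), where s_j = sqrt(s_jj)."""
    p = len(S)
    s = [math.sqrt(S[j][j]) for j in range(p)]  # sample standard deviations
    return [[S[j][k] / (s[j] * s[k]) for k in range(p)] for j in range(p)]

R = corr_from_cov(S)
for row in R:
    print([round(r, 3) for r in row])
```

The comparison between s₁₃ and s₂₃ made in the example shows up here as well: the large first variance, s₁₁ = 262829.2, inflates every covariance involving x₁, and the division by the standard deviations removes that scale effect.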

2.2.3 Linear Combination of Variables

We are often interested in linear combinations of the variables x₁, x₂,…, xp. For example, for the auto_spec data set, two of the variables are city.mpg and highway.mpg. If you expect that 60% of the mileage for a car is on highways and 40% is on local roads, then the average MPG for the car can be estimated as 0.6 × highway.mpg + 0.4 × city.mpg, which is a linear combination of city.mpg and highway.mpg. In general, let c₁, c₂,…, cp be constants and consider the linear combination of the variables x₁, x₂,…, xp given by

$z = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p.$
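A minimal sketch of the weighted-MPG example in Python follows. The MPG values are hypothetical stand-ins rather than rows of the auto_spec data set, and `linear_combination` is an illustrative helper name.

```python
# Sketch only: hypothetical (city.mpg, highway.mpg) values for three cars.

def linear_combination(X, c):
    """Return z_i = c_1 * x_i1 + ... + c_p * x_ip for each observation (row) of X."""
    return [sum(cj * xj for cj, xj in zip(c, row)) for row in X]

X = [[21.0, 27.0],
     [19.0, 26.0],
     [24.0, 30.0]]
c = [0.4, 0.6]  # 40% local roads (city.mpg), 60% highway (highway.mpg)

z = linear_combination(X, c)
print(z)
```

Each entry of `z` is the estimated average MPG of one car under the assumed 40/60 split.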

For each observation of the data set, the corresponding value of the variable z
