Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen. Mathematics.


$r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{\sqrt{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}\,\sqrt{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}},$ (2.4)

where s₁ and s₂ are the sample standard deviations of x₁ and x₂, respectively. The sample correlation ranges between −1 and 1, with values close to 1, −1, and 0 indicating a strong positive linear association, a strong negative linear association, and no linear association, respectively.
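The three cases described above can be illustrated with a minimal R sketch. It assumes small made-up vectors (not taken from the auto.spec data set), and `sample_cor` is a hypothetical helper name that evaluates the ratio in (2.4) directly; base R's `cor()` should return the same values.

```r
# Sample correlation computed directly from the definition in (2.4).
# sample_cor is a hypothetical helper name, not a base R function.
sample_cor <- function(x1, x2) {
  d1 <- x1 - mean(x1)  # deviations of x1 from its sample mean
  d2 <- x2 - mean(x2)  # deviations of x2 from its sample mean
  sum(d1 * d2) / (sqrt(sum(d1^2)) * sqrt(sum(d2^2)))
}

# Made-up illustrative data (an assumption, not from the book)
x <- 1:10
r_pos  <- sample_cor(x, 2 * x + 1)    # points on an increasing line
r_neg  <- sample_cor(x, 5 - 3 * x)    # points on a decreasing line
r_zero <- sample_cor(x, (x - 5.5)^2)  # symmetric U shape around mean(x)

print(c(positive = r_pos, negative = r_neg, none = r_zero))
```

In the last case the two variables are clearly related, but the relationship is symmetric rather than linear, so the correlation is zero up to rounding error. A correlation near 0 rules out only a linear association.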

Example 2.2 To illustrate the calculation of summary statistics, we take a random sample of 10 observations, as shown in Table 2.1, from the auto.spec data set on the variables curb.weight, length, and width. We use xᵢ, i = 1, 2, 3, to represent the three variables:

$x_1 = \text{curb.weight}, \quad x_2 = \text{length}, \quad x_3 = \text{width}.$

Table 2.1 A random sample of 10 observations from the auto.spec data set.

x₁	x₂	x₃
3515	190.9	70.3
2300	168.7	64.0
2800	168.9	65.0
2122	166.3	64.4
2293	169.1	66.0
2765	176.8	64.8
2275	171.7	65.5
1890	159.1	64.2
2926	173.2	66.3
1909	158.8	63.6

To obtain the sample covariance of the variables curb.weight and length for the data in Table 2.1, we first calculate the sample means x̄₁ and x̄₂ and the sum of cross products:

$\bar{x}_1 = \frac{1}{n}\sum_{i=1}^{n} x_{i1} = \frac{24795}{10} = 2479.5, \qquad \bar{x}_2 = \frac{1}{n}\sum_{i=1}^{n} x_{i2} = \frac{1703.5}{10} = 170.35,$

$\sum_{i=1}^{n} x_{i1} x_{i2} = (3515)(190.9) + (2300)(168.7) + \cdots + (1909)(158.8) = 4262679.$

By (2.2), the sample covariance of the two variables can be obtained as

$\begin{aligned} s_{12} &= \frac{\sum_{i=1}^{n} x_{i1} x_{i2} - n \bar{x}_1 \bar{x}_2}{n-1} \\ &= \frac{4262679 - (10)(2479.5)(170.35)}{9} = 4316.8. \end{aligned}$
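The arithmetic above can be checked with a short R sketch. It types in the first two columns of Table 2.1 by hand (the vector names `x1` and `x2` are chosen here for illustration) and compares the shortcut form of (2.2), the deviation form of the covariance, and base R's `cov()`.

```r
# First two columns of Table 2.1, entered by hand
x1 <- c(3515, 2300, 2800, 2122, 2293, 2765, 2275, 1890, 2926, 1909)            # curb.weight
x2 <- c(190.9, 168.7, 168.9, 166.3, 169.1, 176.8, 171.7, 159.1, 173.2, 158.8)  # length
n <- length(x1)

# Shortcut form used in the text: (sum of cross products - n * product of means) / (n - 1)
s12_shortcut <- (sum(x1 * x2) - n * mean(x1) * mean(x2)) / (n - 1)

# Deviation form: sum of products of deviations from the means / (n - 1)
s12_deviation <- sum((x1 - mean(x1)) * (x2 - mean(x2))) / (n - 1)

# Built-in sample covariance, which also divides by n - 1
s12_builtin <- cov(x1, x2)

print(round(c(shortcut = s12_shortcut,
              deviation = s12_deviation,
              builtin = s12_builtin), 1))
```

All three values should agree with the 4316.8 obtained by hand. The shortcut form is convenient for manual calculation, while the deviation form is usually the numerically safer choice when the means are large relative to the spread.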

The s₁₂ value of 4316.8 by itself cannot tell us whether the two variables have a strong or weak (linear) relationship, because the magnitude of a covariance depends on the units in which the variables are measured. Such information is provided by the correlation. To evaluate the sample correlation, we first need the sample variances of x₁ and x₂. By (2.1), we have

$\begin{aligned} s_1^2 &= \frac{\sum_{i=1}^{n} x_{i1}^2 - n \bar{x}_1^2}{n-1} = \frac{63844665 - (10)(2479.5)^2}{9} = 262829.2, \\ s_2^2 &= \frac{\sum_{i=1}^{n} x_{i2}^2 - n \bar{x}_2^2}{n-1} = \frac{290947.8 - (10)(170.35)^2}{9} = 84.07. \end{aligned}$

By (2.4), we have

$r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{4316.8}{\sqrt{262829.2}\,\sqrt{84.07}} = 0.918,$

which is close to 1 and corresponds to a strong positive linear association between the curb weight and length of cars.
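As a rough check of Example 2.2, the following R sketch recomputes the correlation from the Table 2.1 values and then rescales x₁ by an arbitrary factor of 1/1000. The factor and the variable names are assumptions made for illustration only. The point is that the covariance changes with the scale of measurement while the correlation does not.

```r
# First two columns of Table 2.1, entered by hand
x1 <- c(3515, 2300, 2800, 2122, 2293, 2765, 2275, 1890, 2926, 1909)            # curb.weight
x2 <- c(190.9, 168.7, 168.9, 166.3, 169.1, 176.8, 171.7, 159.1, 173.2, 158.8)  # length

# Sample variances, as in (2.1)
s1_sq <- var(x1)
s2_sq <- var(x2)

# Sample correlation as the covariance divided by the product of standard deviations
r12 <- cov(x1, x2) / (sqrt(s1_sq) * sqrt(s2_sq))
print(round(c(s1_sq = s1_sq, s2_sq = s2_sq, r12 = r12), 3))

# Rescale x1 by an arbitrary factor (illustrative assumption: thousands of the original unit)
x1_scaled <- x1 / 1000
print(c(cov_original = cov(x1, x2), cov_scaled = cov(x1_scaled, x2)))  # differ by the factor 1000
print(c(cor_original = cor(x1, x2), cor_scaled = cor(x1_scaled, x2)))  # unchanged by rescaling
```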

Example 2.3 In R, the sample mean, variance, covariance, and correlation can be found using the functions mean(), var(), cov(), and cor(), respectively. For example, the following R code finds the sample mean and sample variance of curb.weight, and the sample covariance and correlation between curb.weight and length, in the auto.spec data set.

mean(auto.spec.df$curb.weight)
var(auto.spec.df$curb.weight)
with(auto.spec.df, cov(curb.weight, length))
with(auto.spec.df, cor(curb.weight, length))

> mean(auto.spec.df$curb.weight)
[1] 2555.566
> var(auto.spec.df$curb.weight)
[1] 271107.9
> with(auto.spec.df, cov(curb.weight, length))
[1] 5638.336
> with(auto.spec.df, cor(curb.weight, length))
[1] 0.8777285

Note that the results above differ somewhat from those in Example 2.2 because this example uses the entire auto.spec data set rather than the small random subset used in Example 2.2.

2.2.2 Sample Mean Vector and Sample Covariance Matrix

A multivariate data set consists of n observations collected from n items or units, and each observation contains measurements on p variables, x₁, x₂, …, xₚ. The measurement vector for the ith observation is denoted by

$x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}.$
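One common way to hold such a data set in R is an n × p numeric matrix with one row per observation. This is a sketch under that assumption, using the Table 2.1 sample, and the object names `X` and `x_4` are chosen here for illustration. The measurement vector for the ith observation is then simply row i.

```r
# Table 2.1 stored as a 10 x 3 matrix: rows are observations, columns are variables
X <- cbind(
  curb.weight = c(3515, 2300, 2800, 2122, 2293, 2765, 2275, 1890, 2926, 1909),
  length      = c(190.9, 168.7, 168.9, 166.3, 169.1, 176.8, 171.7, 159.1, 173.2, 158.8),
  width       = c(70.3, 64.0, 65.0, 64.4, 66.0, 64.8, 65.5, 64.2, 66.3, 63.6)
)

n <- nrow(X)  # number of observations
p <- ncol(X)  # number of variables

# Measurement vector for the 4th observation: (x_41, x_42, x_43)
x_4 <- X[4, ]
print(x_4)
```

R returns `X[4, ]` as a plain named vector of length p. Use `X[4, , drop = FALSE]` if a 1 × p matrix is needed instead.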

The sample mean vector is the vector of sample means for the p variables, which is defined
