Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen


$$
\mu_i = E(X_i) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} x_i f_i(x_i)\,dx_i & \text{if } X_i \text{ is a continuous random variable} \\[2ex]
\displaystyle\sum_{x_i} x_i\, p_i(x_i) & \text{if } X_i \text{ is a discrete random variable,}
\end{cases}
$$

      where $f_i(x_i)$ is the probability density function of $X_i$ if $X_i$ is continuous, and $p_i(x_i)$ is the probability mass function of $X_i$ if $X_i$ is discrete. The mean $\mu_i$ is also called the population mean of $X_i$ because it is the mean of $X_i$ over all possible values in the population. Similarly, the mean vector $\boldsymbol{\mu}$ is the population mean vector of $\mathbf{X}$.
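      As a small illustration of the discrete branch of the definition above, the following sketch computes a population mean directly from a probability mass function. The pmf (a defect count with made-up probabilities) and the helper name `discrete_mean` are hypothetical and serve only as an example.

```python
from fractions import Fraction


def discrete_mean(pmf):
    """Population mean of a discrete random variable.

    `pmf` maps each possible value x to its probability p(x).
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    # mu = sum over x of x * p(x)
    return sum(x * p for x, p in pmf.items())


# Hypothetical pmf: number of defects on a unit (probabilities are made up).
defect_pmf = {0: Fraction(6, 10), 1: Fraction(3, 10), 2: Fraction(1, 10)}

mu = discrete_mean(defect_pmf)
print(mu)  # 1/2, i.e. 0*0.6 + 1*0.3 + 2*0.1
```

      Exact fractions are used here only so that the check on the probabilities summing to 1 is not affected by floating-point rounding.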

      To further explain the relationship and difference between the population mean and the sample mean introduced in Section 2.2, we first consider a univariate random variable $X$ and its population mean $\mu$. Consider a random sample of observations from the population, say, $X_1, X_2, \ldots, X_n$. The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a random variable because the observations $X_1, X_2, \ldots, X_n$ are all random variables with values varying from sample to sample. For example, let $X$ represent the measured intensity of the current of a wafer produced by a semiconductor manufacturing process. Suppose we take a random sample of $n = 10$ wafers from this process, compute the sample mean of the measured intensities of the current, and get the result $\bar{x} = 1.02$. Now we repeat this process, taking a second sample of $n = 10$ wafers from the same process, and the resulting sample mean is 1.04. The sample means differ from sample to sample because the observations themselves are random. Consequently, the sample mean, like any other function of the random observations, is a random variable. On the other hand, the population mean $\mu$ does not depend on the samples and is a (usually unknown) constant. When the sample size $n$ is very large, the sample mean will be very close to the population mean $\mu$ with high probability.

      As the sample mean is a random variable, we can evaluate its mean and variance. It is easy to see that $E(\bar{X}) = \mu$ and $\operatorname{var}(\bar{X}) = \sigma^2/n$, where $\sigma^2$ is the variance of $X$. An estimator of a parameter is called unbiased if its mean is equal to the true value of the parameter. The sample mean $\bar{X}$ is a commonly used estimator of $\mu$ because it is unbiased and its variance decreases as the sample size $n$ increases.
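      The behavior of the sample mean described above can be checked with a short simulation. The sketch below is only illustrative: the normal population, its parameters `MU` and `SIGMA`, and the number of repetitions `REPS` are assumptions chosen for the example, not values from the wafer process in the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA = 1.0, 0.1   # hypothetical population mean and standard deviation
N = 10                 # sample size, as in the wafer example
REPS = 20000           # number of repeated samples (an arbitrary choice)


def sample_mean(n):
    """Draw one random sample of size n and return its sample mean."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


# Each repetition gives a different sample mean: the sample mean is random.
means = [sample_mean(N) for _ in range(REPS)]
print("first two sample means:", means[0], means[1])

# Across many repetitions, the average of the sample means approaches mu
# and their variance approaches sigma^2 / n.
mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)
print("mean of sample means:", mean_of_means, "vs mu =", MU)
print("variance of sample means:", var_of_means, "vs sigma^2/n =", SIGMA**2 / N)
```

      Increasing `N` in this sketch shrinks the spread of the sample means, which is the sense in which a larger sample gives an estimator with smaller variance.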

      This concept can be extended to a $p$-dimensional random vector $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$. Consider a random sample $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ from the population of $\mathbf{X}$. The sample mean vector $\bar{\mathbf{X}}$ is a random vector with population mean $E(\bar{\mathbf{X}}) = \boldsymbol{\mu}$ and population covariance matrix $\operatorname{cov}(\bar{\mathbf{X}}) = \frac{1}{n}\boldsymbol{\Sigma}$, where $\boldsymbol{\Sigma}$ is the population covariance matrix of $\mathbf{X}$, defined below. The sample mean vector $\bar{\mathbf{X}}$ is an unbiased estimator of the population mean vector $\boldsymbol{\mu}$.
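      The factor $1/n$ can be seen in one line. The derivation below is a sketch that assumes the observations in the random sample are independent and identically distributed, each with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, so that the cross-covariances between different observations vanish:

$$
E(\bar{\mathbf{X}}) = \frac{1}{n}\sum_{i=1}^{n} E(\mathbf{X}_i) = \boldsymbol{\mu},
\qquad
\operatorname{cov}(\bar{\mathbf{X}})
= \operatorname{cov}\!\left(\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{cov}(\mathbf{X}_i)
= \frac{1}{n^2}\, n\,\boldsymbol{\Sigma}
= \frac{1}{n}\boldsymbol{\Sigma}.
$$

      For $p = 1$ this reduces to the univariate result $\operatorname{var}(\bar{X}) = \sigma^2/n$.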

      The (population) covariance matrix of a random vector X is defined as

$$
\boldsymbol{\Sigma} = \operatorname{cov}(\mathbf{X}) =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{pmatrix}.
$$

      The $i$th diagonal element of $\boldsymbol{\Sigma}$ is the population variance of $X_i$:

$$
\sigma_{ii} = \sigma_i^2 =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} (x_i - \mu_i)^2 f_i(x_i)\,dx_i & \text{if } X_i \text{ is a continuous random variable} \\[2ex]
\displaystyle\sum_{x_i} (x_i - \mu_i)^2 p_i(x_i) & \text{if } X_i \text{ is a discrete random variable.}
\end{cases}
$$

      The (j,k)th off-diagonal element of Σ is the population covariance between Xj and Xk:

$$
\begin{aligned}
\sigma_{jk} &= E\left[(X_j - \mu_j)(X_k - \mu_k)\right] \\
&=
\begin{cases}
\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_j - \mu_j)(x_k - \mu_k) f_{jk}(x_j, x_k)\,dx_j\,dx_k & \text{if } X_j, X_k \text{ are continuous random variables} \\[2ex]
\displaystyle\sum_{x_j}\sum_{x_k} (x_j - \mu_j)(x_k - \mu_k)\, p_{jk}(x_j, x_k) & \text{if } X_j, X_k \text{ are discrete random variables,}
\end{cases}
\end{aligned}
$$

      where $f_{jk}(x_j, x_k)$ and $p_{jk}(x_j, x_k)$ are the joint density function and joint probability mass function, respectively, of $X_j$ and $X_k$. The population covariance measures the linear association between the two random variables. It is clear that $\sigma_{jk} = \sigma_{kj}$, so the covariance matrix $\boldsymbol{\Sigma}$ is symmetric. Like the sample covariance matrix, the population covariance matrix $\boldsymbol{\Sigma}$ is always positive semidefinite.
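      To make the discrete case concrete, the sketch below builds a 2 × 2 population covariance matrix from a joint probability mass function and checks symmetry and positive semidefiniteness. The joint pmf and the helper names `mean` and `cov` are hypothetical, introduced only for this example.

```python
from fractions import Fraction

# Hypothetical joint pmf of (Xj, Xk); the probabilities are made up.
joint_pmf = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}


def mean(pmf, a):
    """Population mean of coordinate a under the joint pmf."""
    return sum(x[a] * p for x, p in pmf.items())


def cov(pmf, a, b):
    """Population covariance of coordinates a and b (variance when a == b)."""
    mu_a, mu_b = mean(pmf, a), mean(pmf, b)
    return sum((x[a] - mu_a) * (x[b] - mu_b) * p for x, p in pmf.items())


# Covariance matrix: variances on the diagonal, covariances off it.
sigma = [[cov(joint_pmf, a, b) for b in range(2)] for a in range(2)]
print(sigma[0][0], sigma[0][1])  # 1/4 3/20

# Symmetry: sigma_jk equals sigma_kj.
is_symmetric = sigma[0][1] == sigma[1][0]

# For a symmetric 2 x 2 matrix, positive semidefiniteness is equivalent to
# nonnegative diagonal entries and a nonnegative determinant.
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
is_psd = sigma[0][0] >= 0 and sigma[1][1] >= 0 and det >= 0
print(is_symmetric, is_psd)  # True True
```

      The determinant test used here is specific to the 2 × 2 case; for a general $p \times p$ matrix one would instead check that all eigenvalues are nonnegative.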
