Industrial Data Analytics for Diagnosis and Prognosis. Yong Chen
where x̄ is the sample mean of the data, which is the MLE of μ. It is easy to see the similarity between the results for the univariate data in (3.28) and (3.29) and the results for the multivariate data in (3.30) and (3.31). The MAP of μ is exactly μn. Similar to the univariate case, when n is large, or when the prior distribution is flat, the MAP is close to the MLE.
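To make the comparison between the MAP and the MLE concrete, the posterior parameters can be written out in the form that the R code of Example 3.3 computes. This is a sketch under the assumption that (3.30) and (3.31) match that code; the weight matrix W_n is a symbol introduced here for illustration only.

```latex
\begin{align*}
\mu_n    &= \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}
            \left(\Sigma_0^{-1}\mu_0 + n\Sigma^{-1}\bar{x}\right), \\
\Sigma_n &= \left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}, \\
\mu_n    &= W_n\,\bar{x} + (I - W_n)\,\mu_0,
\qquad W_n = n\left(\Sigma_0^{-1} + n\Sigma^{-1}\right)^{-1}\Sigma^{-1}.
\end{align*}
```

In this form the posterior mean is a matrix-weighted average of the sample mean and the prior mean. As n grows, or as the prior precision Σ0⁻¹ shrinks toward zero (a flat prior), W_n approaches the identity matrix, so μn approaches x̄.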
One advantage of Bayesian inference is that prior knowledge can be included naturally. Suppose, for example, that a randomly sampled product turns out to be defective. An MLE of the defective rate based on this single observation would equal 1, implying that all products are defective. By contrast, a Bayesian approach with a reasonable prior should give a much less extreme conclusion. In addition, Bayesian inference can be performed sequentially in a very natural way. To see this, we can write the posterior distribution of μ with the contribution from the last data point xn separated out as
$$p(\mu \mid x_1, \ldots, x_n) \propto \left[\, p(\mu) \prod_{i=1}^{n-1} p(x_i \mid \mu) \right] p(x_n \mid \mu). \tag{3.32}$$
Equation (3.32) can be viewed as the posterior distribution given a single observation xn, with the term in the square brackets treated as the prior. Note that the term in the square brackets is just the posterior distribution (up to a normalization constant) after observing n − 1 data points. Equation (3.32) says that we can treat the posterior based on the first n − 1 observations as the prior and update it with the next observation using Bayes' theorem. This process can be repeated for each new observation. The sequential update of the posterior under the Bayesian framework is very useful when observations are collected over time.
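Both points above, the moderating effect of a prior and the sequential update, can be sketched in R for the defective-product example. The sketch assumes a Beta(1, 9) prior on the defective rate (a hypothetical choice with prior mean 0.1, not taken from the source), a made-up sequence of inspection results, and an illustrative helper named `update_beta`.

```r
# Hypothetical Beta(a, b) prior on the defective rate (assumption for illustration).
prior <- c(a = 1, b = 9)

# Conjugate update for one observation x (1 = defective, 0 = not defective):
# a Beta(a, b) prior becomes a Beta(a + x, b + 1 - x) posterior.
update_beta <- function(prior, x) {
  c(a = prior[["a"]] + x, b = prior[["b"]] + 1 - x)
}

# A single sampled product that turns out to be defective.
x1 <- 1
mle <- mean(x1)                           # MLE of the defective rate
post1 <- update_beta(prior, x1)
post1.mean <- post1[["a"]] / sum(post1)   # posterior mean (a + 1) / (a + b + 1)

# Sequential updating: each posterior serves as the prior for the next observation.
x <- c(1, 0, 0, 1, 0, 0, 0, 0)            # made-up inspection results
post.seq <- Reduce(update_beta, x, prior)

# Batch updating with all observations at once gives the same posterior.
post.batch <- c(a = prior[["a"]] + sum(x),
                b = prior[["b"]] + length(x) - sum(x))

print(c(mle = mle, posterior.mean = post1.mean))
print(rbind(sequential = post.seq, batch = post.batch))
```

With this prior, the single defective observation moves the estimated defective rate from 1/10 to 2/11 rather than to 1, and the sequential and batch posteriors coincide, as (3.32) suggests.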
Example 3.3: For the side_temp_defect data set from a hot rolling process, suppose the true covariance matrix of the side temperatures measured at locations 2, 40, and 78 of Stand 5 is known and given by
$$\Sigma = \begin{pmatrix} 2547.4 & -111.0 & 133.7 \\ -111.0 & 533.1 & 300.7 \\ 133.7 & 300.7 & 562.5 \end{pmatrix}.$$
We use the nominal mean temperatures as given in Example 3.2 as the mean of the prior distribution and a diagonal matrix with variance equal to 100 for each temperature variable as its covariance matrix:
$$\mu_0 = \begin{pmatrix} 1926 \\ 1851 \\ 1872 \end{pmatrix}, \qquad \Sigma_0 = \begin{pmatrix} 100 & 0 & 0 \\ 0 & 100 & 0 \\ 0 & 0 & 100 \end{pmatrix}.$$
Based on (3.30) and (3.31), the following R code calculates the posterior mean and covariance matrix for μ using the first five (n = 5) observations in the data set.
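
    # Known covariance matrix of the side temperatures and its inverse (precision)
    Sigma <- matrix(c(2547.4, -111.0, 133.7,
                      -111.0,  533.1, 300.7,
                       133.7,  300.7, 562.5),
                    nrow = 3, ncol = 3, byrow = TRUE)
    Precision <- solve(Sigma)
    # Prior covariance, prior precision, and prior mean
    Sigma0 <- diag(rep(100, 3))
    Precision0 <- solve(Sigma0)
    mu0 <- c(1926, 1851, 1872)
    # First n observations in columns 2, 40, and 78, and their sample mean
    n <- 5
    X.n <- side.temp.defect[1:n, c(2, 40, 78)]
    x.bar <- apply(X.n, 2, mean)
    # Posterior mean and covariance matrix of mu
    mu.n <- solve(Precision0 + n * Precision) %*%
      (Precision0 %*% mu0 + n * Precision %*% x.bar)
    Sigma.n <- solve(Precision0 + n * Precision)
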
The posterior mean and covariance matrix are obtained as
Compared to the sample mean of the first five observations, which is (1943, 1850, 1838)ᵀ, the posterior mean deviates somewhat from both the sample mean and the prior mean μ0. Now we use the first 100 (n = 100) observations to find the posterior mean by changing n in the R code from 5 to 100. The posterior mean and covariance matrix are
Compared to the sample mean vector of the first 100 observations, which is (1944, 1849, 1865)ᵀ, the posterior mean with n = 100 observations is very close to the sample mean, while the influence of the prior mean is very small. In addition, the posterior variance for the mean temperature at each of the three locations is much smaller for n = 100 than for n = 5.
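The comparison between n = 5 and n = 100 can be explored without the raw data, because the posterior covariance depends only on n and the posterior mean depends on the data only through the sample mean. The sketch below wraps the calculation in a hypothetical helper, `posterior_mu` (a name introduced here, not from the source), and feeds it the rounded sample means quoted in the text, so its output only approximates the results of the original code.

```r
# Known covariance, prior mean, and prior covariance, copied from Example 3.3.
Sigma <- matrix(c(2547.4, -111.0, 133.7,
                  -111.0,  533.1, 300.7,
                   133.7,  300.7, 562.5),
                nrow = 3, ncol = 3, byrow = TRUE)
Sigma0 <- diag(rep(100, 3))
mu0 <- c(1926, 1851, 1872)

# Hypothetical helper: posterior mean and covariance of mu for a normal
# likelihood with known covariance Sigma and a normal prior N(mu0, Sigma0).
posterior_mu <- function(x.bar, n, mu0, Sigma0, Sigma) {
  Precision <- solve(Sigma)
  Precision0 <- solve(Sigma0)
  Sigma.n <- solve(Precision0 + n * Precision)
  mu.n <- Sigma.n %*% (Precision0 %*% mu0 + n * Precision %*% x.bar)
  list(mean = drop(mu.n), cov = Sigma.n)
}

# Rounded sample means quoted in the text for n = 5 and n = 100.
x.bar5 <- c(1943, 1850, 1838)
x.bar100 <- c(1944, 1849, 1865)
post5 <- posterior_mu(x.bar5, 5, mu0, Sigma0, Sigma)
post100 <- posterior_mu(x.bar100, 100, mu0, Sigma0, Sigma)

# Posterior means and posterior variances, side by side.
print(rbind(n5 = post5$mean, n100 = post100$mean))
print(rbind(n5 = diag(post5$cov), n100 = diag(post100$cov)))

# With a nearly flat prior (very large prior variance, chosen for illustration),
# the posterior mean essentially reproduces the sample mean.
flat <- posterior_mu(x.bar5, 5, mu0, diag(rep(1e8, 3)), Sigma)
print(flat$mean)
```

Computing Sigma.n once and reusing it for the mean avoids solving the same linear system twice, which the original code does for clarity.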
Bibliographic Notes
The multivariate normal distribution and its inference are thoroughly discussed in multivariate statistics books, for example, Johnson et al. [2002], Rencher [2003], and Anderson [2003]. In particular, proofs of many theoretical results and properties can be found in Anderson [2003].
Exercises
1. Consider two discrete random variables X and Y with joint probability mass function p(x, y) given in the following table:
x | y | p(x, y) |
---|---|---|
–1 | –1 |
|