      PLS versus PCA

      As described earlier, PCA uses the correlation matrix for all variables of interest, whereas PLS uses the submatrix that links responses and predictors. In a situation where there are both Ys and Xs, Figure 4.1 indicates that PCA uses the orange-colored correlations, whereas PLS uses the green-colored correlations. These green entries are the correlations that link the responses and predictors. PLS attempts to identify factors that simultaneously reduce dimensionality and provide predictive power.
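
      To make this distinction concrete, here is a minimal sketch in Python rather than JSL (the data, dimensions, and variable names are ours, purely for illustration): it forms the correlation matrix of the predictors and the response taken together, which is what PCA of all the variables would operate on, and then extracts the block of cross-correlations between predictors and response, which is the information PLS exploits.

import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 3))                               # three predictors
Y = 0.8 * X[:, [0]] + rng.normal(scale=0.5, size=(n, 1))  # one response

# Correlation matrix of all variables taken together: PCA of the combined
# set of variables operates on this full matrix.
R = np.corrcoef(np.hstack([X, Y]), rowvar=False)

# The block that links the predictors to the response (the "green" entries
# in Figure 4.1): PLS focuses on this cross-correlation structure.
R_xy = R[:3, 3:]
print(R.round(2))
print(R_xy.round(2))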

      To see a geometric representation that contrasts PLS and PCA, run the script PLS_PCA.jsl by clicking on the correct link in the master journal. This script simulates values for two predictors, X1 and X2, and a single response Y. A report generated by this script is shown in Figure 4.11.

      Figure 4.11: Plots Contrasting PCA and PLS

      The Contour Plot for Y on the left shows how the true value of Y changes with X1 and X2. The continuous color intensity scale shows large values of Y in red and small values in blue, as indicated by the legend to the right of the plot. The contour plot indicates that the response surface is a plane tilted so that it slopes upward in the upper left of the X1, X2 plane and downward in the lower right of the X1, X2 plane. Specifically, the relationship is given by Y = –X1 + 0.75X2.

      The next two plots, Principal Components and PLS Weights, are obtained using simulated values for X1 and X2. But Y is computed directly using the relationship shown in the Contour Plot.

      The Principal Components plot shows the two principal components. The direction of the first component, PC1, captures as much variation as possible in the values of X1 and X2 regardless of the value of Y. In fact, PC1 is essentially perpendicular to the direction of increase in Y, as shown in the contour plot. PC1 ignores any variation in Y. The second component, PC2, captures residual variation, again ignoring variation in Y.

      The PLS Weights plot shows the directions of the two PLS factors, or latent variables. Note that PLS1 is rotated relative to PC1. PLS1 attempts to explain variation in X1 and X2 while also explaining some of the variation in Y. You can see that, while PC1 is oriented in a direction that gives no information about Y, PLS1 is rotated slightly toward the direction of increase (or decrease) for Y.

      This simulation illustrates the fact that PLS tries to balance the requirements of dimensionality reduction in the X space with the need to explain variation in the response. You can close the report produced by the script PLS_PCA.jsl now.
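
      The same geometry can also be sketched numerically outside of JMP. The short Python fragment below is an illustration in the spirit of PLS_PCA.jsl, not a reproduction of it; the simulated correlation between X1 and X2 is our own choice. It generates correlated values of X1 and X2, computes Y = -X1 + 0.75X2 exactly, and then compares the direction of the first principal component, which depends only on the variation in X1 and X2, with the first PLS weight vector, which (for a single response) is proportional to X'y and is therefore pulled toward the direction along which Y changes.

import numpy as np

rng = np.random.default_rng(7)
n = 200

# Simulate two correlated predictors and compute Y from the same plane
# shown in the contour plot: Y = -X1 + 0.75*X2.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.45 * rng.normal(size=n)
y = -x1 + 0.75 * x2

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)          # center the predictors
yc = y - y.mean()

# First principal component: the dominant eigenvector of the predictor
# covariance matrix. It is chosen without any reference to Y.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, np.argmax(eigvals)]

# First PLS weight vector for a single response: proportional to X'y, so it
# is rotated away from PC1, toward the direction along which Y changes.
w1 = Xc.T @ yc
w1 = w1 / np.linalg.norm(w1)

print("PC1 direction: ", pc1.round(3))
print("PLS1 direction:", w1.round(3))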

      Extracting Factors

      Before considering some examples that illustrate more of the basic PLS concepts, let’s introduce some of the technical background that underpins PLS. As you know by now, a main goal of PLS is to predict one or more responses from a collection of predictors. This is done by extracting linear combinations of the predictors that are variously referred to as latent variables, components, or factors. We use the term factor exclusively from now on to be consistent with JMP usage.

      We assume that all variables are at least centered. Also keep in mind that there are various versions of PLS algorithms. We mentioned earlier that JMP provides two approaches: NIPALS and SIMPLS. The following discussion describes PLS in general terms, but to be completely precise, one needs to refer to the specific algorithm in use.

      With this caveat, let’s consider the calculations associated with the first PLS factor. Suppose that X is an n x m matrix whose columns are the m predictors and that Y is an n x k matrix whose columns are the k responses. The first PLS factor is defined by an m x 1 weight vector, w1, whose elements reflect the covariances between the predictors in X and the responses in Y. The jth entry of w1 is the weight associated with the jth predictor. Subject to norm restrictions, w1 defines the linear combination of the variables in X whose covariance with a linear combination of the variables in Y is as large as possible.
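
      For the special case of a single response (k = 1), this weight vector is easy to write down explicitly: w1 is proportional to X'y, whose entries are (up to a constant) the covariances between each centered predictor and the centered response, and rescaling to unit length gives the weight vector. The following numpy fragment is a sketch of just that single-response case.

import numpy as np

def w1_single_response(X, y):
    """First PLS weight vector when there is a single response.

    X is the n x m matrix of centered predictors and y is the n x 1 centered
    response. Each entry of X'y is proportional to the covariance between one
    predictor and the response; normalizing to unit length gives w1.
    """
    w = X.T @ y                   # m x 1 vector proportional to the covariances
    return w / np.linalg.norm(w)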

      The weight vector w1 is applied to the rows of X: for each observation, the m predictor values are weighted by the entries of w1 and summed. These n weighted sums are called X scores and are collected in the vector t1. In other words, the X scores are the entries of the vector t1 = Xw1. Note that the score vector, t1, is n x 1; each observation is given an X score on the first factor. Think of the vector w1 as defining a linear transformation that maps the m predictors to a one-dimensional subspace. With this interpretation, Xw1 represents the mapping of the data to this one-dimensional subspace.

      Technically, t1 is a linear combination of the variables in X that has maximum covariance with a linear combination of the variables in Y, subject to normalizing constraints. That is, there is a vector c1 with the property that the covariance between t1 = Xw1 and u1 = Yc1 is a maximum. The vector c1 is a Y weight vector, also called a loading vector. The elements of the vector u1 are the Y scores. So, for the first factor, we would expect the X scores and the Y scores to be strongly correlated.
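
      With more than one response, w1 and c1 can be obtained together (up to sign) from the singular value decomposition of the m x k cross-product matrix X'Y, since its leading singular vectors maximize the covariance criterion just described. The sketch below, again a Python illustration rather than JMP's implementation, assumes centered and scaled X and Y; it computes w1, c1, t1 = Xw1, and u1 = Yc1 on simulated data (our own, purely for illustration) and confirms that the first-factor X scores and Y scores are strongly correlated.

import numpy as np

def first_factor(X, Y):
    """Weights and scores for the first PLS factor (X, Y centered and scaled)."""
    U_svd, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    w1 = U_svd[:, [0]]      # X weight vector, m x 1
    c1 = Vt.T[:, [0]]       # Y weight (loading) vector, k x 1
    t1 = X @ w1             # X scores, n x 1: one score per observation
    u1 = Y @ c1             # Y scores, n x 1
    return w1, c1, t1, u1

# Illustrative check on simulated data: t1 and u1 should be highly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = X[:, :2] @ np.array([[1.0, 0.4], [-0.5, 1.0]]) + 0.3 * rng.normal(size=(40, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
w1, c1, t1, u1 = first_factor(X, Y)
print(np.corrcoef(t1.ravel(), u1.ravel())[0, 1])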

      To obtain subsequent factors, we use all factors available to that point to predict both X and Y. In the NIPALS algorithm, the process of obtaining a new weight vector and defining new X scores is applied to the residuals from the predictive models for X and Y. (We say that X and Y are deflated and the process itself is called deflation.) This ensures that subsequent factors are independent of (orthogonal to) all previously extracted factors. In the SIMPLS algorithm, the deflation process is applied to the cross-product matrix. (For complete information, see Appendix 1.)
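
      A common way to write the NIPALS deflation step (the sketch here is a textbook formulation, not JMP's code; see Appendix 1 for the algorithms JMP uses) is to regress the columns of X and of Y on the score vector t1, giving loading vectors p1 and q1, and then subtract the fitted parts before the next factor is extracted.

import numpy as np

def deflate(X, Y, t):
    """One NIPALS-style deflation step (a common textbook formulation).

    The columns of X and Y are each regressed on the score vector t (n x 1),
    and the explained portions are subtracted, so that the factor extracted
    from the residual matrices is orthogonal to t.
    """
    tt = (t.T @ t).item()
    p = X.T @ t / tt          # X loading vector, m x 1
    q = Y.T @ t / tt          # Y loading vector, k x 1
    return X - t @ p.T, Y - t @ q.T, p, q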

      Models in Terms of X Scores

      Suppose that a factors are extracted. Then there are:

      • a weight vectors, w1, w2, ..., wa

      • a X-score vectors, t1, t2, ..., ta

      • a Y-score vectors, u1, u2, ..., ua

      We can now define three matrices: W is the m x a matrix whose columns consist of the weight vectors; T and U are the n x a matrices whose columns consist of the X-score and Y-score vectors, respectively. In NIPALS, the Y scores, ui, are regressed on the X scores, ti, in an inner relation regression fit.
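
      Pulling these pieces together, the following sketch (again an illustration of the general structure described here, not of JMP's implementation) extracts a factors by repeating the weight, score, and deflation steps, collects the results into the matrices W, T, and U, and computes the inner relation slope for each factor by regressing ui on ti.

import numpy as np

def pls_factors(X, Y, a):
    """Extract a PLS factors, NIPALS-style, from centered and scaled X and Y.

    Returns W (m x a), T (n x a), U (n x a), and the inner relation slopes b,
    where b[i] comes from regressing the Y scores u_i on the X scores t_i.
    This mirrors the structure described in the text, not JMP's implementation.
    """
    X, Y = X.copy(), Y.copy()
    W, T, U, b = [], [], [], []
    for _ in range(a):
        U_svd, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
        w = U_svd[:, [0]]                         # X weight vector
        c = Vt.T[:, [0]]                          # Y weight vector
        t = X @ w                                 # X scores
        u = Y @ c                                 # Y scores
        b.append(((t.T @ u) / (t.T @ t)).item())  # inner relation: u_i ~ b_i t_i
        # Deflate X and Y so the next factor is orthogonal to t.
        tt = (t.T @ t).item()
        p = X.T @ t / tt
        q = Y.T @ t / tt
        X, Y = X - t @ p.T, Y - t @ q.T
        W.append(w); T.append(t); U.append(u)
    return np.hstack(W), np.hstack(T), np.hstack(U), np.array(b)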

      Recall that the matrix Y contains k responses, so that Y is n x k. Let’s also assume that X and Y are both centered and scaled. For both NIPALS and SIMPLS, predictive
