and X can be given in terms of a regression on the scores, T. Although we won’t go into the details at this point, we introduce notation for these predictive models:

      (4.1) $\hat{X} = TP'$

      $\hat{Y} = TQ'$

      where P is m × a and Q is k × a. The matrix P is called the X loading matrix, and its columns are the scaled X loadings. The matrix Q is sometimes called the Y loading matrix. In NIPALS, its columns are proportional to the Y loading vectors. In SIMPLS, when Y contains more than one response, its representation in terms of loading vectors is more complex. Each matrix projects the observations onto the space defined by the factors. (See Appendix 1.) Each column is associated with a specific factor. For example, the ith column of P is associated with the ith extracted factor. The jth element of the ith column of P reflects the strength of the relationship between the jth predictor and the ith extracted factor. The columns of Q are interpreted similarly.

      To facilitate the task of determining how much a predictor or response variable contributes to a factor, the loadings are usually scaled so that each loading vector has length one. This makes it easy to compare loadings across factors and across the variables in X and Y.
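      To make this notation concrete, the following is a minimal sketch in Python, using scikit-learn's NIPALS-based PLSRegression as a convenient stand-in for JMP. The simulated data, the variable names, and the choice of two factors are assumptions made purely for illustration, and scikit-learn's scaling conventions for scores and loadings differ from JMP's unit-length loadings, so individual entries will not match JMP output; the form of the relationships in Equation 4.1, however, is the same.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Simulated data with two underlying dimensions (purely illustrative,
# not the book's data): 50 rows, 4 predictors, 2 responses.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
X = Z @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(50, 4))
Y = Z @ rng.normal(size=(2, 2)) + 0.05 * rng.normal(size=(50, 2))

pls = PLSRegression(n_components=2)   # scikit-learn's NIPALS-based PLS
pls.fit(X, Y)

T = pls.x_scores_     # score matrix T, one column per extracted factor
P = pls.x_loadings_   # X loading matrix P (m x a)
Q = pls.y_loadings_   # Y loading matrix Q (k x a)

# Equation 4.1: the centered and scaled X and Y are approximated by TP' and TQ'
X_cs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y_cs = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
print("Relative error of TP' as an approximation of X:",
      np.linalg.norm(X_cs - T @ P.T) / np.linalg.norm(X_cs))
print("Relative error of TQ' as an approximation of Y:",
      np.linalg.norm(Y_cs - T @ Q.T) / np.linalg.norm(Y_cs))
```

      Because the simulated data have only two underlying dimensions, two factors approximate both X and Y closely; with fewer factors than the rank of X, the approximation in Equation 4.1 is not exact.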

      Model in Terms of Xs

      Let’s continue to assume that the variables in the matrices X and Y are centered and scaled. We can consider the Ys to be related directly to the Xs in terms of a theoretical model as follows:

      $Y = X\beta + \varepsilon_Y$.

      Here, β is an m × k matrix of regression coefficients. The estimate of the matrix β that is derived using PLS depends on the fitting algorithm. The details of the derivation are given in Appendix 1.

      The NIPALS algorithm requires the use of a diagonal matrix, $\Delta_b$, whose diagonal entries are defined by the inner relation mentioned earlier, where the Y scores, $u_i$, are regressed on the X scores, $t_i$. The estimate of β also involves a matrix, C, that contains the Y weights, also called the Y loadings. The column vectors of C define linear combinations of the deflated Y variables that have maximum covariance with linear combinations of the deflated X variables.

      Using these matrices, in NIPALS, β is estimated by

      (4.2) $B = W(P'W)^{-1}\Delta_b C'$

      and Y is estimated in terms of X by

      $\hat{Y} = XB = XW(P'W)^{-1}\Delta_b C'$.
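      To see where the pieces of Equation 4.2 come from, here is a bare-bones NIPALS sketch in Python. It is an implementation written for this discussion, not JMP's code, and the data are simulated purely for illustration; normalization and deflation conventions vary slightly across published descriptions of NIPALS. The sketch collects the X weights W, the X loadings P, the Y weights C, and the inner-relation coefficients that form the diagonal of $\Delta_b$, then assembles B as in Equation 4.2 and confirms that XB reproduces the fitted values $T\Delta_b C'$.

```python
import numpy as np

def nipals_pls(X, Y, n_factors, tol=1e-12, max_iter=500):
    """Bare-bones NIPALS PLS (an illustrative sketch, not JMP's implementation).

    X and Y are assumed to be centered and scaled already. Returns the matrices
    appearing in Equation 4.2: W (X weights), P (X loadings), C (Y weights),
    the scores T, and the diagonal inner-relation matrix Db.
    """
    Xr, Yr = X.copy(), Y.copy()
    W, P, C, T, b = [], [], [], [], []
    for _ in range(n_factors):
        u = Yr[:, [0]]                        # start the Y score at the first response
        for _ in range(max_iter):
            w = Xr.T @ u / (u.T @ u)          # X weights
            w /= np.linalg.norm(w)
            t = Xr @ w                        # X scores
            c = Yr.T @ t / (t.T @ t)          # Y weights
            c /= np.linalg.norm(c)
            u_new = Yr @ c                    # Y scores
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = Xr.T @ t / (t.T @ t)              # X loadings
        b_i = (u.T @ t / (t.T @ t)).item()    # inner relation: regress u on t
        Xr = Xr - t @ p.T                     # deflate X
        Yr = Yr - b_i * t @ c.T               # deflate Y
        W.append(w); P.append(p); C.append(c); T.append(t); b.append(b_i)
    W, P, C, T = (np.hstack(mats) for mats in (W, P, C, T))
    return W, P, C, T, np.diag(b)

# Illustrative data (names and sizes are assumptions, not the book's example)
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
Y = X[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(40, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

W, P, C, T, Db = nipals_pls(X, Y, n_factors=2)

B = W @ np.linalg.inv(P.T @ W) @ Db @ C.T       # Equation 4.2
print("Max difference between XB and T Db C':",
      np.abs(X @ B - T @ Db @ C.T).max())
```

      The agreement between XB and $T\Delta_b C'$ rests on the identity $T = XW(P'W)^{-1}$, which lets the scores be computed directly from the original, undeflated X.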

      The SIMPLS algorithm also requires a matrix of Y weights, also called Y loadings, that is computed in a different fashion than in NIPALS. Nevertheless, we call this matrix C. Then, for SIMPLS, β is estimated by

      (4.3) $B = WC'$

      and Y is estimated in terms of X by

      $\hat{Y} = XB = XWC'$.
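      A similarly stripped-down sketch of SIMPLS highlights the contrast with NIPALS: rather than deflating X and Y, it deflates the cross-product matrix X'Y, and the weight matrix W applies directly to X, so that T = XW and Equation 4.3 needs no $(P'W)^{-1}$ term. This too is an illustrative implementation on simulated data, not JMP's code, and details such as scaling the scores to unit length vary across published descriptions of the algorithm.

```python
import numpy as np

def simpls(X, Y, n_factors):
    """Bare-bones SIMPLS (an illustrative sketch, not JMP's implementation).

    X and Y are assumed to be centered and scaled already. Returns the X weight
    matrix W (applied directly to X, so T = XW), the Y weight matrix C used in
    Equation 4.3, and the scores T.
    """
    S = X.T @ Y                               # cross-product matrix
    W, C, T, V = [], [], [], []
    for _ in range(n_factors):
        # The dominant right singular vector of S maximizes the covariance criterion
        _, _, Vt = np.linalg.svd(S, full_matrices=False)
        q = Vt[0:1].T
        w = S @ q                             # X weights
        t = X @ w                             # X scores
        t_norm = np.linalg.norm(t)
        t, w = t / t_norm, w / t_norm         # scale the scores to unit length
        p = X.T @ t                           # X loadings
        c = Y.T @ t                           # Y weights (loadings)
        v = p.copy()
        for v_prev in V:                      # orthogonalize p against earlier loadings
            v = v - v_prev * (v_prev.T @ p)
        v = v / np.linalg.norm(v)
        S = S - v @ (v.T @ S)                 # deflate the cross-product matrix
        W.append(w); C.append(c); T.append(t); V.append(v)
    W, C, T = (np.hstack(mats) for mats in (W, C, T))
    return W, C, T

# Same kind of illustrative data as before (an assumption, not the book's example)
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
Y = X[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(40, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

W, C, T = simpls(X, Y, n_factors=2)
B = W @ C.T                                   # Equation 4.3
print("Scores are orthonormal:", np.allclose(T.T @ T, np.eye(2)))
print("XB equals TC':", np.allclose(X @ B, T @ C.T))
```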

      Properties

      Perhaps the most important property, shared by both NIPALS and SIMPLS, is that, subject to norm restrictions, both methods maximize the covariance between the X structure and the Y structure for each factor. The precise sense in which this property holds is one of the features that distinguishes NIPALS and SIMPLS. In NIPALS, the covariance is maximized for components defined on the residual matrices. In contrast, the maximization in SIMPLS applies directly to the centered and scaled X and Y matrices.

      The scores, which form the basis for PLS modeling, are constructed from the weights. The weights are the vectors that define linear combinations of the Xs that maximize covariance with the Ys. Maximizing the covariance is directly related to maximizing the correlation. One can show that maximizing the covariance is equivalent to maximizing the product of the squared correlation between the X and Y structures, and the variance of the X structure. (See the section “Bias toward X Directions with High Variance” in Appendix 1, or Hastie et al. 2001.)

      Recalling that correlation is a scale-invariant measure of linear relationship, this insight shows that the PLS model is pulled toward directions in X space that have high variability. In other words, the PLS model is biased away from directions in the X space with low variability. (This is illustrated in the section “PLS versus PCA”.) As the number of latent factors increases, the PLS model approaches the standard least squares model.
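      The last point, that the PLS model approaches the standard least squares model as factors are added, is easy to check numerically. The sketch below uses scikit-learn as a stand-in for JMP on simulated, strongly correlated predictors; the data and dimensions are assumptions made for illustration. With the full set of factors, the PLS fitted values agree with the multiple linear regression fitted values to within rounding error.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression

# Illustrative data: six correlated predictors and one response (an assumption)
rng = np.random.default_rng(3)
Z = rng.normal(size=(60, 3))
X = np.hstack([Z, Z + 0.05 * rng.normal(size=(60, 3))])   # strongly correlated Xs
y = Z @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=60)

ols_fit = LinearRegression().fit(X, y).predict(X)
for a in range(1, X.shape[1] + 1):
    pls_fit = PLSRegression(n_components=a).fit(X, y).predict(X).ravel()
    print(f"{a} factor(s): max |PLS fit - OLS fit| = {np.abs(pls_fit - ols_fit).max():.2e}")
```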

      The vector of X scores, $t_i$, represents the location of the rows of X projected onto the ith factor, $w_i$. The entries of the X loading vector at the ith iteration are proportional to the correlations of the centered and scaled predictors with $t_i$. So, the term loading refers to how the predictors relate to a given factor in terms of degree of correlation. Similarly, the entries of the Y loading vector at the ith iteration are proportional to the correlations of the centered and scaled responses with $t_i$. JMP scales all loading vectors to have length one. Note that Y loadings are not of interest unless there are multiple responses. (See “Properties of the NIPALS Algorithm” in Appendix 1.)
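      This interpretation of loadings as scaled correlations can be checked directly. The sketch below (scikit-learn again, on illustrative simulated data rather than the book's data table) verifies that each column of the X loading matrix is proportional to the correlations between the centered and scaled predictors and the corresponding score vector. Note that scikit-learn does not rescale its loadings to length one the way JMP does; that only changes the constant of proportionality.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative simulated data (an assumption, not the book's data table)
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 5))
Y = X[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(40, 2))

pls = PLSRegression(n_components=2).fit(X, Y)
X_cs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

for i in range(2):
    t_i = pls.x_scores_[:, i]
    corrs = np.array([np.corrcoef(X_cs[:, j], t_i)[0, 1] for j in range(X.shape[1])])
    ratios = pls.x_loadings_[:, i] / corrs
    # Proportionality: the loading-to-correlation ratio is the same for every predictor
    print(f"Factor {i + 1}: loading/correlation ratios =", np.round(ratios, 6))
```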

      It is also worth pointing out that the factors that define the linear surface onto which the X values are projected are orthogonal to each other. This has these advantages:

      • Because they relate to independent directions, the scores are easy to interpret.

      • If we were to fit two models, say, one with only one extracted factor and one with two, the single factor in the first model would be identical to the first factor in the second model. That is, as we add more factors to a PLS model, we do not disturb the ones we already have. This is a useful feature that is not shared by all projection-based methods; independent component analysis (Hastie et al. 2001) is an example of a technique that does not have this feature. (Both of these points are verified numerically in the sketch that follows.)
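      The sketch below uses scikit-learn's NIPALS implementation on illustrative simulated data (not the book's data table) to check that score vectors from different factors are orthogonal, and that the first factor of a two-factor fit matches the single factor of a one-factor fit.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative simulated data (an assumption, not PLSScoresAndLoadings.jmp)
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
Y = X @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(30, 2))

one_factor = PLSRegression(n_components=1).fit(X, Y)
two_factor = PLSRegression(n_components=2).fit(X, Y)

T = two_factor.x_scores_
# Scores from different factors are orthogonal (inner product is numerically zero)
print("t1' t2 =", float(T[:, 0] @ T[:, 1]))

# Adding a second factor leaves the first factor unchanged
print("First factors agree:",
      np.allclose(one_factor.x_scores_[:, 0], T[:, 0]))
```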

      We detail properties associated with both fitting algorithms in Appendix 1.

      Example

      Now, to gain a deeper understanding of two of the basic elements in PLS, the scores and loadings, open the data table PLSScoresAndLoadings.jmp by clicking on the correct link in the master journal. This table contains two predictors, x1 and x2, and two responses, y1 and y2, as well as other columns that, as we shall see, have been saved as the result of a PLS analysis. The table also contains six scripts, which we run in order.

      Run the first script, Scatterplot Matrix, to explore the relationships among the two predictors, x1 and x2, and the two responses, y1 and y2 (Figure 4.12). The scatterplot in the upper left shows that the predictors are strongly correlated, whereas the scatterplot in the lower right shows that the responses are not very highly correlated. (See the yellow cells in Figure 4.12.)

      Figure 4.12: Scatterplots for All Four Variables

      The ranges of values suggest that the variables
