$$\begin{pmatrix}
\vdots \\
\cdots + 93\beta_2 + 2.32\beta_3 + \varepsilon_2 \\
\beta_0 + 8\beta_1 + 175\beta_2 + 3.44\beta_3 + \beta_4 + \varepsilon_3 \\
\beta_0 + 6\beta_1 + 105\beta_2 + 3.46\beta_3 + \beta_4 + \varepsilon_4 \\
\beta_0 + 8\beta_1 + 245\beta_2 + 3.57\beta_3 + \beta_4 + \varepsilon_5 \\
\beta_0 + 4\beta_1 + 62\beta_2 + 3.19\beta_3 + \beta_4 + \varepsilon_6
\end{pmatrix}$$

      Equation (2.2) indicates that each response is to be modeled as a linear function of the unknown βs.

      We can represent Equation (2.1) more generically as:

      (2.3)
      $$\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix}=
      \begin{bmatrix}
      X_{10} & X_{11} & X_{12} & X_{13} & X_{14} \\
      X_{20} & X_{21} & X_{22} & X_{23} & X_{24} \\
      X_{30} & X_{31} & X_{32} & X_{33} & X_{34} \\
      X_{40} & X_{41} & X_{42} & X_{43} & X_{44} \\
      X_{50} & X_{51} & X_{52} & X_{53} & X_{54} \\
      X_{60} & X_{61} & X_{62} & X_{63} & X_{64}
      \end{bmatrix}*
      \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}+
      \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}$$

      Now we can write Equation (2.3) succinctly as:

      (2.4) Y = Xβ + ε,

      where

      $$Y=\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix},$$

      $$X=\begin{bmatrix}
      X_{10} & X_{11} & X_{12} & X_{13} & X_{14} \\
      X_{20} & X_{21} & X_{22} & X_{23} & X_{24} \\
      X_{30} & X_{31} & X_{32} & X_{33} & X_{34} \\
      X_{40} & X_{41} & X_{42} & X_{43} & X_{44} \\
      X_{50} & X_{51} & X_{52} & X_{53} & X_{54} \\
      X_{60} & X_{61} & X_{62} & X_{63} & X_{64}
      \end{bmatrix},$$

      $$\beta=\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix},$$

      and

      $$\varepsilon=\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}.$$

      For a column vector like Y, we need only one index to designate the row in which an element occurs. For the 6 by 5 matrix X, we require two indices. The first designates the row and the second designates the column. Note that we have not specified the matrix multiplication operator in Equation (2.4); it is implied by the juxtaposition of any two matrices.
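      To make this bookkeeping concrete, here is a minimal sketch in Python with NumPy (outside JMP, using illustrative values loosely patterned after the entries visible in Equation (2.2), not the actual data set) that builds a design matrix of this form, with a leading column of ones, and evaluates Xβ + ε by matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up observations on four predictors (6 rows, 4 columns).
predictors = np.array([
    [4,  93, 2.32, 0],
    [8, 175, 3.44, 1],
    [6, 105, 3.46, 1],
    [8, 245, 3.57, 1],
    [4,  62, 3.19, 1],
    [6, 150, 3.30, 0],
])

# Design matrix X: a column of ones (the X_i0 entries) followed by the predictors.
X = np.column_stack([np.ones(len(predictors)), predictors])   # 6 x 5

beta = np.array([40.0, -0.5, -0.02, -3.0, 1.0])               # illustrative coefficients
eps = rng.normal(scale=1.0, size=6)                           # the noise vector

# Juxtaposition in Y = X beta + eps corresponds to matrix multiplication.
Y = X @ beta + eps
print(X.shape, beta.shape, Y.shape)   # (6, 5) (5,) (6,)
```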

      Equation (2.4) enables us to note the following:

      1. The entries in X consist of the column of ones followed by the observed data on each of the four predictors.

      2. Even though the entries in X are observational data, rather than the result of a designed experiment, the matrix X is still called the design matrix.

      3. The vector ε, which contains the errors, εi, is often referred to as the noise.

      4. Once we have estimated the column vector β, we are able to obtain predicted values of MPG. By comparing these predicted values to the actual values, we obtain estimates of the errors, εi. These differences, namely the actual values minus the predicted values, are called residuals (see the sketch following this list).

      5. If the model provides a good fit, we expect the residuals to be small, in some sense. We also expect them to show a somewhat random pattern, indicating that our model adequately captures the structural relationship between X and Y. If the residuals show a structured pattern, one remedy might be to specify a more complex model by adding additional columns to X; for example, columns that define interaction terms and/or power terms (Draper and Smith 1998).

      6. The values in β are the coefficients or parameters that correspond to each column or term in the design matrix X (including the first, constant term). In terms of this linear model, their interpretation is straightforward. For example, β3 is the expected change in MPG for a unit change in Weight, with the other predictors held fixed.

      7. Note that the dimensions of the matrices (number of rows and columns) have to conform in Equation (2.4). In our example, Y is a 6 by 1 matrix, X is a 6 by 5 matrix, β is a 5 by 1 matrix, and ε is a 6 by 1 matrix.
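      As a small illustration of points 4 and 5 (again a NumPy sketch with made-up numbers, not JMP output), suppose an estimate β̂ of the coefficient vector has already been obtained, however that was done; the predicted values are then Xβ̂, and the residuals are the actual responses minus these predictions:

```python
import numpy as np

# Illustrative design matrix (column of ones plus two predictors) and responses.
X = np.array([
    [1.0, 4.0,  93.0],
    [1.0, 8.0, 175.0],
    [1.0, 6.0, 105.0],
    [1.0, 8.0, 245.0],
])
Y = np.array([30.1, 19.8, 26.3, 16.0])          # observed responses (made up)

beta_hat = np.array([35.0, -0.8, -0.04])        # an assumed estimate of beta

Y_pred = X @ beta_hat                           # predicted values
residuals = Y - Y_pred                          # actual minus predicted

# Small, patternless residuals suggest an adequate structural fit.
print(Y_pred)
print(residuals)
```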

      So how do we calculate β from the data we have collected? There are numerous approaches, partly depending on the assumptions you are prepared or required to make about the noise component, ε. It is generally assumed that the X variables are measured without noise, so that the noise is associated only with the measurement of the response, Y.

      It is also generally assumed that the errors, εi, are independently and identically distributed according to a normal distribution (Draper and Smith 1998). Once a model is fit to the data, your next step should be to check whether the pattern of residuals is consistent with this assumption.
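      One common way to examine this assumption outside JMP is a normal quantile plot of the residuals; within JMP, the residual diagnostics in the Fit Model report serve the same purpose. A brief sketch using SciPy and Matplotlib (with made-up residuals) might look like this:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Made-up residuals from a fitted model (illustrative values only).
residuals = np.array([0.8, -1.2, 0.3, 1.5, -0.6, -0.8])

# Normal quantile (Q-Q) plot: points close to a straight line are consistent
# with independently, identically, normally distributed errors.
fig, ax = plt.subplots()
stats.probplot(residuals, dist="norm", plot=ax)
plt.show()
```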

      For a full introduction to MLR in JMP using the Fit Model platform, select Help > Books > Fitting Linear Models. When the PDF opens, go to the chapter entitled “Standard Least Squares Report and Options.”

      More generally, returning to the point about matrix dimensions, the dimensions of the components of a regression model of the form

      (2.5) Y = Xβ + ε

      can be represented as follows:

      • Y is an n x 1 response matrix.

      • X is an n x m design matrix.

      • β is an m x 1 coefficient vector.

      • ε is an n x 1 error vector.

      Here n is the number of observations and m is the number of columns in X. For now, we assume that there is only one column in Y, but later on, we consider situations where Y has multiple columns.

      Let’s pause for a quick linear algebra review. If A is any r x s matrix with elements aij, then the matrix A', with elements aji, is called the transpose of A. Note that the rows of A are the columns of A'. We denote the inverse of a square q x q matrix B by B⁻¹. By definition, BB⁻¹ = B⁻¹B = I, where I is the q x q identity matrix (with 1’s down the leading diagonal and 0’s elsewhere). If the columns of an arbitrary matrix, say A, are linearly independent, then it can be shown that the inverse of the matrix A'A exists.
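      These definitions are easy to verify numerically. The following NumPy sketch (with arbitrary values) forms a transpose, checks the defining property of the inverse against the identity matrix, and confirms that A'A is invertible when the columns of A are linearly independent:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 7.0]])        # a 3 x 2 matrix with linearly independent columns

At = A.T                          # transpose: rows of A become columns of A'
B = At @ A                        # A'A is square (2 x 2)

B_inv = np.linalg.inv(B)          # exists because the columns of A are independent
I = np.eye(2)

# B B^(-1) = B^(-1) B = I (up to floating-point rounding)
print(np.allclose(B @ B_inv, I), np.allclose(B_inv @ B, I))
```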

      In MLR, when n ≥ m and when the columns of X are linearly independent so that the matrix (X'X)⁻¹ exists, the coefficients in β can be estimated in a unique fashion as:

      $$\hat{\beta}=(X'X)^{-1}X'Y$$

      The hat above β in the notation β̂ indicates that this vector contains numerical estimates of the unknown coefficients in β. If there are fewer observations than columns in X, that is, n < m, then there are infinitely many solutions for β in Equation (2.5).
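      The formula is easy to exercise numerically. In the following NumPy sketch (simulated data, not the MPG example), β̂ is computed directly as (X'X)⁻¹X'Y and cross-checked against NumPy’s least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 30, 4                                   # n >= m: more observations than columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, m - 1))])
beta_true = np.array([2.0, 0.5, -1.0, 3.0])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# beta_hat = (X'X)^(-1) X'Y, the unique least squares estimate
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's least squares solver; the two should agree closely.
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```

      In practice, numerically stabler routines such as a least squares solver are preferred to forming the inverse explicitly, but the direct computation mirrors the algebra above.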

      As an example, think of trying to fit two observations with a matrix X that has three columns. Then, geometrically, the expression in Equation (2.5) defines a hyperplane which, given that m = 3 in this case, is simply a plane. But there are infinitely many planes that pass through any two given points, and there is no way to determine which of these infinitely many solutions would predict new observations best.
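      The same point can be seen numerically. In this sketch (arbitrary values), X has two rows and three columns, so X'X is rank-deficient and the inverse required by the formula above does not exist; moreover, two different coefficient vectors reproduce the observations equally well:

```python
import numpy as np

# Two observations, three columns (n = 2 < m = 3).
X = np.array([[1.0, 2.0, 5.0],
              [1.0, 3.0, 7.0]])
Y = np.array([10.0, 12.0])

# X'X is 3 x 3 but only has rank 2, so its inverse does not exist.
print(np.linalg.matrix_rank(X.T @ X))              # prints 2

# Two different coefficient vectors that both fit the data exactly:
b1 = np.linalg.lstsq(X, Y, rcond=None)[0]          # the minimum-norm solution
b2 = b1 + np.array([1.0, 2.0, -1.0])               # add a vector from the null space of X
print(np.allclose(X @ b1, Y), np.allclose(X @ b2, Y))   # True True
```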

      To better understand the issues behind model fitting, let’s run the script PolyRegr.jsl by clicking on the correct link in the master journal.

      The script randomly generates Y
