Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs
Figure 7.14 PCP plots for wine data. The upper two plots are the score and loading plots for the predicted Y, the other three are the projected input X-variables from the blocks B, C, and D. Block A is not present since it is not needed for prediction. The sizes of the points for the Y scores follow the scale of the ‘overall quality’ (small to large) while colour follows the scale of ‘typical’ (blue, through green to yellow).
Figure 7.15 Måge plot showing cross-validated explained variance for all combinations of components from the three blocks with a maximum of 10 components in total. The three coloured lines indicate pure block models, and the inset is a magnified view around maximum explained variance.
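The exhaustive search a Måge plot summarises can be sketched as follows; this is only an illustration of how the component combinations are enumerated, with the cross-validation scoring step left out (any CV criterion could be plugged in per combination).

```python
from itertools import product

def mage_combinations(n_blocks=3, max_total=10):
    """All allocations of components to blocks (one count per block)
    whose total does not exceed max_total, as scanned for a Måge plot."""
    return [c for c in product(range(max_total + 1), repeat=n_blocks)
            if sum(c) <= max_total]

combos = mage_combinations()
# Pure block models use components from exactly one block.
pure_block = [c for c in combos if sum(x > 0 for x in c) == 1]
```

Each combination would be fitted and cross-validated; plotting explained variance against the total number of components gives the Måge plot, with the pure block models highlighted as the coloured lines.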
Figure 7.16 Block-wise scores (Tm) with 4+3+3 components for left, middle, and right block, respectively (first two components for each block shown). Dot sizes show the percentage PUFA in sample (small = 0%, large = 12%), while colour shows the percentage PUFA in fat (see colour-bar on the left).
Figure 7.17 Block-wise (projected) loadings with 4+3+3 components for left, middle, and right block, respectively (first two for each block shown). Dotted vertical lines indicate transitions between blocks. Note the larger noise level for components six and nine.
Figure 7.18 Block-wise loadings from restricted SO-PLS model with 4+3+3 components for left, middle, and right block, respectively (first two for each block shown). Dotted vertical lines indicate transitions between blocks.
Figure 7.19 Måge plot for restricted SO-PLS showing cross-validated explained variance for all combinations of components from the three blocks with a maximum of 10 components in total. The three coloured lines indicate pure block models, and the inset is a magnified view around maximum explained variance.
Figure 7.20 CV-ANOVA results based on the cross-validated SO-PLS models fitted on the Raman data. The circles represent the average absolute values of the difference between measured and predicted response, Dik = |yik − ŷik|, (from cross-validation) obtained as new blocks are incorporated. The four ticks on the x-axis represent the different models, from the simplest (intercept, predict using the average response value) to the most complex containing all three blocks (‘X left’, ‘X middle’ and ‘X right’). The vertical lines indicate (random) error regions for the models obtained. Overlap of lines means no significant difference according to Tukey’s pair-wise test (Studentised range) obtained from the CV-ANOVA model. This shows that ‘X middle’ adds significantly to predictive ability, while ‘X right’ has a negligible contribution.
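The quantity Dik behind this plot is simple to compute; the sketch below assumes cross-validated predictions are already available as one column per model (the toy numbers are invented purely for illustration).

```python
import numpy as np

def cv_anova_residuals(y, y_hat_cv):
    """Absolute cross-validated residuals D_ik = |y_ik - yhat_ik| for each
    sample i and model k, plus the per-model averages shown as circles."""
    D = np.abs(y[:, None] - y_hat_cv)   # shape: (n_samples, n_models)
    return D, D.mean(axis=0)

# Toy data: intercept-only model vs. a model using one block.
y = np.array([1.0, 2.0, 3.0])
y_hat_cv = np.column_stack([np.full(3, y.mean()),
                            np.array([1.2, 1.8, 3.1])])
D, model_means = cv_anova_residuals(y, y_hat_cv)
```

The per-model averages (circles in the figure) are then compared by the CV-ANOVA model with Tukey’s pair-wise test.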
Figure 7.21 Loadings from Principal Components of Predictions applied to the 5+4+0 component solutions of SO-PLS on Raman data.
Figure 7.22 RMSEP for fish data with interactions. The standard SO-PLS procedure is used with the order of blocks described in the text. The three curves correspond to different numbers of components for the interaction part. The symbol * in the original figure (see reference) between the blocks is the same interaction operator as described by the ∘ above. Source: (Næs et al., 2011b). Reproduced with permission from John Wiley and Sons.
Figure 7.23 Regression coefficients for the interactions for the fish data with 4+2+2 components for blocks X1, X2 and the interaction block X3. Regression coefficients are obtained by back-transforming the components in the interaction block to original units in a similar way as shown right after Algorithm 7.3. The full regression vector for the interaction block (with 24 terms, see above) is split into four parts according to the four levels of the two design factors (see description of coding above). Each of the levels of the design factors has its own line in the figure. As can be seen, there are only two lines for each design factor, corresponding to the way the design matrix was handled (see explanation at the beginning of the example). The numbers on the x-axis represent wavelengths in the NIR region. Lines close to 0 are factor combinations which do not contribute to interaction. Source: Næs et al. (2011a). Reproduced with permission from Wiley.
Figure 7.24 SO-PLS results using candy and assessor variables (dummy variables) as X and candy attribute assessments as Y. Component numbers in parentheses indicate how many components were extracted in the other block before the current block.
Figure 7.25 Illustration of the idea behind PO-PLS for three input blocks, to be read from left to right. The first step is data compression of each block separately (giving scores T1, T2 and T3) before a GCA is run to obtain common components. Then each block is orthogonalised (both the Xm and Y) with respect to the common components, and PLS regression is used for each of the blocks separately to obtain block-wise distinct scores. The F in the figure is the orthogonalised Y. The common and block-wise scores are finally combined in a joint regression model. Note that the different T blocks can have different numbers of columns.
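The orthogonalisation step mentioned in the caption (removing the common components from each Xm and from Y) amounts to projection deflation; a minimal sketch, assuming the common scores are collected in a matrix T:

```python
import numpy as np

def orthogonalise(X, T):
    """Deflate X with respect to the common scores T: subtract the
    projection of X onto the column space of T."""
    return X - T @ np.linalg.lstsq(T, X, rcond=None)[0]

# Illustration with random data: the deflated block is orthogonal to T.
rng = np.random.default_rng(1)
T = rng.normal(size=(10, 2))
X = rng.normal(size=(10, 4))
X_orth = orthogonalise(X, T)
```

Using a least-squares solve rather than an explicit inverse keeps the projection stable even when the columns of T are nearly collinear.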
Figure 7.26 PO-PLS calibrated/fitted and validated explained variancewhen applied to three-block Raman with PUFA responses.
Figure 7.27 PO-PLS calibrated explained variance when applied to three-block Raman with PUFA responses.
Figure 7.28 PO-PLS common scores when applied to three-block Raman with PUFA responses. The plot to the left is for the first component from X1,2,3 versus X1,2 and the one to the right is for the first component from X1,2,3 versus X1,3. Size and colour of the points follow the amount of PUFA % in sample and PUFA % in fat, respectively (see also the numbers presented in the text for the axes). The percentages reported in the axis labels are calibrated explained variances for the two responses, corresponding to the numbers in Figure 7.26.
Figure 7.29 PO-PLS common loadings when applied to three-block Raman with PUFA responses.
Figure 7.30 PO-PLS distinct loadings when applied to three-block Raman with PUFA responses.
Figure 7.31 ROSA component selection searches among candidate scores (tm) from all blocks for the one that minimises the distance to the residual response Y. After deflation with the winning score (Ynew = Y − trq′r = Y − trt′rY) the process is repeated until a desired number of components has been extracted. Zeros in weights are shown in white for an arbitrary selection of blocks, here blocks 2, 1, 3, 1. Loadings, P, and weights, W (see text), span all blocks.
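The search-and-deflate loop in this caption can be sketched as follows for a single response; the candidate scores here come from a plain PLS1-style weight step, which stands in for (and need not match) the exact implementation used in the book.

```python
import numpy as np

def rosa_sketch(blocks, y, n_comp):
    """Minimal ROSA-style loop for one response: each step selects the
    block whose candidate score t minimises ||y - t t'y||, then deflates
    y with the winning score (y <- y - t t'y)."""
    y = y.astype(float).copy()
    chosen = []
    for _ in range(n_comp):
        best_block, best_resid, best_t = None, np.inf, None
        for b, X in enumerate(blocks):
            w = X.T @ y                      # PLS1-style candidate weight
            w /= np.linalg.norm(w)
            t = X @ w                        # candidate score for block b
            t /= np.linalg.norm(t)
            resid = np.linalg.norm(y - t * (t @ y))
            if resid < best_resid:
                best_block, best_resid, best_t = b, resid, t
        y -= best_t * (best_t @ y)           # deflate with winning score
        chosen.append(best_block)
    return chosen, y

# Toy example: y is driven by the first column of block 1.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(20, 5)), rng.normal(size=(20, 4))
y0 = X1[:, 0] + 0.1 * rng.normal(size=20)
order, resid = rosa_sketch([X1, X2], y0, n_comp=2)
```

Because all blocks compete for every component, the selected sequence of blocks (here recorded in `order`) is itself informative, as the white dots in Figure 7.34 illustrate.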
Figure 7.32 Cross-validated explained variance when ROSA is applied to three-block Raman with PUFA in sample and in fat on the left and both PUFA responses simultaneously on the right.
Figure 7.33 ROSA weights (first five components) when applied to three-block Raman with the PUFA sample response.
Figure 7.34 Summary of cross-validated candidate scores from blocks. Top: residual RMSECV (root mean square error of cross-validation) for each candidate component. Bottom: correlation between candidate scores and the score from the block that was selected. White dots show which block was selected for each component.
Figure 7.35 The decision paths for ‘Common and distinct components’ (implicitly handled, additional contribution from block, or explicitly handled) and ‘Choosing components’.