Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs
Figure 6.2 A part of the ASCA decomposition. Similar to Figure 6.1 but now for 11 metabolites.
Figure 6.3 The ASCA scores on the factor light in the plant example (panel (a); expressed in terms of increasing amount of light) and the corresponding loading for the first ASCA component (panel (b)).
Figure 6.4 The ASCA scores on the factor time in the plant example (panel (a)) and the corresponding loading for the first ASCA component (panel (b)).
Figure 6.5 The ASCA scores on the interaction between light and time in the plant example (panel (a)) and the corresponding loading for the first ASCA component (panel (b)).
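The decomposition behind Figures 6.2-6.5 splits the (column-centred) data matrix into effect matrices for each design factor and their interaction, and then applies PCA to each effect matrix to obtain the scores and loadings shown. Below is a minimal numpy sketch, assuming a balanced two-factor design; the toy data, design, and variable names are illustrative, not the book's own code:

```python
import numpy as np

def effect_matrix(X, levels):
    """Replace each row by the mean of all rows sharing its factor level."""
    M = np.zeros_like(X)
    for lv in np.unique(levels):
        idx = levels == lv
        M[idx] = X[idx].mean(axis=0)
    return M

rng = np.random.default_rng(0)
X = rng.normal(size=(24, 11))           # e.g. 11 metabolites as in Figure 6.2
light = np.repeat([0, 1, 2], 8)         # three light levels (hypothetical design)
time = np.tile(np.repeat([0, 1], 4), 3) # two time points, balanced within light

Xc = X - X.mean(axis=0)                 # remove the overall mean
X_light = effect_matrix(Xc, light)      # main effect of light
X_time = effect_matrix(Xc, time)        # main effect of time
# cell means minus both main effects give the interaction (balanced design)
X_int = effect_matrix(Xc, light * 10 + time) - X_light - X_time
E = Xc - X_light - X_time - X_int       # residuals

# PCA (via SVD) of one effect matrix gives the ASCA scores and loadings
U, s, Vt = np.linalg.svd(X_light, full_matrices=False)
scores, loadings = U * s, Vt.T
```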
Figure 6.6 PCA on toxicology data. Source: Jansen et al. (2008). Reproduced with permission of John Wiley and Sons. (We thank Frans van der Kloet for making these figures.)
Figure 6.7 ASCA on toxicology data. Component 1: left; component 2: right. Source: Jansen et al. (2008). Reproduced with permission of John Wiley and Sons.
Figure 6.8 PARAFASCA on toxicology data. Component 1: left; component 2: right. The vertical dashed lines indicate the boundary between the early and late stages of the experiment. Source: Jansen et al. (2008). Reproduced with permission of John Wiley and Sons.
Figure 6.9 Permutation example. Panel (a): null-distribution for the first case with an effect (with size indicated with red vertical line). Panel (b): the data of the case with an effect. Panel (c): the null-distribution of the case without an effect and the size (red vertical line). Panel (d): the data of the case with no effect. Source: Vis et al. (2007). Licensed under CC BY 2.0.
Figure 6.10 Permutation test for the factor light (panel (a)) and interaction between light and time (panel (b)). Legend: blue is the null-distribution and effect size is indicated by a red vertical arrow. SSQ is the abbreviation of sum-of-squares.
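Figures 6.9 and 6.10 compare an observed effect size (the sum-of-squares, SSQ, of the effect matrix) against a null-distribution obtained by permuting the factor levels. A hedged sketch of such a permutation test, with function names that are assumptions rather than the book's code:

```python
import numpy as np

def ssq_effect(X, levels):
    """Sum-of-squares (SSQ) of the effect matrix for one factor."""
    M = np.zeros_like(X)
    for lv in np.unique(levels):
        idx = levels == lv
        M[idx] = X[idx].mean(axis=0)
    return np.sum(M ** 2)

def permutation_p_value(X, levels, n_perm=1000, seed=1):
    """Compare the observed SSQ with a null-distribution from permuted levels."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)              # centre before computing effects
    observed = ssq_effect(Xc, levels)
    null = np.array([ssq_effect(Xc, rng.permutation(levels))
                     for _ in range(n_perm)])
    # p-value: fraction of permuted SSQs at least as large as the observed one
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p
```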
Figure 6.11 ASCA candy scores from candy experiment. The plot to the left is based on the ellipses from the residual approach in Friendly et al. (2013). The plot to the right is based on the method suggested in Liland et al. (2018). Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.
Figure 6.12 ASCA assessor scores from candy experiment. The plot to the left is based on the ellipses from the residual approach in Friendly et al. (2013). The plot to the right is based on the method suggested in Liland et al. (2018). Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.
Figure 6.13 ASCA assessor and candy loadings from the candy experiment. Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.
Figure 6.14 PE-ASCA of the NMR metabolomics of pig brains. Stars in the score plots are the factor estimates and circles are the back-projected individual measurements (Zwanenburg et al., 2011). Source: Alinaghi et al. (2020). Licensed under CC BY 4.0.
Figure 6.15 Tree for selecting an ASCA-based method. For abbreviations, see the legend of Table 6.1; BAL = balanced data, UNB = unbalanced data. For more explanation, see text.
Figure 7.1 Conceptual illustration of the handling of common and distinct predictive information for three of the methods covered. The upper figure illustrates that the two input blocks share some information (C1 and C2), but also have substantial distinct components and noise (see Chapter 2), here contained in the X (as the darker blue and darker yellow). The lower three figures show how different methods handle the common information. For MB-PLS, no initial separation is attempted since the data blocks are concatenated before analysis starts. For SO-PLS, the common predictive information is handled as part of the X1 block before the distinct part of the X2 block is modelled. The extra predictive information in X2 corresponds to the additional variability as will be discussed in the SO-PLS section. For PO-PLS, the common information is explicitly separated from the distinct parts before regression.
Figure 7.2 Illustration of link between concatenated X blocks and the response, Y, through the MB-PLS super-scores, T.
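The linking structure in Figure 7.2 can be mimicked by scaling each block, concatenating, and running an ordinary PLS regression, whose X-scores then play the role of the super-scores T. A rough sketch using scikit-learn; the Frobenius-norm block scaling is one common convention, assumed here rather than taken from the book:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def mbpls(blocks, Y, n_components=2):
    """MB-PLS as block-scaled concatenation followed by ordinary PLS."""
    scaled = [Xm / np.linalg.norm(Xm) for Xm in blocks]  # scale each block
    X_concat = np.hstack(scaled)                         # concatenate blocks
    pls = PLSRegression(n_components=n_components).fit(X_concat, Y)
    T_super = pls.x_scores_                              # super-scores T
    return pls, T_super
```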
Figure 7.3 Cross-validated explained variance for various choices of number of components for single- and two-response modelling with MB-PLS.
Figure 7.4 Super-weights (w) for the first and second component from MB-PLS on Raman data predicting the PUFAsample response. Block-splitting indicated by vertical dotted lines.
Figure 7.5 Block-weights (wm) for first and second component from MB-PLS on Raman data predicting the PUFAsample response. Block-splitting indicated by vertical dotted lines.
Figure 7.6 Block-scores (tm, for left, middle, and right Raman block, respectively) for first and second component from MB-PLS on Raman data predicting the PUFAsample response. Colours of the samples indicate the PUFA concentration as % in fat (PUFAfat) and size indicates % in sample (PUFAsample). The two percentages given in each axis label are cross-validated explained variance for PUFAsample weighted by relative block contributions and calibrated explained variance for the block (Xm), respectively.
Figure 7.7 Classification by regression. A dummy matrix (here with three classes, c for class) is constructed according to which group the different objects belong to. Then this dummy matrix is related to the input blocks in the standard way described above.
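As a concrete illustration of the scheme in Figure 7.7, the sketch below builds a 0/1 dummy matrix from class labels, fits a PLS regression, and assigns each object to the class with the largest fitted value; the assignment rule and names are illustrative assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def classify_by_regression(X, labels, n_components=2):
    """Classification by regression on a dummy response matrix."""
    classes = np.unique(labels)
    # one 0/1 column per class, as in the dummy matrix of Figure 7.7
    Ydummy = (labels[:, None] == classes[None, :]).astype(float)
    pls = PLSRegression(n_components=n_components).fit(X, Ydummy)
    Yhat = pls.predict(X)
    # assign each object to the class with the largest fitted value
    return classes[np.argmax(Yhat, axis=1)]
```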
Figure 7.8 AUROC values of different classification tasks. Source: Deng et al. (2020). Reproduced with permission from ACS Publications.
Figure 7.9 Super-scores (called global scores here) and block-scores for the sparse MB-PLS model of the piglet metabolomics data. Source: Karaman et al. (2015). Reproduced with permission from Springer.
Figure 7.10 Linking structure of SO-PLS. Scores for both X1 and the orthogonalised version of X2 are combined in a standard LS regression model with Y as the dependent block.
Figure 7.11 SO-PLS iterates between PLS regression and orthogonalisation, deflating the input block and responses in every cycle. This is illustrated using three input blocks X1, X2, and X3. The upper figure represents the first PLS regression of Y onto X1. Then the residuals from this step, obtained by orthogonalisation, go to the next step (figure in the middle) where the same PLS procedure is repeated. The same continues for the last block X3 in the lower part of the figure. In each step, loadings, scores, and weights are available.
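For two input blocks, the cycle in Figure 7.11 can be sketched as: fit PLS of Y on X1, orthogonalise X2 and deflate Y with respect to the X1 scores, then fit PLS on the orthogonalised block. A minimal sketch under these assumptions; component numbers and names are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def so_pls_two_blocks(X1, X2, Y, a1=2, a2=2):
    """Sequential orthogonalised PLS for two input blocks."""
    pls1 = PLSRegression(n_components=a1).fit(X1, Y)
    T1 = pls1.x_scores_
    # projection matrix onto the column space of the X1 scores T1
    P = T1 @ np.linalg.pinv(T1.T @ T1) @ T1.T
    X2_orth = X2 - P @ X2            # orthogonalise X2 with respect to T1
    Y_defl = Y - P @ Y               # deflate the response
    pls2 = PLSRegression(n_components=a2).fit(X2_orth, Y_defl)
    return pls1, pls2
```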
Figure 7.12 The CVANOVA is used for comparing cross-validated residuals for different prediction methods/models or for different numbers of blocks in the models (for instance, in SO-PLS). The squares or the absolute values of the cross-validated prediction residuals, Dik, are compared using a two-way ANOVA model. The figure below the model represents the data set used. The indices i and k denote the two effects: sample and method. The I samples for each method/model (equal to three in the example) are the same, so a standard two-way ANOVA is used. Note that the error variance in the ANOVA model for the three methods is not necessarily the same, so this must be considered a pragmatic approach.
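A hedged sketch of the CVANOVA computation described above: given a matrix D of squared cross-validated residuals, with one row per sample i and one column per method k, a textbook two-way ANOVA without replication yields an F-statistic for the method effect. The decomposition below is assumed to match the figure's setup:

```python
import numpy as np

def cvanova(D):
    """Two-way ANOVA on squared CV residuals; rows = samples, cols = methods."""
    I, K = D.shape
    grand = D.mean()
    ss_sample = K * np.sum((D.mean(axis=1) - grand) ** 2)  # sample effect
    ss_method = I * np.sum((D.mean(axis=0) - grand) ** 2)  # method effect
    ss_total = np.sum((D - grand) ** 2)
    ss_error = ss_total - ss_sample - ss_method
    # F-statistic for the method effect
    F = (ss_method / (K - 1)) / (ss_error / ((I - 1) * (K - 1)))
    return F
```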
Figure 7.13 Måge plot showing cross-validated explained variance for all combinations of components for the four input blocks (up to six components in total) for the wine data (the digits for each combination correspond to the order A, B, C, D, as described above). The different combinations