Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs
Figure 8.1 Figures (a)–(c) represent an L-structure/skeleton and Figure (d) a domino structure. See also notation and discussion of skeletons in Chapter 3. The grey background in (b) and (c) indicates that some methods analyse the two modes sequentially. Different topologies, i.e., different ways of linking the blocks, associated with this skeleton will be discussed for each particular method.
Figure 8.2 Conceptual illustration of common information shared by the three blocks. The green colour represents the common column space of X1 and X2, and the red the common row space of X1 and X3. The orange in the upper corner of X1 represents the joint commonness of the two spaces. The blue is the distinct parts of the blocks. This illustration is conceptual; there is no mathematical definition available yet of the commonness between row spaces and column spaces simultaneously.
Figure 8.3 Topologies for four different methods. The first three ((a), (b), (c)) are based on analysing the two modes in sequence. (a) PLS used for both modes (this section). (b) Correlation first approach (Section 8.5.4). (c) Using unlabelled data in calibration (Section 8.5.2). The topology in (d) will be discussed in Section 8.3. We refer to the main text for more detailed descriptions. The dimensions of blocks are X1 (I × N), X2 (I × J), and X3 (K × N). The topology in (a) corresponds to external preference mapping, which will be given main attention here.
Figure 8.4 Scheme for information flow in preference mapping with segmentation of consumers.
Figure 8.5 Preference mapping of dry fermented lamb sausages: (a) sensory PCA scores and loadings (from X2), and (b) consumer loadings presented for four segments determined by cluster analysis. Source: Helgesen et al. (1997). Reproduced with permission from Elsevier.
Figure 8.6 Results from consumer liking of cheese. Estimated effects of the design factors in Table 8.3. Source: Almli et al. (2011). Reproduced with permission from Elsevier.
Figure 8.7 Results from consumer liking of cheese. (a) Loadings from PCA of the residuals from ANOVA (using consumers as rows). Letters R/P in the loading plot refer to raw/pasteurised milk, and E/S refer to everyday/special occasions. (b) PCA scores from the same analysis with indication of the two consumer segments. Source: Almli et al. (2011). Reproduced with permission from Elsevier.
Figure 8.8 Relations between segments and consumer characteristics. Source: Almli et al. (2011). Reproduced with permission from Elsevier.
Figure 8.9 Topology for the extension. This is a combination of a regression situation along the horizontal axis and a path model situation along the vertical axis.
Figure 8.10 L-block scheme with weights w's. The w's are used for calculating scores for deflation.
Figure 8.11 Endo-L-PLS results for fruit liking study. Source: Martens et al. (2005). Reproduced with permission from Elsevier.
Figure 8.12 Classification CV-error as a function of the α value and the number of L-PLS components. Source: Sæbø et al. (2008b). Reproduced with permission from Elsevier.
Figure 8.13 (a) Data structure for labelled and unlabelled data. (b) Flow chart for how to utilise unlabelled data.
Figure 8.14 Tree for selecting methods with complex data structures.
Figure 9.1 General setup for fusing heterogeneous data using representation matrices. The variables in the blocks X1, X2 and X3 are represented with proper I × I representation matrices which are subsequently analysed simultaneously with an IDIOMIX model generating scores and loadings. Source: Smilde et al. (2020). Reproduced with permission of John Wiley and Sons.
Figure 9.2 Score plots of IDIOMIX, OS-SCA and GSCA for the genomics fusion; always score 3 (SC3) on the y-axes and score 1 (SC1) on the x-axes. The third component clearly differs among the methods. Source: Smilde et al. (2020). Licensed under CC BY 4.0.
Figure 9.3 True design used in mixture preparation (blue) versus the columns of the associated factor matrix corresponding to the mixture mode extracted by the BIBFA model (red) and the ACMTF model (red). Source: Acar et al. (2015). Reproduced with permission from IEEE.
Figure 9.4 Cross-validation results for the penalty parameter λbin of the mutation block (left) and for the drug response, transcriptome, and methylation blocks (λquan, right) in the PESCA model. For more explanation, see text. Adapted from Song et al. (2019).
Figure 9.5 Explained variances of the PESCA (a) and MOFA (b) model on the CCL data. From top to bottom: drug response, methylation, transcriptome, and mutation data. The values are percentages of explained variation. For more explanation, see text. Adapted from Song et al. (2019).
Figure 9.6 From multiblock data to three-way data.
Figure 9.7 Decision tree for selecting an unsupervised method. For abbreviations, see the legend of Table 9.1. The leftmost leaf is empty, but CD methods can also be used in that case. For more explanation, see text.
Figure 10.1 Results from multiblock redundancy analysis of the Wine data, showing Y scores (ur) and block-wise weights for each of the four input blocks (A, B, C, D).
Figure 10.2 Pie chart of the sources of contribution to the total variance (arbitrary sector sizes for illustration).
Figure 10.3 Flow chart for the NI-SL method.
Figure 10.4 An illustration of SO-N-PLS, modelling a response using a two-way matrix, X1, and a three-way array, X2.
Figure 10.5 Path diagram for a wine tasting study. The blocks represent the different stages of a wine tasting experiment and the arrows indicate how the blocks are linked. Source: Næs et al. (2020). Reproduced with permission from Wiley.
Figure 10.6 Wine data. PCP plots for prediction of block D from blocks A, B, and C. Scores and loadings from PCA on the predicted y-values on top. The loadings from projecting the orthogonalised X-blocks (except the first, which is used as is) onto the scores at the bottom. Source: Romano et al. (2019). Reproduced with permission from Wiley & Sons.
Figure 10.7 An illustration of the multigroup setup, where variables are shared among X blocks and related to responses, Y, also sharing their own variables.
Figure 10.8 Decision tree for selecting a supervised method. For more explanation, see text.
Figure 11.1 Output from use of scoreplot() on a pca object.
Figure 11.2 Output from use of loadingplot() on a cca object.
Figure 11.3 Output from use of scoreplot(pot.sca, labels = "names") (SCA scores in 2 dimensions).
Figure 11.4 Output from use of loadingplot(pot.sca, block