Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs


D1, D2 are distinct for X1 and X2, respectively; E_X1 and E_X2 represent the unsystematic parts of X1 and X2; (b) first X1 is used and then the remainder of X2 after removing the common (predictive) part T1 of X1.
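
The step described in (b), removing the part of X2 already explained by the common scores T1, is a projection deflation. A minimal NumPy sketch under assumed shapes (all names here are hypothetical stand-ins, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((50, 10))   # hypothetical first block
X2 = rng.standard_normal((50, 8))    # hypothetical second block
T1 = X1[:, :2]                       # stand-in for the common (predictive) scores of X1

# Orthogonalise X2 with respect to T1: remove the part of X2
# that lies in the column space of T1 (projection deflation).
P = T1 @ np.linalg.pinv(T1.T @ T1) @ T1.T   # projector onto span(T1)
X2_remainder = X2 - P @ X2                  # remainder of X2 after removing T1's part
```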

      Figure 4.1 Explanation of the scale (a) and orientation (b) component of the SVD. The axes are two variables and the spread of the samples is visualised including their contours as ellipsoids. Hence, this is a representation of the row-spaces of the matrices. For more explanation, see text. Source: Smilde et al. (2015). Reproduced with permission of John Wiley and Sons.

      Figure 4.2 Topology of interactions between genomics data sets. Source: Aben et al. (2018). Reproduced with permission of Oxford University Press.

      Figure 4.3 The RV and partial RV coefficients for the genomics example. For explanation, see the main text. Source: Aben et al. (2018). Reproduced with permission of Oxford University Press.
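
For reference, the RV coefficient between two column-centred blocks with matched samples is the trace-based matrix correlation used in this figure; a minimal NumPy sketch, assuming centred inputs:

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two column-centred matrices with matched rows."""
    XX, YY = X @ X.T, Y @ Y.T
    return np.trace(XX @ YY) / np.sqrt(np.trace(XX @ XX) * np.trace(YY @ YY))
```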

      Figure 4.4 Decision tree for selecting a matrix correlation method. Abbreviations: HOM is homogeneous data, HET is heterogeneous data, Gen-RV is generalised RV, Full means full correlations, Partial means partial correlations. For more explanation, see text.

      Figure 5.1 Unsupervised analysis as discussed in this chapter: (a) links between samples and (b) links between variables (simplified representations, see Chapter 3).

      Figure 5.2 Illustration explaining the idea of exploring multiblock data. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.3 The idea of common (C), local (L) and distinct (D) parts of three data blocks. The symbols Xt denote row spaces; Xt13L, e.g., is the part of Xt1 and Xt3 which is in common but does not share a part with Xt2.

      Figure 5.4 Proportion of explained variances (variances accounted for) for the TIV block (upper part), the LAIV block (middle part) and the concatenated blocks (lower part). Source: Van Deun et al. (2013). Reproduced with permission of Elsevier.

      Figure 5.6 Difference between weights and correlation loadings explained. Green arrows are variables of Xm; red arrow is the consensus component t; blue arrow is the common component tm. Dotted lines represent projections.

      Figure 5.7 The logistic function η(θ) = (1 + exp(−θ))^(−1) visualised. Only the part between [−4, 4] is shown but the function goes from −∞ to +∞.
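
The logistic function in the caption can be reproduced directly; a short sketch over the plotted interval:

```python
import numpy as np

def eta(theta):
    """Logistic function eta(theta) = (1 + exp(-theta))**(-1)."""
    return 1.0 / (1.0 + np.exp(-theta))

theta = np.linspace(-4, 4, 101)   # the interval shown in Figure 5.7
values = eta(theta)               # rises from about 0.018 to about 0.982 here
```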

      Figure 5.8 CNA data visualised. Legend: (a) each line is a sample (cell line), blanks are zeros and black dots are ones; (b) the proportion of ones per variable illustrating the unbalancedness. Source: Song et al. (2021). Reproduced with permission of Elsevier.

      Figure 5.9 Score plot of the CNA data. Legend: (a) scores of a logistic PCA on CNA; (b) consensus scores of the first two GSCA components of a GSCA model (MITF is a special gene). Source: Smilde et al. (2020). Licensed under CC BY 4.0.

      Figure 5.10 Plots for selecting numbers of components for the sensory example. (a) SCA: the curve represents cumulative explained variance for the concatenated data blocks. The bars show how much variance each component explains in the individual blocks. (b) DISCO: each point represents the non-congruence value for a given target (model). The plot includes all possible combinations of common and distinct components based on a total rank of three. The horizontal axis represents the number of common components and the numbers in the plot represent the number of distinct components for SMELL and TASTE, respectively. (c) PCA-GCA: black dots represent the canonical correlation coefficients between the PCA scores of the two blocks (×100) and the bars show how much variance the canonical components explain in each block. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.
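
Panel (a)'s quantities follow from the fact that SCA amounts to a PCA on the concatenated, column-centred blocks, so each block's explained variance per component can be read off from the block's part of the loadings. A hedged sketch of that computation (assuming centred blocks; not the authors' code):

```python
import numpy as np

def sca_block_variance(blocks, n_comp):
    """Per-block explained variance of an SCA model (PCA on concatenated blocks).

    blocks: list of column-centred arrays with the same number of rows.
    Returns an (n_comp, n_blocks) array of fractions of each block's total
    sum of squares explained by each component.
    """
    X = np.hstack(blocks)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_comp] * s[:n_comp]   # scores
    P = Vt[:n_comp].T                # loadings, rows aligned with columns of X
    out = np.zeros((n_comp, len(blocks)))
    start = 0
    for b, Xb in enumerate(blocks):
        stop = start + Xb.shape[1]
        for a in range(n_comp):
            Xb_hat = np.outer(T[:, a], P[start:stop, a])   # rank-1 part for block b
            out[a, b] = np.sum(Xb_hat**2) / np.sum(Xb**2)
        start = stop
    return out
```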

      Figure 5.11 Biplots from PCA-GCA, showing the variables as vectors and the samples as points. The samples are labelled according to the design factors flavour type (A/B), sugar level (40, 60, 80) and flavour dose (2, 5, 8). The plots show the common component (horizontal) against the first distinct component for each of the two blocks. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.12 Amount of explained variation in the SCA model (a) and PCA models (b) of the medical biology metabolomics example. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.13 Amount of explained variation in the DISCO and PCA-GCA model. Legend: C-ALO is common across all blocks; C-AL is local between block A and L; D-A, D-O, D-L are distinct in the A, O and L blocks, respectively. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.14 Scores (upper part) and loadings (lower part) of the common DISCO component. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.16 True design used in mixture preparation (blue) versus the columns of the associated factor matrix corresponding to the mixtures mode extracted by the JIVE model (red). Source: Acar et al. (2015). Reproduced with permission of IEEE.

      Figure 5.17 True design used in mixture preparation (blue) versus the columns of the associated factor matrix corresponding to the mixtures mode extracted by the ACMTF model (red). Source: Acar et al. (2015). Reproduced with permission of IEEE.

      Figure 5.18 Example of the properties of group-wise penalties. Left panel: the family of group-wise L-penalties. Right panel: the GDP penalties. The x-axis shows the L2 norm of the original group of elements to be penalised; the y-axis shows the value of this norm after applying the penalty. For more explanation, see text. Source: Song et al. (2021). Reproduced with permission of John Wiley and Sons.
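
Norm-in versus norm-out curves of this kind can be illustrated with the most familiar group-wise penalty, the group lasso, whose proximal operator shrinks a whole group's L2 norm at once. This is an assumed stand-in for illustration, not the GDP penalty of Song et al. (2021):

```python
import numpy as np

def group_lasso_prox(x, lam):
    """Proximal operator of the group-wise L2 (group lasso) penalty:
    shrinks the group's norm by lam, setting the group to zero below it."""
    norm = np.linalg.norm(x)
    scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
    return scale * x

# Norm-in vs. norm-out curve in the style of the left panel of Figure 5.18:
norms_in = np.linspace(0, 3, 61)
norms_out = [np.linalg.norm(group_lasso_prox(np.array([n]), 1.0)) for n in norms_in]
```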

      Figure 5.19 Quantification of modes and block-association rules. The matrix V 'glues together' the quantifications T and P using the function f(T, P, V) to approximate X.
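
If one assumes the common bilinear form X ≈ T V P^T for the 'gluing' (an assumption; the exact definition of f(T, P, V) is given in the text), the approximation is a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((50, 3))   # hypothetical sample-mode quantifications
P = rng.standard_normal((20, 3))   # hypothetical variable-mode quantifications
V = rng.standard_normal((3, 3))    # 'glue' matrix linking the two quantifications

X_hat = T @ V @ P.T                # assumed form of f(T, P, V) approximating X
```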

      Figure 5.20 Linking the blocks through their quantifications.

      Figure 5.21 Decision tree for selecting an unsupervised method for the shared variable mode case. For abbreviations, see the legend of Table 5.1. For more explanation, see text.

      Figure 5.22 Decision tree for selecting an unsupervised method for the shared sample mode case. For abbreviations, see the legend of Table 5.1. For more explanation, see text.

      Figure 6.1 ASCA decomposition for two metabolites. The break-up of the original data into factor estimates due to the factors Time and Treatment.
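
ASCA's break-up is an ANOVA-style additive split of the data by design factor before any component analysis. A minimal sketch for a two-factor design, assuming balanced data and factor estimates taken as level means (a common ASCA convention; not the book's code):

```python
import numpy as np

def asca_decompose(X, time, treatment):
    """Split X (samples x metabolites) into overall mean, factor estimates for
    Time and Treatment (level means of the centred data), and residuals."""
    Xc = X - X.mean(axis=0)                      # remove the overall mean
    X_time = np.zeros_like(Xc)
    X_treat = np.zeros_like(Xc)
    for lvl in np.unique(time):
        X_time[time == lvl] = Xc[time == lvl].mean(axis=0)
    for lvl in np.unique(treatment):
        X_treat[treatment == lvl] = Xc[treatment == lvl].mean(axis=0)
    E = Xc - X_time - X_treat                    # residual (incl. interaction)
    return X_time, X_treat, E
```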
