Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs
Figure 2.2 Geometry of PCA. For explanation, see text (with permission of H.J. Ramaker, TIPb, The Netherlands).
Figure 2.3 Score (a) and loading (b) plots of a PCA on Cabernet Sauvignon wines. Source: Bro and Smilde (2014). Reproduced with permission of the Royal Society of Chemistry.
Figure 2.4 PLS validated explained variance when applied to Raman with PUFA responses. Left: PLSR on one response at a time. Right: PLS on both responses (standardised).
Figure 2.5 Score and loading plots for the single response PLS regression model predicting PUFA as percentage of total fat in the sample (PUFAsample).
Figure 2.6 Raw and normalised urine NMR spectra. Different colours are spectra of different subjects.
Figure 2.7 Numerical representations of the lengths of sticks: (a) left: the empirical relational system (ERS) of which only the length is studied, right: a numerical representation (NRS1), (b) an alternative numerical representation (NRS2) of the same ERS carrying essentially the same information.
Figure 2.8 Classical (a) and logistic PCA (b) on the same mutation data of different cancers. Source: Song et al. (2017). Reproduced with permission from Oxford Academic Press.
Figure 2.9 Classical (a) and logistic PCA (b) on the same methylation data of different cancers. Source: Song et al. (2017). Reproduced with permission from Oxford Academic.
Figure 2.10 SCA for two data blocks; one containing binary data and one with ratio-scaled data.
Figure 2.11 The block scores of the rows of the two blocks. Legend: green squares are block scores of the first block; blue circles are block scores of the second block; and the red stars are their averages (indicated with ta). Panel (a) favouring block X1, (b) the MAXBET solution, (c) the MAXNEAR solution.
Figure 2.12 Two column-spaces each of rank two in three-dimensional space. The blue and green surfaces represent the column-spaces and the red line indicated with X12C represents the common component. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.
Figure 2.13 Common and distinct components. The common component is the same in both panels. For the distinct components there are now two choices regarding orthogonality: (a) both distinct components orthogonal to the common component, (b) distinct components mutually orthogonal. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.
Figure 2.14 Common components in case of noise: (a) maximally correlated common components within column-spaces; (b) consensus component in neither of the column-spaces. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.
Figure 2.15 Visualisation of a response vector, y, projected onto a two-dimensional data space spanned by x1 and x2.
Figure 2.16 Fitted values versus residuals from a linear regression model.
Figure 2.17 Simple linear regression: ŷ = ax + b (see legend for description of elements). In addition, leverage is indicated below the regression plot, where leverage is at a minimum at x̄ and increases for lower and higher x-values.
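Note: the leverage pattern sketched in Figure 2.17 follows from the standard hat-matrix diagonal for simple linear regression (a textbook result, stated here for reference, not taken from the figure itself):

```latex
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2},
```

so leverage attains its minimum 1/n at x_i = x̄ and grows quadratically for lower and higher x-values.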
Figure 2.18 Two-variable multiple linear regression with indicated residuals and leverage (contours below regression plane).
Figure 2.19 Two component PCA score plot of concatenated Raman data. Leverage for two components is indicated by the marker size.
Figure 2.20 Illustration of true versus predicted values from a regression model. The ideal line is indicated in dashed green.
Figure 2.21 Visualisation of the bias-variance trade-off as a function of model complexity. The observed MSE (in blue) is the sum of the bias² (red dashed), the variance (yellow dashed) and the irreducible error (purple dotted).
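Note: the decomposition behind Figure 2.21 is the usual one for squared-error loss; written out (standard result, with σ² denoting the irreducible error):

```latex
\mathrm{MSE}(x)
= \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
+ \sigma^{2}.
```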
Figure 2.22 Learning curves showing how median R² and Q² from linear regression develop with the number of training samples for a simulated data set.
Figure 2.23 Visualisation of the process of splitting a data set into a set of segments (here chosen to be consecutive) and the sequential hold-out of one segment (Vk) for validation of models. All data blocks Xm and the response Y are split along the sample direction and corresponding segments removed simultaneously.
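Note: as a rough illustration of the splitting scheme in Figure 2.23, the sketch below cuts the sample direction into consecutive segments and removes the same rows from every block Xm and from Y simultaneously; function and variable names are ours, not the book's:

```python
import numpy as np

def consecutive_segments(n_samples, n_segments):
    """Split row indices 0..n_samples-1 into consecutive segments V_k."""
    return np.array_split(np.arange(n_samples), n_segments)

def cross_validation_splits(blocks, Y, n_segments):
    """Yield (training blocks, training Y, held-out blocks, held-out Y),
    removing the k-th segment from all blocks simultaneously."""
    n = Y.shape[0]
    for held_out in consecutive_segments(n, n_segments):
        keep = np.setdiff1d(np.arange(n), held_out)
        yield ([X[keep] for X in blocks], Y[keep],
               [X[held_out] for X in blocks], Y[held_out])

# Example: three blocks sharing 20 samples, 4 consecutive segments
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(20, p)) for p in (5, 8, 3)]
Y = rng.normal(size=(20, 2))
for Xtrain, Ytrain, Xval, Yval in cross_validation_splits(blocks, Y, 4):
    pass  # fit the multiblock model on Xtrain/Ytrain, validate on Xval/Yval
```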
Figure 2.24 Cumulative explained variance for PCA of the concatenated Raman data using naive cross-validation (only leaving out samples). R² is calibrated and Q² is cross-validated.
Figure 2.25 Null distribution and observed test statistic used for significance estimation with permutation testing.
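Note: a minimal sketch of the permutation-testing idea behind Figure 2.25: the test statistic is recomputed under random permutations of the labels to build a null distribution, and the p-value is the fraction of permuted statistics at least as extreme as the observed one. The statistic used here (correlation) is a placeholder, not the book's choice:

```python
import numpy as np

def permutation_p_value(x, y, statistic, n_permutations=1000, seed=0):
    """Estimate a p-value by permuting y to build the null distribution."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    null = np.array([statistic(x, rng.permutation(y))
                     for _ in range(n_permutations)])
    # +1 in numerator and denominator avoids a p-value of exactly zero
    return (1 + np.sum(np.abs(null) >= np.abs(observed))) / (1 + n_permutations)

# Example with correlation as the (placeholder) test statistic
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)
corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print(permutation_p_value(x, y, corr))
```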
Figure 3.1 Skeleton of a three-block data set with a shared sample mode.
Figure 3.2 Skeleton of a four-block data set with a shared sample mode.
Figure 3.3 Skeleton of a three-block data set with a shared variable mode.
Figure 3.4 Skeleton of a three-block L-shaped data set with a shared variable or a shared sample mode.
Figure 3.5 Skeleton of a four-block U-shaped data set with a shared variable or a shared sample mode (a) and a four-block skeleton with a shared variable and a shared sample mode (b). This is a simplified version; it should be understood that all sample modes are shared as well as all variable modes.
Figure 3.6 Topology of a three-block data set with a shared sample mode and unsupervised analysis: (a) full topology and (b) simplified representation.
Figure 3.7 Topology of a three-block data set with a shared variable mode and unsupervised analysis.
Figure 3.8 Different arrangements of data sharing two modes. Topology (a) and multiway array (b).
Figure 3.9 Unsupervised combination of a three-way and two-way array.
Figure 3.10 Supervised three-set problem sharing the sample mode.
Figure 3.11 Supervised L-shape problem. Block X1 is a predictor for block X2 and extra information regarding the variables in block X1 is available in block X3.
Figure 3.12 Path model structure. Blocks are connected through shared samples and a causal structure is assumed.
Figure 3.13 Idea of linking two data blocks with a shared sample mode. For explanation, see text.
Figure 3.14 Different linking structures: (a) identity link, (b) flexible link, (c) partial identity link: common (T12C) and distinct (T1D, T2D) components.
Figure 3.15 Idea of linking two data blocks with a shared variable mode.
Figure 3.16 Different linking structures for supervised analysis: (a) linking structure where components are used both for the X-blocks and the Y-block; (b) linking structure that only uses components for the X-blocks.
Figure 3.17 Treating common and distinct linking structures for supervised analysis: (a) Linking structure with no differentiation between common and distinct in the X-blocks (C