Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs

It is crucial that the roles of the blocks are exchangeable: we can change the order of the blocks without changing the solution.

      Supervised analysis refers to predictive data analysis, where the emphasis is on a single block of data, $Y$ (the dependent block or response), which is connected to one or more blocks of data, $X_m$ (the independent blocks or predictors), through regression or classification. The roles of the blocks now matter: some blocks are regarded as dependent and others as independent.

      Figure 1.7 L-shape data of consumer liking studies.

      1.3.2 High-, Mid- and Low-level Fusion

      Figure 1.1 High-level, mid-level, and low-level fusion for two input blocks. The Z’s represent the combined information from the two blocks which is used for making the predictions. The upper figure represents high-level fusion, where the results from two separate analyses are combined. The figure in the middle is an illustration of mid-level fusion, where components from the two data blocks are combined before further analysis. The lower figure illustrates low-level fusion where the data blocks are simply combined into one data block before further analysis takes place.
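
      As a rough illustration of how the combined information Z can be formed, the sketch below builds a low-level and a mid-level Z for two blocks measured on the same samples. The block sizes, the random data, the helper name block_scores, and the use of principal component scores as the per-block components are all assumptions made for this example; the text does not prescribe any of them. High-level fusion combines the outputs of separately fitted models and is therefore not shown here.

```python
import numpy as np


def block_scores(X: np.ndarray, R: int) -> np.ndarray:
    """Return the first R principal-component scores of a column-centred block.

    PCA is an assumed choice of component method for this sketch.
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :R] * s[:R]  # scores = U * singular values, shape (I, R)


rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 10))  # hypothetical block 1: 30 samples, 10 variables
X2 = rng.normal(size=(30, 6))   # hypothetical block 2: same 30 samples, 6 variables

# Low-level fusion: concatenate the raw blocks column-wise into one block.
Z_low = np.hstack([X1, X2])

# Mid-level fusion: extract components per block, then concatenate them.
Z_mid = np.hstack([block_scores(X1, 2), block_scores(X2, 2)])
```

      Either Z would then be passed to a single regression or classification method. In practice the blocks are often scaled before low-level concatenation so that one block does not dominate; that step is omitted here.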

      ELABORATION 1.2

      High Level Supervised Fusion

      A possible drawback of high-level fusion compared with low-level and feature-level (mid-level) fusion is that it gives no further insight into how the different measurements relate to each other or how they can best be combined to predict the outcome. On the other hand, high-level fusion of prediction results for new samples does not generally require the individual predictors to be developed from the same samples. In other words, when two (or more) predictors are to be combined for a new sample, they do not need to come from the same data source. It is possible to simply plug in the new data and obtain predictions that can be combined as described below. In this sense it is more flexible than low- and feature-level fusion (Ballabio et al., 2019). Doeswijk et al. (2011) showed that fusing classifiers most often gives similar or improved prediction results compared with using only one of them. An overview of the use of high-level fusion (and other methods) can be found in Borràs et al. (2015).

      A simple way of combining classifiers is voting, that is, counting how often the classifiers agree. Different voting schemes have been proposed in the literature. One of them is simple democratic majority voting, in which the group/class that gets the highest number of votes is chosen. In the case of ties, the result is inconclusive. An alternative strategy is 75% voting, in which 75% of the votes must be for the same class before a decision can be made.
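
      The two voting schemes can be sketched in a few lines of Python. The function name vote and its threshold parameter are hypothetical, and reading "75% of the votes" as "at least 75%" is an assumption of this sketch.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(labels: Sequence[str], threshold: Optional[float] = None) -> Optional[str]:
    """Fuse the class labels predicted by several classifiers for one sample.

    threshold=None : simple majority voting (most votes wins; ties are inconclusive).
    threshold=0.75 : the winning class needs at least 75% of all votes (assumed reading).
    Returns the chosen class, or None when no decision can be made.
    """
    if not labels:
        return None
    counts = Counter(labels).most_common()  # sorted by vote count, highest first
    top_label, top_count = counts[0]
    # A tie for first place is inconclusive under either scheme.
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    # With a threshold, the winner must also reach the required share of votes.
    if threshold is not None and top_count / len(labels) < threshold:
        return None
    return top_label


vote(["A", "A", "B"])                  # majority voting: "A"
vote(["A", "B"])                       # tie: None (inconclusive)
vote(["A", "A", "A", "B"], 0.75)       # 3 of 4 votes reach 75%: "A"
vote(["A", "A", "A", "B", "B"], 0.75)  # 3 of 5 votes do not: None
```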

      Fusing quantitative predictors is most easily done using averages or weighted averages, with weights depending on the prediction error of the individual predictors as determined by, for instance, cross-validation. This strategy has similarities with so-called bagging (see, e.g., Freund (1995)). In machine learning, high-level supervised fusion is found in the sub-domain ‘ensemble learning’.
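
      A minimal sketch of such a weighted average follows. The text only says that the weights depend on the prediction error; the inverse-error rule used here (weight = 1 / cross-validated error) is one common choice and an assumption of this example, as is the function name fuse_predictions.

```python
from typing import Sequence


def fuse_predictions(predictions: Sequence[float], cv_errors: Sequence[float]) -> float:
    """Weighted average of several predictions for one sample.

    Each predictor is weighted by 1 / (its cross-validated prediction error),
    an assumed weighting rule: smaller error gives larger weight.
    """
    if not predictions or len(predictions) != len(cv_errors):
        raise ValueError("need one cross-validated error per prediction")
    if any(e <= 0 for e in cv_errors):
        raise ValueError("cross-validated errors must be positive")
    weights = [1.0 / e for e in cv_errors]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Equal errors reduce to the plain average.
equal = fuse_predictions([10.0, 20.0], [1.0, 1.0])    # 15.0
# The predictor with three times the error gets a third of the weight.
unequal = fuse_predictions([10.0, 20.0], [1.0, 3.0])  # 12.5
```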

      1.3.3 Dimension Reduction

      Figure 1.2 Idea of dimension reduction and components. The scores T summarise the relationships between samples; the loadings P summarise the relationships between variables. Sometimes weights W are used to define the scores.

      In this figure, the matrix $X$ ($I \times J$) consists of $J$ variables measured on $I$ samples. The matrix $W$ ($J \times R$) of weights defines the scores $T = XW$ ($I \times R$), where $R$ is much smaller than $J$. This is the dimension reduction (or data compression) part, and the idea is that $T$ represents the samples in $X$ well for the purpose at hand. Likewise, the variables are represented by the loadings $P$ ($J \times R$), which can be connected to the scores in a least squares sense, e.g., in the model $X = TP^{t} + E$. There are many ways to compute the weights, scores, and loadings depending on the specific situation; this will be explained in subsequent chapters.
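
      One concrete instance of this scheme is principal component analysis, where the weights and loadings coincide. The NumPy sketch below computes W, T, P, and E from a singular value decomposition; the choice of PCA, the column-centring step, the random data, and the function name pca_components are assumptions made for illustration, not the only way to obtain components.

```python
import numpy as np


def pca_components(X: np.ndarray, R: int):
    """Return weights W (J x R), scores T (I x R), loadings P (J x R), residuals E (I x J).

    Uses PCA via the SVD of the column-centred data (centring is an assumption;
    the text does not specify any preprocessing).
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:R].T       # first R right singular vectors, J x R
    T = Xc @ W         # scores T = XW, I x R
    P = W              # in PCA the loadings equal the weights
    E = Xc - T @ P.T   # residuals of the model X = T P^t + E
    return W, T, P, E


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))  # hypothetical data: I = 20 samples, J = 8 variables
W, T, P, E = pca_components(X, R=2)
```

      For other methods the weights and loadings generally differ, which is why the text keeps W and P as separate matrices.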

      The idea of dimension reduction by using components or latent variables is very old and has proven to be a very powerful paradigm, with many applications in the natural, life, and social sciences. When considering multiple blocks of data, each block is summarised by its components, and the relationships between the blocks are then modelled by building relationships between those components. There are many ways to build such relationships, and we will discuss them in this book.

      There are many reasons for and advantages of using dimension reduction methods:

       The number of sources of variability in data blocks is usually (much) smaller than the number of measured variables.

       Component-based methods are suitable for interpretation through the scores and loadings associated with the extracted components.

       Underlying components and latent variables are appropriate for mental abstractions and interpretation.
