Informatics and Machine Learning. Stephen Winters-Hilt

…Chapter 10), as well as aiding with signal analysis and pattern recognition on stochastic sequential data. The signal processing material described next, and in detail later, draws mainly on prior journal publications [159–189]. Analysis tools for stochastic sequential data have broad‐ranging application: by efficiently learning the characteristics of the measured signals and patterns, they can make any device that produces a sequence of measurements more sensitive, or “smarter.” The SVM and HMM/SVM application areas described in this book include cheminformatics, biophysics, and bioinformatics. The cheminformatics application examples pertain to channel current analysis on the alpha‐hemolysin nanopore detector (Chapter 14).

      The biophysics and “information flows” associated with the nanopore transduction detector (NTD) in Chapter 14 are analyzed using a generalized set of HMM‐ and SVM‐based tools, as well as ad hoc FSA‐based methods and a collection of distributed genetic algorithm methods for tuning and selection. Used with a nanopore detector, channel current cheminformatics (CCC) for stationary signal channel blockades (blockades with “stationary statistics”) provides the basis for a highly sensitive detector for single‐molecule biophysical analysis.
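      To make the idea of an ad hoc FSA front end concrete, here is a minimal sketch of a two‐state automaton that locates blockade events in a channel‐current trace. It assumes a single fixed threshold separating open‐channel from blocked current levels and a minimum dwell of `min_len` samples. The function name, threshold, and toy trace (in arbitrary current units) are hypothetical illustrations, not the book's tFSA.

```python
from typing import List, Tuple


def detect_blockades(samples: List[float], threshold: float,
                     min_len: int = 3) -> List[Tuple[int, int]]:
    """Two-state FSA (OPEN/BLOCKED) over a current trace.

    Returns (start, end) index pairs, end exclusive, for every run of samples
    below `threshold` lasting at least `min_len` samples (hypothetical rule).
    """
    events: List[Tuple[int, int]] = []
    state, start = "OPEN", 0
    for i, x in enumerate(samples):
        if state == "OPEN" and x < threshold:
            state, start = "BLOCKED", i          # OPEN -> BLOCKED
        elif state == "BLOCKED" and x >= threshold:
            if i - start >= min_len:             # keep only long-enough dwells
                events.append((start, i))
            state = "OPEN"                       # BLOCKED -> OPEN
    if state == "BLOCKED" and len(samples) - start >= min_len:
        events.append((start, len(samples)))     # trace ended mid-blockade
    return events


# Toy trace in arbitrary current units: one real blockade, one 1-sample glitch.
trace = [120, 119, 121, 60, 58, 61, 59, 120, 118, 50, 121, 119]
print(detect_blockades(trace, threshold=90.0))  # [(3, 7)]
```

      Each decision here depends only on the current sample and state, which is what makes such automata fast. For the same reason, they cannot judge whether a brief excursion belongs to a longer event, so that kind of nonlocal structure is left to the HMM stage.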

      The SVM implementations described involve algorithmic variants, kernel variants, and chunking variants, as well as SVM classification tuning metaheuristics and SVM clustering metaheuristics. The SVM tuning metaheuristics typically use the SVM’s confidence parameter to bootstrap from a strong classification engine to a strong clustering engine: labels are changed according to the SVM’s output, and the SVM is retrained repeatedly with the new label information.
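      The label‐change bootstrap can be sketched roughly as follows. This is a simplified stand‐in, not the book's algorithm. It trains a linear soft‐margin SVM by plain full‐batch subgradient descent (returning the averaged iterate), flips every point whose decision value disagrees in sign with its current label, and retrains until no label changes. The function names, the regularization `lam`, the step size `eta`, and the flip‐all‐disagreements rule are all assumptions. No kernel, chunking, or confidence threshold is used.

```python
from typing import List, Sequence, Tuple


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision value f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     eta: float = 0.0025, iters: int = 20000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the soft-margin primal
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * f(x_i)), with an unregularized bias.
    Returns the average of the iterates (the usual convergence guarantee applies to it)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    w_sum, b_sum = [0.0] * d, 0.0
    for _ in range(iters):
        w_sum = [s + wj for s, wj in zip(w_sum, w)]   # accumulate for averaging
        b_sum += b
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:         # hinge term is active
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return [s / iters for s in w_sum], b_sum / iters


def svm_label_flip_clustering(X: List[List[float]], labels: List[int],
                              max_rounds: int = 10) -> List[int]:
    """Retrain, flip every label the SVM disagrees with, and repeat until stable."""
    labels = list(labels)
    for _ in range(max_rounds):
        w, b = train_linear_svm(X, labels)
        flips = [i for i, x in enumerate(X) if labels[i] * decision(w, b, x) < 0]
        if not flips:
            break
        for i in flips:
            labels[i] = -labels[i]
    return labels


# Two 1-D clusters; the point at 0.5 starts with the "wrong" label.
X = [[0.0], [0.25], [0.5], [0.75], [1.0], [4.0], [4.25], [4.5], [4.75], [5.0]]
initial = [-1, -1, 1, -1, -1, 1, 1, 1, 1, 1]
print(svm_label_flip_clustering(X, initial))  # [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
```

      In this toy case, one round of retraining relabels the stray point and the second round confirms that the labels are stable. Real data would call for a nonlinear kernel, a guard against collapsing all points into one class, and a more selective flip rule, for example flipping only the points with the strongest disagreement.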

      SVM methods and systems for classification, clustering, and stochastic sequential analysis (SSA) in general are given in Chapter 10, with a broad range of applications:

       sequential‐structure identification

       pattern recognition

       knowledge discovery

       bioinformatics

       nanopore detector cheminformatics

       computational engineering with information flows

       “SSA” Architectures favoring Deep Learning (see next section)

      SVM binary discrimination outperforms other classification methods whether or not weak data are dropped (and many other methods cannot identify weak data at all).
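      One common way to make “dropping weak data” concrete treats weak data as samples whose SVM decision value f(x) lies close to the separating hyperplane. Under that assumption, you can reject every sample with |f(x)| below a threshold, then report accuracy on the remaining samples together with the rejection rate. The function name, the threshold, and the toy scores below are hypothetical and are not results from the book.

```python
import math
from typing import List, Tuple


def accuracy_with_rejection(scores: List[float], labels: List[int],
                            threshold: float) -> Tuple[float, float]:
    """Drop samples with |f(x)| < threshold as "weak"; classify the rest by sign(f(x)).

    Returns (accuracy on the retained samples, fraction of samples dropped).
    """
    if not scores:
        return math.nan, 0.0
    kept = [(s, y) for s, y in zip(scores, labels) if abs(s) >= threshold]
    dropped = 1.0 - len(kept) / len(scores)
    if not kept:
        return math.nan, dropped
    # A score of exactly 0 is assigned to the -1 class.
    correct = sum(1 for s, y in kept if (1 if s > 0 else -1) == y)
    return correct / len(kept), dropped


# Hypothetical SVM decision values and true labels (+1 / -1).
scores = [2.1, 1.3, 0.2, -0.1, -1.7, -0.9, 0.4, -2.5]
labels = [1, 1, -1, 1, -1, -1, 1, -1]
print(accuracy_with_rejection(scores, labels, threshold=0.0))  # (0.75, 0.0)
print(accuracy_with_rejection(scores, labels, threshold=0.5))  # (1.0, 0.375)
```

      Raising the threshold trades coverage for accuracy. Reporting both numbers is what makes the comparisons “with or without dropping weak data” meaningful.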

Figure 1.5 Schematic illustration of the general stochastic sequential analysis flow topology.

      Source: Based on Winters‐Hilt [1, 3].

      The sequence of algorithmic methods used in the SSA Protocol, for the information‐processing flow topology shown in Figure 1.5, constitutes a weak‐signal handling protocol. (i) The weakness of the (fast) Finite State Automaton (FSA) methods will be shown to be their difficulty with nonlocal structure identification, which HMM methods (and tuning metaheuristics) resolve. (ii) The main weakness of the HMM, in turn, is in local sensing (“classification”), because of its conditional independence assumptions. Once the problem is cast as a classification problem, however, it can be solved by incorporating generalized SVM methods [1, 3]. If the task is classification only (data already preprocessed), the SVM is also the method of choice in what follows. (iii) The weakness of the SVM, whether used for classification or clustering (but especially for the latter), is the need to optimize over algorithmic, model (kernel), chunking, and other process parameters during learning. (iv) This is addressed with optimization metaheuristics such as simulated annealing and genetic algorithms. The main weakness of the metaheuristic tuning effort is resolved partly by “front‐end” methods such as the FSA, and partly by a knowledge discovery process using SVM clustering methods. The SSA Protocol for weak signal acquisition and analysis thereby establishes a robust signal processing platform.
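      The metaheuristic tuning step in (iii) and (iv) can be illustrated with a generic simulated‐annealing loop over a single integer‐valued setting, for instance an index into a grid of candidate kernel parameters. This is a sketch under stated assumptions. The cost function is a hypothetical stand‐in for a validation error; in practice, each evaluation would retrain and score an SVM. The ±1 neighbour moves, the geometric cooling schedule, the seed, and all constants are also assumptions.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(cost: Callable[[int], float], start: int, lo: int, hi: int,
                        t0: float = 10.0, cooling: float = 0.99, iters: int = 500,
                        seed: int = 0) -> Tuple[int, float]:
    """Minimize `cost` over integers in [lo, hi]; return the best setting seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(iters):
        # Propose a neighbouring setting (one step left or right), clamped to the range.
        candidate = min(hi, max(lo, current + rng.choice((-1, 1))))
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements; accept worse moves with prob exp(-delta/temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)  # geometric cooling, floored to avoid division by zero


    return best, best_cost


# Hypothetical stand-in for "validation error as a function of a kernel-parameter index".
def toy_validation_error(k: int) -> float:
    return (k - 7) ** 2


print(simulated_annealing(toy_validation_error, start=18, lo=0, hi=20))
```

      Early on, the high temperature lets the search accept uphill moves and escape poor regions. As the temperature falls, the search becomes greedy. A genetic algorithm would replace the single walker with a population, combined through selection, crossover, and mutation.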

      HMM methods are the central stage of the SSA Protocol, particularly in the gene finders and sometimes in CCC implementations, in the sense that in many incarnations the other stages can be dropped or merged into the HMM stage. For example, in some CCC analysis situations, the tFSA methods could be eliminated entirely in favor of the more accurate (but more time‐consuming) HMM‐based approaches. Signal states would then be defined or explored in much the same setting, but with the optimized Viterbi path solution taken as the basis for signal acquisition.
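      The Viterbi path is the most probable hidden‐state sequence under the HMM. Below is a minimal log‐space Viterbi decoder applied to a hypothetical two‐state HMM (open channel = 0, blockade = 1) whose current readings are quantized to high (0) and low (1). All probabilities and the observation sequence are invented for illustration. Because the decoder scores whole state paths rather than individual samples, it assigns the isolated high reading at position 4 to the blockade state, the kind of nonlocal judgement a per‐sample threshold rule cannot make.

```python
import math
from typing import List, Sequence


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]], emit: Sequence[Sequence[float]]) -> List[int]:
    """Return the most probable state path for a discrete-emission HMM (log space)."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start)
    # delta[s]: best log-probability of any state path ending in state s at the current time.
    delta = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back: List[List[int]] = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        ptr, new_delta = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: delta[r] + log(trans[r][s]))
            ptr.append(prev)
            new_delta.append(delta[prev] + log(trans[prev][s]) + log(emit[s][o]))
        back.append(ptr)
        delta = new_delta
    # Trace back from the most probable final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


start = [0.6, 0.4]                   # P(open), P(blocked) at t = 0
trans = [[0.8, 0.2], [0.15, 0.85]]   # rows: from open, from blocked
emit = [[0.9, 0.1], [0.2, 0.8]]      # rows: P(high), P(low) per state
obs = [0, 0, 1, 1, 0, 1, 1, 0, 0]    # high, high, low, low, HIGH spike, low, low, high, high
print(viterbi(obs, start, trans, emit))  # [0, 0, 1, 1, 1, 1, 1, 0, 0]
```

      Each step costs O(N²) for N states, versus O(1) per sample for a threshold automaton. This is the time cost the text weighs against the HMM's better handling of nonlocal structure.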
