      1.5.1 HMMs for Analysis of Information Encoding Molecules

      The main application areas for HMMs covered in this book are power signal analysis generally, and bioinformatics and cheminformatics specifically (the main reviews and applications discussed are from [128–134]). For bioinformatics, the information encoding molecules are polymers, giving rise to a sequential data format, so HMMs are well suited to their analysis. To begin to understand bioinformatics, however, we need to know not only the biological encoding rules, largely rediscovered on the basis of their statistical anomalies in Chapters 1–4, but also the idiosyncratic structures seen (genomes and transcriptomes), which are full of evolutionary artifacts and similarities to evolutionary cousins. Knowing the nature of the statistical imprinting on the polymeric encodings also requires an understanding of the biochemical constraints that give rise to the statistical biases seen. Taken altogether, bioinformatics offers a lot of clarity on why Nature has settled on the particular genomic “mess,” albeit with optimizations, that it has selectively arrived at. See [1, 3] for further discussion of bioinformatics.

      1.5.2 HMMs for Cheminformatics and Generic Signal Analysis

      The HMM is a common intrinsic statistical sequence modeling method (implementations and applications in what follows are mainly drawn from [135–158]), so the question naturally arises: how does one optimally incorporate extrinsic “side‐information” into a HMM? This can be done by treating duration distribution information itself as side‐information, and a process for incorporating side‐information into a HMM is shown. It is thereby demonstrated how to bootstrap from a HMM to a HMM with duration (HMMD; more generally, a hidden semi‐Markov model, or HSMM, as it will be described in Chapter 7). A minimal sketch of the duration distinction is given below.
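      As a minimal sketch of why the duration generalization matters (the function names below are illustrative assumptions, not the book's implementation): a standard HMM state with self‐transition probability p implicitly dwells for a geometrically distributed number of samples, whereas a HMMD/HSMM can be given an arbitrary, e.g. empirically estimated, dwell‐time distribution as side‐information.

import numpy as np

# A standard HMM state with self-transition probability p has a
# geometrically distributed dwell time:
#   P(d) = (1 - p) * p**(d - 1),  d = 1, 2, ...
def geometric_duration(p_self, max_d):
    d = np.arange(1, max_d + 1)
    return (1.0 - p_self) * p_self ** (d - 1)

# A HMMD/HSMM instead accepts any normalized dwell-time distribution,
# e.g. an empirical histogram supplied as side-information.
def normalize_duration(hist):
    hist = np.asarray(hist, dtype=float)
    return hist / hist.sum()

p_geom = geometric_duration(0.9, max_d=50)
p_flat = normalize_duration([0] * 9 + [1] * 20 + [0] * 21)  # flat dwell over 10-29 samples
# The monotone geometric form cannot match the flat empirical dwell time;
# removing that mismatch is what the HMMD generalization buys.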

      [Figure: Schematic illustration of edge feature enhancement via HMM/EM EVA filter. Source: Based on Winters‐Hilt [1–3].]

      In adopting any model with “more parameters,” such as a HMMBD (HMM‐with‐binned‐duration) over a HMM, there is potentially a problem with having sufficient data to support the additional modeling. This is generally not a problem for any HMM application that already requires thousands of samples of non‐self transitions for sensor modeling, such as the gene finding described in what follows: knowing the boundary positions allows the regions of self‐transitions (the durations) to be extracted in similar sample numbers, which is typically sufficient for effective modeling of the duration distributions in a HMMD.
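      As an illustrative sketch of that extraction step (the helper below is hypothetical, assuming a per‐sample state labeling with known boundaries): each maximal run of self‐transitions contributes one duration sample, so labeled boundaries directly yield the histograms needed for the HMMD duration distributions.

from collections import Counter
from itertools import groupby

# Hypothetical helper (not from the text): given a per-sample state
# labeling, collect the run lengths (durations) of each state's
# self-transition regions.
def duration_histograms(labels):
    hists = {}
    for state, run in groupby(labels):
        hists.setdefault(state, Counter())[sum(1 for _ in run)] += 1
    return hists

# Example: an exon/intron-style labeling with known boundaries.
labels = list("eeeeeiiiiiiiiiieeee")
print(duration_histograms(labels))
# {'e': Counter({5: 1, 4: 1}), 'i': Counter({10: 1})}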

      Prior HMM‐based systems for SSA had undesirable limitations and disadvantages. For example, their speed of operation made such systems difficult, if not impossible, to use for real‐time analysis of information. In the SSA Protocol described here, distributed generalized HMM processing, together with the use of the SVM‐based classification and clustering methods (described next), permits general use of the SSA Protocol free of the usual limitations. After the HMM and SSA methods are described, their synergistic union is used to convey a new approach to signal analysis with HMM methods, including a new form of stochastic‐carrier wave (SCW) communication.

      Before moving on to classification and clustering (Chapter 10), a brief description is given of some of the theoretical foundations for learning, starting with the foundation for the choice of information measures used in Chapters 2–4, which is shown in Chapter 8. In Chapter 9 we then describe the theory of NNs. The Chapter 9 background is not meant to be a complete exposition on NN learning (quite the opposite); it merely goes through a few specific analyses in the area of Loss Bounds analysis to give a sense of what makes a good classification method.
