Text analytics can also extend what is still O(L) processing to mapping the mood or sentiment of text samples by use of word-scored sentiment tables. The generation and use of such sentiment tables is its own craft, usually proprietary, so only minimal examples are given. Thus Chapter 5 shows an elaboration of FSA-based analysis that might be done when there is no clear definition of state, such as in language. NLP processing in general encompasses a much more complete grammatical knowledge of the language, but in the end both NLP and the FSA-based "add-on" still suffer from not being able to manage word context easily (the states cannot simply be words, since the words can have different meanings according to context). The resulting inability to use HMMs was long a roadblock to a "universal translator," one that has since been overcome with Deep Learning using NNs (Chapter 13), where immense amounts of translation data, such as the massive corpus of dual-language Canadian Government proceedings, are sufficient to train an English–French translator. Most of the remaining chapters focus on situations where a clear delineation of signal state can be given, and thus benefit from the use of HMMs.
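As a minimal illustration of how such a sentiment pass stays O(L), the sketch below scores a text with one left-to-right scan against a tiny, made-up word-scored sentiment table; the table entries and scores are placeholders, since real tables are far larger and, as noted, usually proprietary.

```python
# Minimal sketch of an O(L) sentiment pass, assuming a word-scored sentiment table.
# The table below is a tiny illustrative stand-in, not a real sentiment lexicon.
SENTIMENT_TABLE = {
    "good": 1.0, "great": 2.0, "excellent": 3.0,
    "bad": -1.0, "poor": -2.0, "terrible": -3.0,
}

def sentiment_score(text):
    """One pass over the L words, with an O(1) table lookup per word: O(L) overall."""
    words = text.lower().split()
    total = sum(SENTIMENT_TABLE.get(w, 0.0) for w in words)
    return total / max(len(words), 1)  # length-normalize so texts of different size compare

print(sentiment_score("the service was good but the food was terrible"))
```

Note that such a table lookup is blind to context (negation, sarcasm, word sense), which is exactly the limitation discussed above.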
1.5 Feature Extraction and Gene Structure Identification
HMMs offer a more sophisticated signal recognition process than FSAs, but with greater computational space and time complexity [125, 126]. Like electrical engineering signal processing, HMMs usually involve preprocessing that assumes linear system properties, or assumes the observation is frequency band limited and not time limited, and thereby inherit the time-frequency uncertainty relations, Gabor limit, and Nyquist sampling relations. FSA methods can be used to recover (or extract) signal features missed by HMM or classical electrical engineering signal processing. Even if the signal sought is well understood, and a purely HMM-based approach is possible, this is often needlessly computationally intensive (slow), especially in regions where there is no signal. To address this there are numerous hybrid FSA/HMM approaches (such as BLAST [127]) that benefit from the O(L) complexity of FSA processing on a length-L signal, with more targeted processing at O(LN²) complexity with HMM processing (where there are N states in the HMM model).
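To make the division of labor concrete, the sketch below (a toy illustration, not BLAST itself) uses a cheap FSA-style motif scan over the length-L sequence to flag candidate windows, and only those windows are handed to an HMM decoder at O(nN²) cost per n-length window; the motif, window size, and hmm_decode callable are hypothetical placeholders.

```python
# Toy sketch of a hybrid FSA/HMM scan: an O(L) FSA-style pass flags candidate
# regions, and only those regions are handed to the costly HMM stage.
# The motif "TATA", the window size, and hmm_decode are illustrative placeholders.

def fsa_prefilter(seq, motif="TATA", window=50):
    """Single left-to-right pass (O(L)); returns (start, end) windows around motif hits."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append((max(0, i - window), min(len(seq), i + window)))
        i = seq.find(motif, i + 1)
    return hits

def hybrid_scan(seq, hmm_decode):
    """hmm_decode is any O(n N^2) decoder, applied only to the flagged n-length windows."""
    return [(start, end, hmm_decode(seq[start:end]))
            for start, end in fsa_prefilter(seq)]
```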
Figure 1.2 The Viterbi path. (Left) The Viterbi path is recursively defined, and thus tabulatable, with each column depending only on the prior column. (Right) A related recursive algorithm, used to perform sequence alignment extensions with gaps (the Smith–Waterman algorithm), is given by the neighbor-cell recursion shown.
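As a concrete rendering of the neighbor-cell recursion in the right panel, the sketch below fills the Smith–Waterman local-alignment scoring table; the match, mismatch, and gap scores are illustrative choices, not values taken from the text.

```python
# Sketch of the Smith-Waterman neighbor-cell recursion (Figure 1.2, right):
# each cell depends only on its upper, left, and upper-left neighbors.
# Scoring (match +2, mismatch -1, gap -2) is illustrative.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # best local-alignment score; traceback omitted in this sketch

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```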
HMMs, unlike tFSAs, have a straightforward mathematical and computational foundation at the nexus where Bayesian probability and Markov models meet dynamic programming. To properly define or choose the HMM model in a ML context, however, further generalization is usually required. This is because the "bare-bones" HMM description has critical weaknesses in most applications, which are described in Chapter 7, along with their "fixes." Fortunately, each of the standard HMM weaknesses can be addressed in computationally efficient ways. The generalized HMMs described in Chapter 7 allow for a generalized Viterbi Algorithm (see Figure 1.2) and a generalized Baum–Welch Algorithm. The generalized algorithms retain path probabilities in terms of a sequence of likelihood ratios, which satisfy Martingale statistics under appropriate circumstances [102], thereby having Martingale convergence properties (where convergence is associated with "learning" in this context). Thus, HMM learning proceeds via convergence to a limit state that provably exists, in a sense similar to that shown with the Hoeffding inequality [59], via its proven extension to Martingales [108]. The Hoeffding inequality is a key part of the VC Theorem in ML, whereby convergence of the Perceptron learning process to a solution in an infinite solution space is proven to occur in a finite number of learning steps [109]. Further details on the Fundamental Theorems [102, 103, 108, 109] are summarized in Appendix C.
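For reference, the column-by-column Viterbi recursion of Figure 1.2 (left) can be written compactly; the log-space sketch below is the bare-bones version, not the generalized Chapter 7 formulation, and the two-state model in the usage lines is a made-up illustration only.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Bare-bones log-space Viterbi: O(L N^2) table fill plus an O(L) traceback."""
    V = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for s in states:
            prev, lp = max(((p, V[t-1][p] + log_trans[p][s]) for p in states),
                           key=lambda x: x[1])
            V[t][s] = lp + log_emit[s][obs[t]]
            back[t][s] = prev
    state = max(states, key=lambda s: V[-1][s])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):  # traceback (does not alter the table)
        state = back[t][state]
        path.append(state)
    return list(reversed(path))

# Made-up two-state model (exon/intron-like labels), purely for illustration.
lp = math.log
states = ("E", "I")
log_init = {"E": lp(0.5), "I": lp(0.5)}
log_trans = {"E": {"E": lp(0.9), "I": lp(0.1)}, "I": {"E": lp(0.1), "I": lp(0.9)}}
log_emit = {"E": {b: lp(0.25) for b in "ACGT"},
            "I": {"A": lp(0.4), "C": lp(0.1), "G": lp(0.1), "T": lp(0.4)}}
print(viterbi(list("ATGCGTTTAT"), states, log_init, log_trans, log_emit))
```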
HMM tools have recently been developed with a number of computationally efficient improvements (described in detail in Chapter 7), and applications of the HMM methods to gene-finding, alt-splice gene-finding, and nanopore-detector signal analysis will be described.
HMM methods are powerful, especially with the enhancements described in Chapter 7, but this would all be for naught for real-time, O(L), processing on length-L data if the core O(LN²) algorithm (N states in the HMM) could not be distributed, onto O(N²) nodes, say, to recover an overall distributed process involving HMM feature extraction with O(L) processing (as part of our real-time signal processing pipeline). So, a way is needed to distribute the core algorithms for HMM learning: Viterbi and Baum–Welch. It turns out that distributed processing, or "chunking," is possible for the single-sweep Viterbi algorithm (ignoring the trivial traceback recovery of the optimal path, which does not alter the table). The key to having this chunking capability for the other core learning algorithm, Baum–Welch, is to have a similar single-pass table production. The standard Baum–Welch requires a forward and a backward sweep across the table during production of the result (with algorithms named accordingly for this purpose in Chapter 3). As this would disallow the chunking solution, what is needed is a single-sweep Baum–Welch algorithm, which has been discovered and is described in Chapter 3, where it is known as the Linear Memory HMM. (At the time of its discovery it was most notable to reviewers for another oddity: it requires only linear memory during computation. But memory is cheap, while being able to perform distributed processing with massive operational speed-up is much more important.) With (asynchronous) distributability, computational time is directly reduced by ~N on a cluster with N nodes (see Figure 1.2). The HMM with single-sweep Linear Memory (distributable) expectation/maximization (EM) also allows distribution (massively parallel asynchronous processing) for the generalized Viterbi and Baum–Welch algorithms on the Meta-HMM and Hidden Markov model-with-duration (HMMD) variants described in Chapter 3, with distribution shown in Figure 1.3.
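A minimal sketch of the chunking idea is given below, with a local process pool standing in for cluster nodes; the chunk count and overlap are arbitrary illustrative choices, and the single-sweep/linear-memory bookkeeping of Chapter 3 is not reproduced here.

```python
# Sketch of chunked (distributable) decoding: split the length-L observation
# sequence into overlapping pieces and decode each piece independently.
# A process pool stands in for cluster nodes; the overlap is a crude hedge
# against chunk-boundary effects.
from concurrent.futures import ProcessPoolExecutor

def chunk(obs, n_chunks, overlap=100):
    size = max(1, len(obs) // n_chunks)
    pieces = []
    for i in range(n_chunks):
        start = max(0, i * size - overlap)
        end = len(obs) if i == n_chunks - 1 else (i + 1) * size + overlap
        pieces.append(obs[start:min(end, len(obs))])
    return pieces

def decode_chunks(obs, decode_fn, n_chunks=8):
    """decode_fn is any per-chunk decoder (e.g. the Viterbi sketch above);
    it must be a module-level function so the worker processes can import it."""
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(decode_fn, chunk(obs, n_chunks)))
```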