Informatics and Machine Learning. Stephen Winters-Hilt

and Chapter 11). The HMM‐based feature extraction provides a well‐focused set of “eyes” on the data, whatever its nature, according to the underpinnings of its Bayesian statistical representation. The key is that the HMM's state definition not be too limiting, while the choice of the number of states, N, involves the typical engineering trade‐off: N enters quadratically into the computational cost of the various dynamic programming calculations (comprising the Viterbi and Baum–Welch algorithms, among others).
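To make the quadratic dependence on N concrete, here is a minimal Viterbi decoder sketch (an illustrative Python implementation, not the book's own code): the inner update sweeps all N × N transitions at each time step, giving O(T·N²) total cost.

```python
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """Most-likely state path of an HMM via dynamic programming.

    Each time step sweeps all N x N state transitions, so the total
    cost is O(T * N^2) -- the quadratic factor in the number of
    states N referred to in the text.
    """
    T, N = len(obs), log_init.shape[0]
    delta = log_init + log_emit[:, obs[0]]   # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)       # backpointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # N x N candidate extensions
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a two‐state toy model whose emissions favor “matching” symbols, the decoded path follows the observation blocks; the same recursion underlies the generative projections discussed below.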

      The HMM “sensor” capabilities can be significantly improved by switching from profile‐Markov Model (pMM) sensors to pMM/SVM‐based sensors, as indicated in [1, 3] and Chapter 7, where the improved performance and generalization capability of this approach are demonstrated.

      In standard band‐limited (and not time‐limited) signal analysis with periodic waveforms, sampling is done at the Nyquist rate to obtain a fully reproducible signal. If the sampled information is needed elsewhere, it is then compressed (possibly lossily) and transmitted (a “smart encoder”). The received data are then decompressed and reconstructed (by simply summing wave components, i.e. a “simple” decoder). If the signal is sparse or compressible, then compressive sensing [190] can be used, where sampling and compression are combined into one efficient step to obtain compressive measurements (the simple encoding in [190], since a set of random projections is employed), which are then transmitted (general details on noise in this context are described in [191, 192]). On the receiving end, the decompression and reconstruction steps are likewise combined, using an asymmetric “smart” decoding step. This progression toward asymmetric compressive signal processing can be taken a step further if we consider signal sequences to be equivalent when they have the same stationary statistics. What is obtained is a method similar to compressive sensing, but involving stationary‐statistics generative‐projection sensing, where the signal processing is non‐lossy at the level of stationary‐statistics equivalence.

      In SCW signal analysis the signal source is generative in that it is describable via an HMM, and the HMM's Viterbi‐derived generative projections are used to describe the sparse components contributing to the signal source. In SCW encoding the modulation of stationary statistics can be man‐made or natural, the latter arising in many experimental situations involving a flow phenomenology with stationary statistics. Even if the signal is man‐made, the underlying stochastic process is usually still a natural source, where it is the changes in the stationary statistics that are under the control of the man‐made encoding scheme. Transmission and reception are then followed by generative projection via Viterbi‐HMM template matching, or via Viterbi‐HMM feature extraction followed by separate classification (using SVM). So in the SCW approach the encoding is even simpler (possibly nonexistent, other than directly passing the quantized signal) and is applicable to any noise source with stationary statistics (e.g. a stationary signal with reproducible statistics, the case for many experimental observations). The decoding, on the other hand, must be even “smarter,” in that generalized Viterbi algorithms are used, and possibly other ML methods as well, SVMs in particular. An example of stationary‐statistics sensing with an ML‐based decoder is described in application to CCC studies in Chapter 14.
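The “simple encoder / smart decoder” asymmetry of compressive sensing can be caricatured in a few lines: encoding is one random projection (a matrix multiply), while decoding requires a recovery algorithm. The sketch below uses orthogonal matching pursuit, a standard greedy recovery method; all dimensions and values are illustrative assumptions, not taken from the text or from [190].

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedy 'smart decoder' that
    recovers a k-sparse x from compressive measurements y = Phi @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # most-correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef         # project out chosen columns
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

# "Simple encoder": a single random projection, no per-signal tuning.
rng = np.random.default_rng(0)
m, n, k = 30, 50, 2                     # illustrative dimensions
Phi = rng.normal(size=(m, n))
Phi /= np.linalg.norm(Phi, axis=0)      # unit-norm measurement columns
x_true = np.zeros(n)
x_true[3], x_true[17] = 2.0, -2.0       # sparse source signal
y = Phi @ x_true                        # compressive measurement (encoding)
x_hat = omp(Phi, y, k)                  # asymmetric "smart" decoding
```

Note the asymmetry: the encoder is a single matrix multiply, while the decoder runs an iterative optimization, mirroring the SCW situation where the heavy lifting (generalized Viterbi, SVM classification) sits entirely on the receiving side.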

      1.9.1 Stochastic Carrier Wave (SCW) Analysis – Nanoscope Signal Analysis

      The Nanoscope described in Chapter 14 builds on nanopore detection with the introduction of reporter molecules, arriving at a nanopore transduction detection paradigm. By engineering reporter molecules that produce stationary statistics (an SCW) and combining them with ML signal analysis methods designed for rapid analysis of such signals, we arrive at a functioning “nanoscope.”

      Nanopore detection is made possible by the following well‐established capabilities: (i) classic electrochemistry; (ii) a pore‐forming protein toxin in a bilayer; and (iii) a patch clamp amplifier. Nanopore transduction detection leverages this detection platform with (iv) an event‐transducer pore‐blockader that has stationary statistics and (v) ML tools for real‐time SCW signal analysis. The meaning of “real‐time” depends on the application. In the Nanoscope implementation discussed in Chapter 14, each signal is usually identified in less than 100 ms, with a calling accuracy of 99.9% when rejection is employed, improved even further when forced calls are made on signal samples of duration greater than 100 ms.

      Nanopore transduction detection (NTD) offers prospects for highly sensitive and discriminative biosensing. The NTD “Nanoscope” functionalizes a single nanopore with a channel current modulator designed to transduce events, such as binding to a specific target. Nanopore event transduction involves single‐molecule biophysics, engineered information flows, and nanopore cheminformatics. In the NTD functionalization the transducer molecule is drawn into the channel by an applied potential but is too big to translocate; instead it becomes stuck in a bistable capture such that it modulates the channel's ion flow with stationary statistics in a distinctive way. If the channel modulator is bifunctional, in that one end is meant to be captured and modulated while the other end is linked to an aptamer or antibody for specific binding, then we have the basis for a remarkably sensitive and specific biosensing capability.

      In the NTD Nanoscope experiments [2], the molecular dynamics of a (single) captured non‐translocating transducer molecule provide a unique stochastic reference signal with stable statistics on the observed, single‐molecule blockaded channel current, somewhat analogous to a carrier signal in standard electrical engineering signal analysis. Discernible changes in blockade statistics, coupled with SSA signal processing protocols, provide the means for a highly detailed characterization of the interactions of the transducer molecule with binding targets (cognates) in the surrounding (extra‐channel) environment.

      Thus, in Nanoscope applications of the SSA Protocol, due to the molecular dynamics of the captured transducer molecule, a unique reference signal with strongly stationary (or weakly, or approximately stationary) signal statistics is engineered to be generated during transducer blockade, analogous to a carrier signal in standard electrical engineering signal analysis. In these applications a signal is deemed “strongly” stationary if the EM/EVA projection (HMM method from Chapter 6) on the entire dataset of interest produces a discrete set of separable (non‐fuzzy domain) states. A signal is deemed “weakly” stationary if the EM/EVA projection can only produce a discrete set of states on subsegments (windowed sections) of the data sequence, but where state‐tracking is possible across windows (i.e. the non‐stationarity is sufficiently slow to track states – similar to the adiabatic criterion in statistical mechanics). A signal is approximately stationary, in a general sense, if it is sufficiently stationary to still benefit, to some extent, from the HMM‐based signal processing tools (that assume stationarity).
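The window‐to‐window state‐tracking idea behind the “weakly stationary” criterion can be caricatured with a simple check; the sketch below is an illustrative stand‐in, not the book's EM/EVA projection. It compares state‐occupancy histograms of a decoded state sequence across consecutive windows: slowly drifting histograms suggest states can be tracked (the adiabatic‐like regime), while abrupt jumps indicate non‐stationarity too fast to track.

```python
import numpy as np

def occupancy_drift(state_seq, window):
    """Total-variation distance between state-occupancy histograms
    of consecutive windows of a decoded HMM state sequence.

    Small, slowly varying values suggest the 'weakly stationary'
    regime (states trackable window to window); large jumps indicate
    non-stationarity too fast to track.
    """
    n_states = int(max(state_seq)) + 1
    hists = []
    for start in range(0, len(state_seq) - window + 1, window):
        seg = state_seq[start:start + window]
        hists.append(np.bincount(seg, minlength=n_states) / window)
    # TV distance between each pair of adjacent window histograms
    return [0.5 * np.abs(a - b).sum() for a, b in zip(hists, hists[1:])]
```

For example, a sequence that alternates states at a fixed rate yields zero drift between windows, whereas a sequence that switches occupancy wholesale between windows yields the maximal drift of 1.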

      The adaptive SSA ML algorithms for real‐time analysis of the stochastic signal generated by the transducer molecule can readily offer a “lock and key” level of signal discrimination. The heart of the signal processing algorithm is a generalized Hidden Markov Model (gHMM)‐based feature extraction method,
