Informatics and Machine Learning. Stephen Winters-Hilt

Чтение книги онлайн.

Читать онлайн книгу Informatics and Machine Learning - Stephen Winters-Hilt страница 13

Informatics and Machine Learning - Stephen Winters-Hilt

Скачать книгу

signals that can be data mined for their attributes. With a filter stage thereby trained, later scan passes can pass suspected signals with very weak specificity (very high sensitivity now) with high specificity then recovered by use of the filter. This then allows a bootstrap process to a very high specificity (SP) and sensitivity (SN) at the tFSA acquisition stage on the signals of interest.

      Ad hoc signal acquisition refers to finding the solution for “this” situation (whatever “this” is) without consideration of wider application. The solution is strongly data dependent in other words. Data dependent methodologies are, by definition, not defined at the outset, but must be invented as the data begins to be understood. As with data dependency in non‐evolutionary search metaheuristics, where there is no optimal search method that is guaranteed to always work well, here there is no optimal signal acquisition method known in advance. This is simply restating a fundamental limit from non‐evolutionary search metaheuristics in another form [1, 3]. What can be done, however, is assemble the core tools and techniques from which a solution can be constructed and to perform a bootstrap algorithmic learning process with those tools (examples in what follows) to arrive at a functional signal acquisition on the data being analyzed. A universal, automated, bootstrap learning process may eventually be possible using evolutionary learning algorithms. This is related to the co‐evolutionary Free Lunch Theorem [1, 3], and this is discussed in Chapter 12.

      “Bootstrap” refers to a method of problem solving when the problem is solved by seemingly paradoxical measures (the name references Baron von Munchausen who freed the horse he was riding from a bog by pulling himself, and the horse with him, up by his bootstraps). Such algorithmic methods often involve repeated passes over the data sequence, with improved priors, or a trained filter, among other things, to have improved performance. The bootstrap amplifier from electrical engineering is an amplifier circuit where part of the output is used as input, particularly at start‐up (known as bootstrapping), allowing proper self‐initialization to a functional state (by amplifying ambient circuit noise in some cases). The bootstrap FSA proposed here is a meta‐algorithmic method in that performance “feedback” with learning is used in algorithmic refinements with iterated meta‐algorithmic learning to arrive at a functional signal acquisition status.

      Thus, FSA processes allow signal regions to be identified, or “acquired,” in O(L) time. Furthermore, in that same order of time complexity, an entire panoply of statistical moments can also be computed on the signals (and used in a bootstrap learning process). The O(L) feature extraction of statistical moments on the signal region acquired may suffice for localized events and structures. For sequential information or events, however, there is often a non‐local, or extended structural, aspect to the signal sought. In these situations we need a general, powerful, way to analyze sequential signal data that is stochastic (random, but with statistics, such as average, that may be unchanging over time if “stationary,” for example). The general method for performing stochastic sequential analysis (SSA) is via HMMs, as will be extensively described in Chapters 6 and 7, and briefly summarized in Section 1.5 that follows. HMM approaches require an identification of “states” in the signal analysis. If an identification of states is difficult, such as in situations where there can be changes in meaning according to context, e.g. language, then HMMs may not be useful. Text and language analytics are described in Chapters 5 and 13, and briefly outlined in the next section.

      The FSA sequential‐data signal processing, and extraction of statistical moments on windowed data, will be shown in Chapter 2 to be O(L) with L the size of the data (double the data and you double the processing time). If HMMs can be used, with their introduction of states (the sequential data is described as a sequentence of “hidden” states), then the computational cost goes as O(LN2). If N = 10, then this could be 100 times more computational time to process than that of a FSA‐based O(L) computation, so the HMMs can generally be a lot more expensive in terms of computational time. Even so, if you can benefit from a HMM it is generally possible to do so, even if hardware specialization (CPU farm utilization, etc.) is required. The problem is if you do not have a strong basis for a HMM application, e.g. when there is no strong basis for delineating the states of the system of communication under study. This is the problem encounterd in the study of natural languages (where there is significant context dependency). In Chapter 5 we look into FSA analysis for

Скачать книгу