Informatics and Machine Learning. Stephen Winters-Hilt
1.9.2 Nanoscope Cheminformatics – A Case Study for Device “Smartening”
The Nanoscope example can also be considered a case study for device “smartening,” whereby device state is tracked in terms of easily measured device characteristics, such as the ambient device “noise.” A familiar example of this would be the sound of your car engine. In essence, you could eventually have an artificial intelligence (AI) listening to the sound of your engine, tracking its state, and issuing warnings much like an expert mechanic familiar with that car, either without the need for sensors or as a supplement to sensors (reducing expense and providing a secondary fail-safe). Such an AI might even offer predictive fault detection.
1.10 Deep Learning using Neural Nets
ML provides a solution to the “Big Data” problem, whereby a vast amount of data is distilled down to its information essence. The ML solution sought is usually required to perform some task on the raw data, such as classification (of images) or translation of text from one language to another. In doing so, ML solutions are strongly favored when a clear elucidation of the features used in the classification is also revealed. This then allows a more standard engineering design cycle to be accessed, where the stronger features thereby identified can be given a larger role, or can guide the refinement of related strong features, to arrive at an improved classifier. This is what is accomplished with the previously mentioned SSA Protocol.
So, given the flexibility of the SSA Protocol to “latch on” to signal that has a reasonable set of features, you might ask what is left? (Note that all communication protocols, both natural (genomic) and man-made, have a “reasonable” set of features.) The answer is simply the case where the number of features is “unreasonable” (with the enumeration typically not even known). So instead of 100 features, or maybe 1000, we now have a situation with 100 000 to hundreds of millions of features (such as with sentence translation or complex image classification). Obviously Big Data is necessary to learn with such a huge number of features present, so we are truly in the realm of Big Data to even begin with such problems, but we now also have the Big Features issue (i.e. Big Data with Big Features, or BDwBF). What must occur in such problems is a means to wrangle the almost intractably large feature set of information down to a much smaller feature set, i.e. an initial layer of processing is needed just to compress the feature data. In essence, we need a form of compressive feature extraction at the outset in order not to overwhelm the acquisition process. An example from the biology of the human eye is the layer of local neural processing at the retina before the nerve impulses even travel on to the brain for further layers of neural processing.
For translation we have a BDwBF problem. The feature set is so complex that the best approach is NN Deep Learning, where we assume no knowledge of the features but rediscover/capture those features in compressed feature groups that are identified, during the NN learning process, at the first layer of the NN architecture. This begins a process of tuning over NN architectures to arrive at a compressive feature acquisition with strong classification performance (or translation accuracy, in this example). This learning approach began seeing widespread application in 2006 and is now the core method for handling the Big Feature Set (BFS) problem. The BFS problem may or may not exist at the initial acquisition (“front-end”) of your signal processing chain. NN Deep Learning to solve the BFS problem will be described in detail in Chapter 13, where examples using a Python/TensorFlow application to translation will be given. In the NN Deep Learning approach, the features are not explicitly resolvable, so improvements are initially brute force (even bigger data), since an engineering-cycle refinement would involve the enormous parallel task of explicitly resolving the feature data to know what to refine.
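As a rough illustration of this compressive front-end idea (a minimal sketch only, with assumed layer sizes and an assumed Keras-style model; it is not the Chapter 13 translation application), a first NN layer can map a very wide raw feature vector down to a much smaller learned representation before any classification layers:

import tensorflow as tf

n_raw_features = 100000  # hypothetical very wide ("Big Feature") input
n_compressed = 256       # much smaller learned feature representation
n_classes = 10           # hypothetical number of output classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_raw_features,)),
    # first layer compresses the huge raw feature vector to a small learned feature set
    tf.keras.layers.Dense(n_compressed, activation="relu"),
    # remaining layer(s) classify using only the compressed features
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()  # most of the parameters sit in the compressive first layer

Tuning over NN architectures, as described above, then amounts to varying the number and width of such layers against classification (or translation) performance.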
1.11 Mathematical Specifics and Computational Implementations
Throughout the text an effort is made to provide mathematical specifics sufficient to clearly understand the theoretical underpinnings of the methods. This provides a strong exposition of the theory, but the motivation is not to do more theory; it is to then proceed to a clearly defined computational implementation. This is where mathematical elegance meets implementation/computational practicality (and the latter wins). In this text, the focus is almost entirely on elegant methods that also have highly efficient computational implementations.
2 Probabilistic Reasoning and Bioinformatics
In this chapter, a review is given of statistics and probability concepts, with implementation of many of the concepts in Python. Python scripts are then used to do a preliminary examination of the randomness of genomic (virus) sequence data. A short review of Linux OS setup (with Python automatically installed) and Python syntax is given in Appendix A.
Numerous prior book, journal, and patent publications by the author [1–68] are drawn upon extensively throughout the text. Almost all of the journal publications are open access. These publications can typically be found online at either the author's personal website (www.meta‐logos.com) or with one of the following online publishers: www.m‐hikari.com or bmcbioinformatics.biomedcentral.com.
2.1 Python Shell Scripting
A “fair” die has equal probability of rolling a 1, 2, 3, 4, 5, or 6, i.e. a probability of 1/6 for each of the outcomes. Notice how the discrete probabilities for the different outcomes sum to 1; this is always the case for probabilities describing a complete set of outcomes.
A “loaded” die has a non-uniform distribution. For example, with probability 0.5 of rolling a “6” and the remaining probability spread uniformly over the other rolls, the loaded die_roll_probability = (1/10, 1/10, 1/10, 1/10, 1/10, 1/2).
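As a quick check (a minimal snippet for illustration, not one of the book's numbered programs), both distributions can be written as arrays and verified to sum to 1:

import numpy as np

fair_die = np.array([1.0/6]*6)                         # uniform: each outcome has probability 1/6
loaded_die = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])  # non-uniform: a "6" has probability 1/2
print(fair_die.sum(), loaded_die.sum())                # each sums to 1 (up to floating-point rounding)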
The first program to be discussed is named prog1.py and will introduce the notion of discrete probability distributions in the context of rolling the familiar six-sided die. Comments in Python are the portion of a line to the right of any “#” symbol (except for the first line of code with “#!.....”, which is explained later).
The Shannon entropy of a discrete probability distribution is a measure of its amount of randomness, with the uniform probability distribution having the greatest randomness (i.e. it is most lacking in any statistical “structure” or “information”). Shannon entropy is the sum of each outcome probability times its log probability, with an overall negative sign placed in front to arrive at a positive value: H = -Σ p(x) log p(x), summed over the outcomes x. Further details on the mathematical formalism will be given in Chapter 3, but for now we can implement this in our first Python program:
-------------------------- prog1.py -------------------------
#!/usr/bin/python

import numpy as np
import math
import re

# discrete probability distribution for the loaded die
arr = np.array([1.0/10,1.0/10,1.0/10,1.0/10,1.0/10,1.0/2])
# print(arr[0])

# Shannon entropy: negative of the sum of p*log(p) over the outcomes
shannon_entropy = 0.0
for prob in arr:
    shannon_entropy += prob*math.log(prob)
shannon_entropy = -shannon_entropy
print(shannon_entropy)
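(To try the listing as given here, save it as prog1.py and run it with python prog1.py; alternatively, make it executable with chmod +x prog1.py and run ./prog1.py, in which case the “#!/usr/bin/python” first line tells the shell which interpreter to use.)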