Machine Learning For Dummies. John Paul Mueller
Symbolists: The origin of this tribe is in logic and philosophy. This group relies on inverse deduction to solve problems.
Connectionists: The origin of this tribe is in neuroscience. This group relies on backpropagation to solve problems.
Evolutionaries: The origin of this tribe is in evolutionary biology. This group relies on genetic programming to solve problems.
Bayesians: The origin of this tribe is in statistics. This group relies on probabilistic inference to solve problems.
Analogizers: The origin of this tribe is in psychology. This group relies on kernel machines to solve problems.
The ultimate goal of machine learning is to combine the technologies and strategies embraced by the five tribes to create a single algorithm (the master algorithm) that can learn anything (see Figure 2-1). Of course, achieving that goal is a long way off. Even so, scientists such as Pedro Domingos (http://homes.cs.washington.edu/~pedrod/) are currently working toward that goal.
FIGURE 2-1: The five tribes will combine their efforts toward the master algorithm.
This book follows the Bayesian tribe strategy, for the most part, in that you solve most problems using some form of statistical analysis. You do see strategies embraced by other tribes described, but the main reason you begin with statistics is that the technology is already well established and understood. In fact, many elements of statistics qualify more as engineering (in which theories are implemented) than science (in which theories are created). The next section of the chapter delves deeper into the five tribes by viewing the kinds of algorithms each tribe uses. Understanding the role of algorithms in machine learning is essential to defining how machine learning works.
Understanding the Role of Algorithms
Everything in machine learning revolves around algorithms. An algorithm is a procedure or formula used to solve a problem. The problem domain affects the kind of algorithm needed, but the basic premise is always the same — to solve some sort of problem, such as driving a car or playing dominoes. In the first case, the problems are complex and many, but the ultimate problem is one of getting a passenger from one place to another without crashing the car. Likewise, the goal of playing dominoes is to win. The following sections discuss algorithms in more detail.
Defining what algorithms do
An algorithm is a kind of container. It provides a box for storing a method to solve a particular kind of problem. Algorithms process data through a series of well-defined states. The states need not be deterministic, but the states are defined nonetheless. The goal is to create an output that solves a problem. In some cases, the algorithm receives inputs that help define the output, but the focus is always on the output.
Algorithms must express the transitions between states using a well-defined and formal language that the computer can understand. In processing the data and solving the problem, the algorithm defines, refines, and executes a function. The function is always specific to the kind of problem being addressed by the algorithm.
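To make the idea concrete, here is a minimal Python sketch (an illustration, not an example from the book) of an algorithm as a container: it receives an input, moves through well-defined states, and produces an output that solves a small, specific problem, in this case finding the largest value in a list.

    def find_largest(values):
        """A tiny algorithm: move through well-defined states to produce an output."""
        if not values:                      # state 1: validate the input
            raise ValueError("need at least one value")
        largest = values[0]                 # state 2: assume the first value wins
        for value in values[1:]:            # state 3: compare against every remaining value
            if value > largest:
                largest = value
        return largest                      # output: the solution to the problem

    print(find_largest([3, 41, 12, 9, 74, 2]))   # prints 74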
Considering the five main techniques
As described in the previous section, each of the five tribes has a different technique and strategy for solving problems, and each results in its own unique algorithms. Combining these algorithms should eventually lead to the master algorithm that will be able to solve any given problem. The following sections provide an overview of the five main algorithmic techniques.
Symbolic reasoning
The term inverse deduction commonly appears as induction. In symbolic reasoning, deduction expands the realm of human knowledge, while induction raises the level of human knowledge. Induction commonly opens new fields of exploration, while deduction explores those fields. However, the most important consideration is that induction is the science portion of this type of reasoning, while deduction is the engineering. The two strategies work hand in hand to solve problems by first opening a field of potential exploration to solve the problem and then exploring that field to determine whether it does, in fact, solve it.
As an example of this strategy, deduction would say that if a tree is green and green trees are alive, the tree must be alive. When thinking about induction, you would say that the tree is green and the tree is also alive; therefore, green trees are alive. Induction provides the answer to what knowledge is missing given a known input and output.
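To see the difference in code, consider the following minimal sketch (an illustration only, not the symbolists' actual algorithm) that treats deduction as applying a known rule to a known fact, and induction as proposing the missing rule from observed examples.

    # Deduction: apply a known rule (green trees are alive) to a known fact (this tree is green).
    def deduce(tree_is_green, rule_green_trees_are_alive=True):
        return tree_is_green and rule_green_trees_are_alive   # conclusion: the tree is alive

    # Induction: observe (green, alive) pairs and propose the rule that explains them.
    def induce(observations):
        # If every green tree observed so far was alive, propose "green trees are alive."
        return all(alive for green, alive in observations if green)

    print(deduce(tree_is_green=True))                 # True: this green tree must be alive
    print(induce([(True, True), (True, True)]))       # True: propose that green trees are alive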
Connections modeled on the brain’s neurons
The connectionists are perhaps the most famous of the five tribes. This tribe strives to reproduce the brain’s functions using silicon instead of neurons. Essentially, each of the neurons (created as an algorithm that models the real-world counterpart) solves a small piece of the problem, and the use of many neurons in parallel solves the problem as a whole.
The use of backpropagation, or backward propagation of errors, seeks to determine the conditions under which errors are removed from networks built to resemble human neurons by changing the weights (how much a particular input figures into the result) and biases (offsets added to the weighted inputs before the neuron decides whether to fire) of the network. The goal is to continue changing the weights and biases until the actual output matches the target output. At that point, the artificial neuron fires and passes its solution along to the next neuron in line. The solution created by just one neuron is only part of the whole solution. Each neuron passes information to the next neuron in line until the group of neurons creates a final output.
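The following minimal sketch (an illustration with made-up numbers, not the book's code) shows the core loop for a single artificial neuron: compute an output, measure the error against the target, and nudge the weight and bias in the direction that shrinks that error.

    # One artificial neuron: output = weight * input + bias (no activation, to keep the math visible).
    input_value, target = 0.5, 0.8            # made-up training example
    weight, bias = 0.1, 0.0                   # starting guesses
    learning_rate = 0.1

    for step in range(200):
        output = weight * input_value + bias           # forward pass
        error = output - target                        # how far off the neuron is
        weight -= learning_rate * error * input_value  # backward pass: adjust the weight
        bias -= learning_rate * error                  # ...and the bias

    print(round(weight * input_value + bias, 3))       # close to the 0.8 target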
Evolutionary algorithms that test variation
The evolutionaries rely on the principles of evolution to solve problems. In other words, this strategy is based on the survival of the fittest (removing any solutions that don’t match the desired output). A fitness function determines the viability of each function in solving a problem.
Using a tree structure, the solution method looks for the best solution based on function output. The winner of each level of evolution gets to build the next-level functions. The idea is that the next level will get closer to solving the problem but may not solve it completely, which means that another level is needed. This particular tribe relies heavily on recursion and languages that strongly support recursion to solve problems. An interesting output of this strategy has been algorithms that evolve: One generation of algorithms actually builds the next generation.
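Here is a minimal sketch of the evolutionary idea (an illustration with a made-up problem, not full genetic programming): a fitness function scores a population of candidate solutions, the fittest survivors build the next generation, and small mutations introduce the variation that later generations test.

    import random

    TARGET = 42                                            # made-up problem: evolve a number close to 42
    fitness = lambda candidate: -abs(candidate - TARGET)   # higher fitness means closer to the target

    population = [random.uniform(0, 100) for _ in range(20)]
    for generation in range(50):
        # Survival of the fittest: keep the best half of the population.
        survivors = sorted(population, key=fitness, reverse=True)[:10]
        # The winners build the next generation by producing slightly mutated copies of themselves.
        population = [parent + random.gauss(0, 1) for parent in survivors for _ in range(2)]

    print(round(max(population, key=fitness), 2))          # a value near 42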
Bayesian inference
The Bayesians use various statistical methods to solve problems. Given