Fundamentals and Methods of Machine and Deep Learning. Pradeep Singh
Ensemble machine learning combines multiple machine learning algorithms to achieve better performance than individual, stand-alone algorithms [5, 6]. Popular ensemble learning algorithms include the Bayes optimal classifier, bootstrap aggregating (bagging), boosting, Bayesian model averaging (BMA), Bayesian model combination, bucket of models, and stacking. Compared with traditional machine learning, ensemble methods offer several advantages: higher prediction accuracy; good scalability, since multiple nodes are handled well; the combination of multiple hypotheses to maximize the quality of the output; sustainable solutions through incremental operation; efficient reuse of previous knowledge to produce diverse model-based solutions; avoidance of overfitting given sufficient training; models that mimic human-like behavior; the ability to analyze complex disease-spreading traces with combined models; fewer misclassified samples thanks to sufficiently trained models; low sensitivity to outliers; improved performance through cross-validation of output data samples; high stability of the chosen hypothesis; strong measurable performance on initial data collection; a reduced tendency to converge to local optima; non-hierarchical and overlapping behavior; and the availability of several open-source tools for practical implementation of the models [7–9].
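As a minimal illustration of the combining step described above, the following sketch aggregates the outputs of several stand-alone classifiers by majority vote. The base-model outputs and class labels here are hypothetical and stand in for any real learners:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the predictions of several base models by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs from three stand-alone models for one sample:
base_model_outputs = ["diseased", "healthy", "diseased"]
print(majority_vote(base_model_outputs))  # diseased
```

Even this trivial combiner shows why ensembles help: a single mistaken base model is outvoted as long as the majority of models are correct.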
The main goals of applying ensemble machine learning algorithms to identifying zoonotic diseases are as follows: reduced bias and variance, which improves detection accuracy with fewer training iterations; automatic identification of diseases; base learners that make the approach well suited to the medical domain; easy identification of disease spread at an early stage; identification of the feature vector that yields maximum information gain; easy training of hyperparameters; minimal treatment cost; adequate coverage of a large set of medical problems; early identification of recurring medical problems; diversity (low correlation) among the constituent models, which leads to better aggregated output; low training and execution time; high scalability of the ensemble models; the aggregated benefits of several models; strong non-linear decision-making ability; sustainable solutions to chronic diseases; automatic tuning of internal parameters, which increases the convergence rate; reduced repetition of clinical trials; early intervention that prevents the spread of disease; the capacity to record and store high-dimensional clinical datasets; easy recognition of neurological diseases; fewer misclassifications of medical images with poor image quality; and the combined power of multiple machine learning models [10, 11].
2.2 Bayes Optimal Classifier
The Bayes optimal classifier is a popular machine learning model used for prediction. It is based on Bayes' theorem and is closely related to the maximum a posteriori (MAP) algorithm. The classifier operates by finding the hypothesis with the maximum probability of occurrence. Prediction is carried out using a probabilistic model that finds the most probable outcome from the training and testing data instances.
The basic conditional probability equation predicts one outcome given another: if A and B are two probable outcomes, the probability of A given B is P(A|B) = (P(B|A) * P(A)) / P(B). Probabilistic frameworks for prediction are broadly classified into two types: maximum a posteriori (MAP) estimation and maximum likelihood estimation. The objective of both frameworks is to locate the most promising hypothesis for the given training data sample. Some of the zoonotic diseases that can be identified and treated using the Bayes optimal classifier are anthrax, brucellosis, Q fever, scrub typhus, plague, tuberculosis, leptospirosis, rabies, hepatitis, Nipah virus, avian influenza, and so on [12, 13]. A high-level representation of the Bayes optimal classifier is shown in Figure 2.1. In the hyperplane of available datasets, the Bayes classifier performs multi-category classification, drawing soft boundaries among the datasets to separate the classes. It is observed that, with more iterations of training over time, the accuracy of the Bayes optimal classifier keeps improving.
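The conditional probability equation above can be computed directly. In the following sketch, the numbers are purely hypothetical (a disease prevalence of 1%, a test that detects the disease 90% of the time, and an overall positive-test rate of 5%) and serve only to show how the three terms combine:

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Compute P(A|B) = P(B|A) * P(A) / P(B) from Bayes' theorem."""
    return p_b_given_a * p_a / p_b

# Hypothetical values: P(A) = 0.01 (prevalence),
# P(B|A) = 0.9 (test sensitivity), P(B) = 0.05 (overall positive rate).
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b=0.05)
print(posterior)  # 0.18
```

Note how a seemingly accurate test still yields a modest posterior when the prior P(A) is small, which is exactly the trade-off the MAP framework weighs when selecting a hypothesis.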
Some of the advantages of the Bayes optimal classifier that make it suitable for tracking and treating zoonotic diseases are as follows: ease of implementation; high accuracy from relatively little training data; the ability to handle both discrete and continuous data samples; scalability to any number of data samples; very fast operation, which makes it suitable for real-time prediction; better results than traditional classifiers; low sensitivity to outliers; easy generalization; high computational accuracy; good performance on both linearly and non-linearly separable data samples; easy interpretation of results; the ability to mine complex relationships between input and output data samples; globally optimal solutions; and so on [14].
Figure 2.1 A high-level representation of Bayes optimal classifier.
2.3 Bootstrap Aggregating (Bagging)
Bootstrap aggregating, popularly referred to as bagging, is a machine learning–based ensemble technique that improves the accuracy of a base algorithm and is used mostly for classification and regression. Its main purpose is to avoid the overfitting problem by properly generalizing over the existing data samples. From any standard input dataset, new training datasets are generated by sampling the data uniformly with replacement; because of the replacement, some observations are repeated across the bootstrap samples, and the individual predictions are then combined by averaging (for regression) or voting (for classification). Bagging is typically applied to unstable procedures such as artificial neural networks and regression trees, whose performance it stabilizes. For any given application, the choice between bagging and boosting depends on the available data. The variance incurred is reduced by combining bootstrapping and aggregation [15, 16].
Bagging and boosting are considered two of the most powerful tools in ensemble machine learning. Bagging is commonly used with decision trees: it increases the stability of the model by reducing variance and improves its accuracy by minimizing the error rate. The set of predictions made by the ensemble's models is aggregated to produce the best prediction as the output. During bootstrapping, each sample is drawn with replacement, so each draw is independent of the previous random selections. The practical application of this technique depends on the base learning algorithm, which is chosen first; the bagging of a pool of decision trees is then built on top of it. Some of the zoonotic diseases that can be identified and treated using bootstrap aggregating are zoonotic influenza, salmonellosis, West Nile virus, plague, rabies, Lyme disease, brucellosis, and so on [17]. A high-level representation of bootstrap aggregation is shown in Figure 2.2. It begins with the training dataset, which is distributed among multiple bootstrap sampling units. Each bootstrap sampling unit operates on its subset of the training data, upon which the learning algorithm performs the learning operation and generates a classification output. The aggregated result of all the classifiers is produced as the final output.
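The bootstrap-then-aggregate pipeline just described can be sketched in a few lines. The "base learner" below is deliberately trivial (it just predicts the majority label of its bootstrap sample), and the dataset labels are hypothetical; the point is the structure: sample with replacement, train one model per sample, and combine the models' votes:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly WITH replacement (the bootstrap)."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Toy base learner: predict the majority label of its bootstrap sample."""
    labels = [label for _, label in sample]
    return Counter(labels).most_common(1)[0][0]

def bagging_predict(data, n_models=25, seed=0):
    """Bagging: train one base learner per bootstrap sample, then vote."""
    rng = random.Random(seed)
    votes = [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical (feature, label) pairs:
dataset = [(0.8, "diseased"), (0.7, "diseased"), (0.9, "diseased"),
           (0.6, "diseased"), (0.1, "healthy")]
print(bagging_predict(dataset))
```

In a real system the stump would be replaced by a full decision tree or neural network; the sampling and voting scaffolding stays the same, and it is the averaging over many resampled models that delivers the variance reduction described above.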
Figure