Figure 2.5 A high-level representation of bucket of models.
One of the most suitable approaches for cross-validating multiple models in ensemble learning is the bake-off contest, the pseudo-code of which is given below.
Pseudo-code: Bucket of models
For each ensemble model present in the bucket do
    Repeat a constant number of times:
        Randomly divide the data into a training set and a test set
        Train the ensemble model on the training set
        Test the ensemble model on the test set
Choose the ensemble model that yields the maximum average score
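As a concrete illustration of this bake-off, a minimal Python sketch using scikit-learn is given below; the candidate models, the dataset, and the accuracy scoring are assumptions made for illustration rather than choices from the text.

Python sketch: Bucket of models

# A minimal "bucket of models" bake-off with scikit-learn.
# Candidate models, dataset, and scoring metric are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# The bucket: any collection of candidate models.
bucket = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "k_nearest_neighbors": KNeighborsClassifier(),
}

# Repeat the random train/test division a constant number of times.
splitter = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)

# Train on each training split, test on each test split, and average.
average_scores = {
    name: cross_val_score(model, X, y, cv=splitter, scoring="accuracy").mean()
    for name, model in bucket.items()
}

# Choose the model that yields the maximum average score.
best = max(average_scores, key=average_scores.get)
print(best, average_scores[best])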
Some of the advantages offered by the bucket of models in diagnosing zoonotic diseases are as follows: high-quality prediction; a unified view of the data; negotiation of local patterns; lower sensitivity to outliers; high model stability; slower models benefit from faster ones; parallelized automation of tasks; a good learning rate on large data samples; payload functionality hidden from end users; high model robustness; a low error-generation rate; the ability to handle random fluctuations in the input data samples; a medium bucket length; easier feature extraction from large data samples; prediction by extracting data from the deep web; use of a linear weighted average model; and a blocked tendency toward suboptimal solutions [25, 26].
2.7 Stacking
Stacking, also referred to as super learning or stacked regression, trains a meta-learner to combine the results generated by multiple base learners. It is a form of ensemble learning used to combine the predictions generated by multiple machine learning models and can be applied to both regression and classification problems. The typical stacking architecture involves two levels of models, often called the level-0 models and the level-1 model. The level-0 models are fit on the training data, and the predictions they generate are compiled; the level-1 model then learns how to combine the predictions obtained from these models. The simplest approach to preparing the training data for the level-1 model is k-fold cross-validation of the level-0 models. Stacking is easy to implement, and training and maintenance of the data are also straightforward. The super learner algorithm works in three steps: first set up the ensemble, then train it, and after sufficient training, test it on new data samples [27, 28].
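As a concrete illustration of this two-level architecture, the sketch below uses scikit-learn's StackingClassifier, whose cv argument performs the k-fold preparation of level-0 predictions internally; the particular base learners, meta-learner, and dataset are illustrative assumptions, not choices made in the text.

Python sketch: Stacking with a level-1 meta-learner

# Level-0 base learners feed a level-1 meta-learner (the super learner idea).
# Model and dataset choices are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: set up the ensemble of level-0 models.
level0 = [
    ("svm", SVC(probability=True, random_state=0)),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# Step 2: train the ensemble; cv=5 builds the k-fold out-of-fold
# predictions on which the level-1 model is trained.
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# Step 3: test on new data samples.
print(stack.score(X_test, y_test))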
The generalization approach in stacking splits the existing data into two parts, a training dataset and a testing dataset. The training data is further divided into K folds: starting with the k-nearest neighbor (K-NN) base model, the model is fitted on K − 1 folds and used to predict the remaining Kth fold. The base model is then fitted on the whole training dataset to compute its performance on the testing samples. The process is repeated for the other base models, which include the support vector machine, decision tree, and neural network, to make predictions on the test samples [29]. A high-level representation of stacking is shown in Figure 2.6. Multiple models are considered in parallel, and the training data is fed as input to each model. Every model generates predictions, and the summation of these predictions is fed as input to the generalizer. Finally, the generalizer produces the final predictions based on the summed predictions of the individual models.
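To make the fold-wise procedure above explicit, the sketch below builds the generalizer's training data by hand from out-of-fold predictions of the four base models named in the text; the hyperparameters, dataset, and the logistic-regression generalizer are illustrative assumptions.

Python sketch: Stacked generalization with out-of-fold predictions

# Out-of-fold predictions from each base model become the input
# features of the generalizer (level-1 model).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_models = [
    KNeighborsClassifier(),
    SVC(probability=True, random_state=0),
    DecisionTreeClassifier(random_state=0),
    MLPClassifier(max_iter=1000, random_state=0),
]

# Fit each base model on K-1 folds and predict the held-out Kth fold,
# so every training sample is predicted by a model that never saw it.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the whole training set for the test samples.
test_features = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# The generalizer combines the base predictions into final predictions.
generalizer = LogisticRegression().fit(oof, y_train)
print(generalizer.score(test_features, y_test))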
Some of the advantages offered by stacking in diagnosing zoonotic diseases are as follows: it is easily parallelized; it solves regression problems as well as multiple classification problems; it uses a simple linear stacking approach; it is far more efficient, with early detection of local patterns, lower execution time, and high-quality output; the chance of misclassification is low and predictive accuracy is increased; the effect of outliers is negligible; it uses less memory and has lower computational complexity; it is capable of handling big data streams and works in an incremental manner; classification of new data samples is easy; the approach is better than the classical ensemble method and is suitable for computation-intensive applications; generalization of the sentiment behind the analysis is easy; it is able to solve nonlinear problems; it is robust toward large search spaces; the training period is short; it is capable of handling noisy training data, with collaborative filtering helping to remove noisy elements; fewer hyperparameters are involved in training; it evolves naturally from new test samples; and very little data is required for training.
Figure 2.6 A high-level representation of stacking.
2.8 Efficiency Analysis
The efficiency achieved by the considered ensemble machine learning techniques, i.e., Bayes optimal classifier, bagging, boosting, BMA, bucket of models, and stacking, is compared with respect to the performance metrics accuracy, throughput, execution time, response time, error rate, and learning rate [30]. From the analysis, it is observed that the efficiency achieved by Bayesian model averaging, Bayesian model combination, and stacking is high compared to the other ensemble models considered for the identification of zoonotic diseases.
Technique | Accuracy | Throughput | Execution time | Response time | Error rate | Learning rate |
Bayes optimal classifier | Low | Low | High | Medium | Medium | Low |
Bagging | Low | Medium | Medium | High | Low | Low |
Boosting | Low | Medium | High | High | High | Low |
Bayesian model averaging | High | High | Medium | Medium | Low | Low |
Bayesian model combination | High | High |