Biomedical Data Mining for Information Retrieval. Group of authors
2 Calculate the autocorrelation matrix (R) (Eq. 1.1).
3 Calculate the eigenvectors (U) and eigenvalues (λ) (Eq. 1.2).
4 Rearrange the eigenvectors and eigenvalues in descending order of eigenvalue.
5 Calculate the factor loading matrix (A) (Eq. 1.3).
6 Calculate the score matrix (B) (Eq. 1.4).
7 Calculate the factor scores (F) (Eq. 1.5).
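The steps listed above can be sketched in NumPy. Equations (1.1) to (1.5) are not reproduced in this text, so the formulas below are assumptions taken from the standard principal-component form of factor analysis (A = U·sqrt(λ), B = R⁻¹A, F = ZB), and the standardisation step is likewise assumed. The function name `factor_scores` and the parameter `n_factors` are illustrative only.

```python
import numpy as np

def factor_scores(X, n_factors):
    """Principal-component factor analysis sketch (formulas are assumptions)."""
    # Assumed preliminary step: standardise each feature (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix R of the standardised data.
    R = (Z.T @ Z) / (Z.shape[0] - 1)
    # Step 3: eigenvalues and eigenvectors of the symmetric matrix R.
    eigvals, U = np.linalg.eigh(R)
    # Step 4: reorder both into descending eigenvalue order.
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]
    # Step 5: factor loading matrix A = U * sqrt(lambda), first n_factors columns.
    A = U[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    # Step 6: score (coefficient) matrix B = R^{-1} A.
    B = np.linalg.solve(R, A)
    # Step 7: factor scores F = Z B, one row per record.
    F = Z @ B
    return F, A, eigvals

# Toy usage on random data (not the study's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
F, A, eigvals = factor_scores(X, n_factors=3)
print(F.shape)  # (200, 3)
```

Under these assumed formulas the resulting factor scores are uncorrelated with unit variance, which is one reason this form is commonly used for feature reduction.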
After the features are reduced, a FLANN model [32] is used to predict each patient's survival or in-hospital death, and the overall performance is then evaluated. The FLANN-based mortality prediction model is shown in Figure 1.2. To design the FLANN model, 4,000 patient records (dataset A) are selected; 3,000 of them are used for training and the remaining 1,000 for testing. During training, each of the 3,000 records is presented as an input with 49 features. Each feature is expanded trigonometrically into five terms, which maps the data into a nonlinear space. The outputs of the functional expansion are multiplied by their corresponding weights and summed to produce the actual output. The actual output is then compared with the desired output, which is 0.1 (for class 0) or 0.9 (for class 1). Any difference between the actual and desired outputs produces an error signal, and the weights and biases are updated from this error signal using the Least Mean Square (LMS) algorithm [33]. The process is repeated until all training patterns have been used, and the experiment is continued for 3,000 iterations with the learning parameter set to 0.1. The mean square error (MSE) of each iteration is stored and plotted to show the convergence characteristics given in Figure 1.3. Once training is over, the 1,000 records set aside for testing are given to the model, with the weights and bias fixed at the values obtained at the end of training. For each input pattern, the output (class label) is calculated and compared with the target class label.
Figure 1.2 The FLANN based mortality prediction model.
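The training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say which five trigonometric terms are used, so the set x, sin(πx), cos(πx), sin(2πx), cos(2πx) is assumed, the output is taken as a plain weighted sum, and the 0.5 decision threshold is assumed as the midpoint of the 0.1 and 0.9 targets. The names `expand`, `train_flann` and `predict` are hypothetical.

```python
import numpy as np

def expand(x):
    """Trigonometric expansion: each feature becomes 5 terms (assumed term set)."""
    x = np.asarray(x, dtype=float)
    terms = [x, np.sin(np.pi * x), np.cos(np.pi * x),
             np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)]
    # The trailing constant 1 plays the role of the bias input.
    return np.concatenate(terms + [np.ones(1)])

def train_flann(X, y, mu=0.1, epochs=100):
    """LMS training; returns the weights and the per-epoch mean square error."""
    w = np.zeros(5 * X.shape[1] + 1)
    targets = np.where(y == 1, 0.9, 0.1)   # desired outputs, as in the text
    mse_history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, d in zip(X, targets):
            phi = expand(x)
            e = d - w @ phi        # error between desired and actual output
            w += mu * e * phi      # LMS weight update
            sq_err += e * e
        mse_history.append(sq_err / len(X))
    return w, mse_history

def predict(w, X):
    """Class 1 if the output is at or above 0.5 (assumed threshold), else class 0."""
    return np.array([1 if w @ expand(x) >= 0.5 else 0 for x in X])
```

The `mse_history` list corresponds to the convergence curve the text refers to. With many expanded inputs, a plain linear LMS update may need a smaller step size than 0.1 to remain stable, so the sketch is best tried on a few features scaled to [0, 1].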
Similarly, five other models, Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayesian and Support Vector Machine (SVM), are applied to predict in-hospital mortality. The principle behind each is outlined briefly below.
Discriminant analysis [34] is a statistical tool used to classify individuals into groups. Discriminant Function Analysis (DFA) is used to separate two groups, and Canonical Variates Analysis (CVA) is used to separate more than two. A discriminant analysis has two potential goals: finding a predictive equation for classifying new individuals, or interpreting the predictive equation to better understand the relationships that may exist among the variables.
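For the two-group case, one common form of the predictive equation is Fisher's linear discriminant. The sketch below assumes that form, with a pooled covariance matrix and equal prior probabilities; the chapter does not state which variant was used, and the names `fit_lda` and `classify` are hypothetical.

```python
import numpy as np

def fit_lda(X0, X1):
    """Two-group linear discriminant: returns direction w and threshold c."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False) +
         (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(S, m1 - m0)
    c = w @ (m0 + m1) / 2      # midpoint between the projected group means
    return w, c

def classify(w, c, x):
    """Assign group 1 if the discriminant score exceeds the threshold, else group 0."""
    return int(w @ np.asarray(x, dtype=float) > c)
```

The coefficients in `w` are what one would inspect for the second goal mentioned above, since their relative sizes indicate which variables contribute most to separating the groups.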
A Decision Tree [35] is a tree-like structure used for classification and regression. It is a supervised machine learning algorithm used in decision making. The objective of a DT is to build a training model that can predict the class or value of the target variable by learning simple decision rules inferred from prior data (the training data). To predict the class label of a record, one starts at the root of the tree and compares the value of the root attribute with the record's attribute. Based on the comparison, one follows the corresponding branch and jumps to the next node.
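The root-to-leaf traversal just described can be illustrated with a small hand-built tree. The feature names, thresholds and labels below are invented for illustration and are not taken from the study; a real tree would be learned from the training data.

```python
def predict(node, record):
    """Walk from the root to a leaf, following the branch each comparison selects."""
    while "label" not in node:
        go_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

# Hypothetical tree: internal nodes test one attribute, leaves carry a class label.
tree = {
    "feature": "age", "threshold": 65,
    "left": {"label": 0},
    "right": {
        "feature": "heart_rate", "threshold": 100,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}

print(predict(tree, {"age": 72, "heart_rate": 118}))  # 1
```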
Figure 1.3 Convergence characteristics of FA-FLANN based mortality prediction model.
KNN [35] is also a supervised machine learning algorithm used for both classification and regression. It is simple and easy to implement. KNN finds the nearest neighbors of a data point by calculating the distance between points, typically the Euclidean distance.
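A minimal sketch of KNN classification with Euclidean distance is shown below. The value k = 3 and the majority-vote rule are assumptions, since the text does not give the settings used in the experiment, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # 0
```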
A Naive Bayes classifier [35] is a probabilistic machine learning model used for classification tasks. Bayes' theorem is given as
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1.6}$$
Using Bayes' theorem, one finds the probability of A occurring given that B has occurred. Here, B represents the evidence and A the hypothesis. The assumption made is that the predictors (features) are independent, that is, the presence of one particular feature does not affect the others. Hence the method is called naive.
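The independence assumption means the likelihood of the evidence factorises into a product over features, which the short sketch below makes explicit for binary features. The class names, feature names and probabilities are made-up illustrative values, not figures from the study, and `naive_bayes_posterior` is a hypothetical helper.

```python
def naive_bayes_posterior(priors, likelihoods, evidence):
    """Posterior P(class | evidence) under the naive independence assumption.

    priors:      {cls: P(cls)}
    likelihoods: {cls: {feature: P(feature present | cls)}}
    evidence:    {feature: True if present, False if absent}
    """
    scores = {}
    for cls, prior in priors.items():
        p = prior
        for feature, present in evidence.items():
            p_feat = likelihoods[cls][feature]
            # Independence: multiply the per-feature likelihoods.
            p *= p_feat if present else (1 - p_feat)
        scores[cls] = p
    total = sum(scores.values())   # P(B), the normalising evidence term
    return {cls: s / total for cls, s in scores.items()}

# Illustrative numbers only.
priors = {"pos": 0.5, "neg": 0.5}
likelihoods = {"pos": {"f1": 0.8, "f2": 0.6}, "neg": {"f1": 0.2, "f2": 0.4}}
posterior = naive_bayes_posterior(priors, likelihoods, {"f1": True, "f2": True})
print(round(posterior["pos"], 3))  # 0.857
```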
A Support Vector Machine [35] is a supervised machine learning algorithm that aims to find a hyperplane in N-dimensional space that separates the classes. The hyperplane with the maximum margin is chosen. Support vectors are the data points closest to the hyperplane, and they influence its position and orientation. Using these support vectors, the margin of the classifier is maximized. Deleting the support vectors would change the position of the hyperplane; these are the points that help build the SVM.
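The role of the margin and the support vectors can be illustrated without training anything. The sketch below does not fit an SVM; it only evaluates a hyperplane that is given by hand, computing each point's signed distance to it and picking out the closest points. The function names and the toy data are hypothetical, and labels are assumed to be coded as +1 and -1.

```python
import math

def geometric_margins(w, b, X, y):
    """Signed distance of each point to the hyperplane w.x + b = 0 (labels +1/-1)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [yi * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
            for x, yi in zip(X, y)]

def support_vectors(w, b, X, y, tol=1e-9):
    """Return the points closest to the hyperplane and the margin they define."""
    margins = geometric_margins(w, b, X, y)
    smallest = min(margins)
    closest = [x for x, m in zip(X, margins) if m - smallest <= tol]
    return closest, smallest

# Toy 2-D data separated by the vertical line x1 = 2.
X = [(0, 0), (1, 0), (3, 0), (4, 0)]
y = [-1, -1, 1, 1]
print(support_vectors((1, 0), -2, X, y))  # ([(1, 0), (3, 0)], 1.0)
```

Removing (1, 0) or (3, 0) from this toy set would allow a wider margin and therefore a different best hyperplane, whereas removing (0, 0) or (4, 0) would change nothing.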
1.4 Result and Discussion
The results of all the models on the testing set of 1,000 records are shown in Table 1.3.
As Table 1.3 shows, DT outperforms the other five models with an accuracy of 97.95%. The FA-FLANN model ranks second with an accuracy of 87.6%. The DA, KNN and SVM models give almost the same results, with accuracies of 86.05%, 86.6% and 86.15%, respectively. The worst result is reported for the Naive Bayesian model, with an accuracy of 54.80%.
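The relationship between the reported quantities is simple arithmetic: the testing error in percent is 100 minus the accuracy, and the rank follows from sorting by accuracy. The sketch below uses only the accuracies quoted in the paragraph above; the ranks it prints are computed from those figures, and `rank_models` is a hypothetical helper.

```python
def rank_models(accuracy):
    """Return (rank, model, error %, accuracy %) rows, best accuracy first."""
    ordered = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)
    return [(rank, model, round(100 - acc, 2), acc)
            for rank, (model, acc) in enumerate(ordered, start=1)]

# Accuracies (%) as quoted in the text.
accuracy = {
    "FA-FLANN": 87.60, "DA": 86.05, "DT": 97.95,
    "KNN": 86.60, "SVM": 86.15, "Naive Bayesian": 54.80,
}

for rank, model, error, acc in rank_models(accuracy):
    print(f"{rank}. {model}: error {error:.2f}%, accuracy {acc:.2f}%")
```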
Table 1.3 Comparison of different models during testing.
S. no. | Model name | Error during testing (value) | Error during testing (%) | Accuracy | Rank |
---|---|---|---|---|---|
1. | FA-FLANN | 0.1240 | 12.40% | 87.60% | 2 |
2. | DA | 0.1395 | 13.95% | 86.05% | 5 |
3. | DT | 0.0205 | 2.05% | 97.95% | 1 |
4. | KNN | 0.1340 | 13.4% |