Data Analytics in Bioinformatics. Group of authors

The authors reported that the proposed model achieved higher accuracy than existing models because it used more training samples, more than one hidden layer and a large number of neurons in each hidden layer.

      Bordoloi et al. [37] compared the amino acid sequences of particular protein structures to predict the secondary structure of a protein, using a fully connected MLP with a single hidden layer. The amino acid structures were the input to the network, and the output was a predicted structure belonging to one of four classes: hemoglobin, myoglobin, sickle cell anemia and insulin. The network was trained with the backpropagation algorithm, which evaluates the output error and updates the weights accordingly. To minimize the error rate produced by the model, the Levenberg–Marquardt optimization technique was implemented. The ANN model showed 100% accuracy in identifying the required parameters. Increasing the number of training epochs raised the accuracy of the model, although the training time of the network also increased slightly. The authors also observed that, because the dataset was large, the ANN model ran into some computational constraints during training.
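
As a rough illustration of the kind of network described above, the following sketch trains an MLP with one hidden layer by backpropagation on one-hot encoded residue strings and four output classes. It is not the authors' implementation: the toy sequences, the hidden-layer size, the learning rate and every function name are assumptions made for the example, and plain gradient descent is used in place of Levenberg–Marquardt optimization.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def one_hot(sequence):
    """Encode a residue string as a flat 0/1 vector, 20 slots per position."""
    vector = []
    for residue in sequence:
        slot = [0.0] * len(AMINO_ACIDS)
        slot[AMINO_ACIDS.index(residue)] = 1.0
        vector.extend(slot)
    return vector


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class OneHiddenLayerMLP:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        probs = softmax([sum(w * h for w, h in zip(row, hidden)) + b
                         for row, b in zip(self.w2, self.b2)])
        return hidden, probs

    def train_step(self, x, label, lr):
        """One backpropagation update; returns the cross-entropy loss before it."""
        hidden, probs = self.forward(x)
        # Softmax with cross-entropy gives the output error (probs - target).
        d_out = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        # Propagate the error back through the sigmoid, whose derivative is h * (1 - h).
        d_hid = [h * (1.0 - h) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                 for j, h in enumerate(hidden)]
        for k, d in enumerate(d_out):
            self.w2[k] = [w - lr * d * h for w, h in zip(self.w2[k], hidden)]
            self.b2[k] -= lr * d
        for j, d in enumerate(d_hid):
            self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * d
        return -math.log(probs[label])

    def predict(self, x):
        _, probs = self.forward(x)
        return probs.index(max(probs))


# Invented six-residue strings, one per class; NOT real protein data.
TOY_DATA = [("ACDEFG", 0), ("HIKLMN", 1), ("PQRSTV", 2), ("WYACDE", 3)]


def train_toy_model(epochs=500, lr=0.3):
    model = OneHiddenLayerMLP(n_in=6 * len(AMINO_ACIDS), n_hidden=8, n_out=4)
    losses = []
    for _ in range(epochs):
        losses.append(sum(model.train_step(one_hot(s), y, lr) for s, y in TOY_DATA))
    return model, losses


model, losses = train_toy_model()
print("first/last epoch loss:", round(losses[0], 3), round(losses[-1], 3))
print("predictions:", [model.predict(one_hot(s)) for s, _ in TOY_DATA])
```

Fitting four hand-made strings says nothing about accuracy on real protein data; the sketch only shows how the forward pass, the backward pass and the weight update fit together.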

      Shanthi et al. [38] applied a feature selection method to stroke disease classification with a multilayer perceptron. The study used a dataset of 150 patients with symptoms of stroke disease, each described by 20 attributes. In a neuro-genetic approach, a genetic algorithm reduced the number of features from 20 to 14. The reduced set of symptoms was then given to the model to predict the type of stroke disease. The model was trained with the backpropagation algorithm, using 7 hidden neurons and a sigmoid activation function. The results showed that the neuro-genetic approach achieved better accuracy than a traditional ANN while using fewer features.
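
The feature selection step of such a neuro-genetic approach can be sketched as a genetic algorithm that evolves 0/1 masks over the 20 attributes. The code below is only an illustration under stated assumptions: the data are synthetic, the selection, crossover and mutation settings are arbitrary, and the fitness function scores a mask with a simple nearest-centroid classifier rather than the MLP used in [38]. All function names are hypothetical.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=25,
                              mutation_rate=0.05, seed=0):
    """Evolve 0/1 feature masks and return (best_mask, best_score)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    for generation in range(generations + 1):
        scored = sorted(((fitness(m), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        if generation == generations:
            break
        parents = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        population = []
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            population.append([bit ^ 1 if rng.random() < mutation_rate else bit
                               for bit in child])
    return best_mask, best_score


def make_synthetic_patients(n_samples=150, n_features=20, n_informative=5, seed=1):
    """Two-class toy data: only the first n_informative columns carry signal."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_samples):
        label = i % 2
        rows.append([rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
                     for j in range(n_features)])
        labels.append(label)
    return rows, labels


def make_fitness(rows, labels, penalty=0.01):
    """Score a mask by nearest-centroid accuracy minus a per-feature penalty."""
    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        if not cols:
            return 0.0  # an empty feature set cannot classify anything
        centroids = {}
        for c in (0, 1):
            members = [r for r, y in zip(rows, labels) if y == c]
            centroids[c] = [sum(r[j] for r in members) / len(members) for j in cols]
        correct = 0
        for r, y in zip(rows, labels):
            dist = {c: sum((r[j] - mu) ** 2 for j, mu in zip(cols, centroids[c]))
                    for c in (0, 1)}
            correct += min(dist, key=dist.get) == y
        return correct / len(rows) - penalty * len(cols)
    return fitness


rows, labels = make_synthetic_patients()
best_mask, best_score = genetic_feature_selection(20, make_fitness(rows, labels))
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
```

In a real study the fitness would normally be measured on held-out data, and the classifier inside the fitness function would be the network that is finally trained on the selected features.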

       3.4.1 Comparative Analysis of ANN With Broadly Used Traditional ML Algorithms

      Table 3.1 shows published articles on ANN used for biological data.

      In this study we observed that the ANN classifier outperformed all other classifiers with reasonable accuracy. Here we discuss our critical observations on the factors that help an ANN model achieve high accuracy. ANNs learn to solve complex problems because of their massively parallel processing, adaptive learning, fault tolerance and self-organization capability, which together support high classification performance. ANN has been among the most powerful tools for classification and prediction. The performance of an ANN algorithm depends on several factors: pre-processing of the data (the dimension of the dataset, the presence of incomplete or noisy data, and the selection of features), the activation function used, and the number of epochs and neurons. Selecting too many features, or the wrong features, can degrade the performance of the model. The performance of an ANN also depends on choosing the right combination of input variables and other parameters.

      In the case of the SVM model, when the number of features exceeds the number of samples the model tends to run slowly, so more work on feature selection is required. However, when a substantial amount of DNA sequencing data is available for two-class disease classification, SVM is a strong classification model. We also observed that neural networks can still produce reasonable results when the inputs are noisy or incomplete. Correct use of data pre-processing techniques can therefore improve the performance of the classification model.
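
Two common pre-processing steps for incomplete data can be sketched as follows: missing values are replaced with the column mean, and every column is then rescaled to the range [0, 1]. This is a minimal sketch rather than a method taken from the studies reviewed here; the function names and the tiny example matrix are assumptions, and missing entries are assumed to be marked with None.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if value is None else value for j, value in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


# Made-up measurements with two missing entries.
raw = [[1.0, None], [3.0, 4.0], [None, 8.0]]
clean = min_max_scale(impute_column_means(raw))
print(clean)  # [[0.0, 0.5], [1.0, 0.0], [0.5, 1.0]]
```

Mean imputation is only one option; whether it is appropriate depends on why the values are missing.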

      Another factor that influences the performance of the model is the choice of activation function for classifying linear and non-linear data. ReLU is one of the fastest-learning activation functions and often gives more accurate results because it is easy to optimize with gradient descent, although gradient descent is not guaranteed to reach a globally optimal solution.
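
One reason ReLU is easy to optimize with gradient descent can be seen from its derivative. The sketch below compares the derivatives of the sigmoid and ReLU functions and multiplies them across several layers. The helper chained_gradient is a hypothetical simplification that ignores the weights and assumes the same pre-activation value at every layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for every positive input


def chained_gradient(grad_fn, z, depth):
    """Product of `depth` activation derivatives at pre-activation z (weights ignored)."""
    return grad_fn(z) ** depth


for depth in (1, 5, 10):
    print(depth,
          chained_gradient(sigmoid_grad, 0.0, depth),
          chained_gradient(relu_grad, 1.0, depth))
```

The sigmoid factor shrinks geometrically with depth even at its most favourable point, z = 0, whereas the ReLU factor stays at 1 for active units. Units with negative input receive a zero gradient, which is the usual drawback of ReLU.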

      In this paper we have discussed applications of ANN across different fields of bioinformatics. We have also compared various machine learning algorithms with the ANN algorithm to gain insight into how ANN works, what affects the performance of an ANN classification model, and how that performance can be improved to obtain more accurate results. The problems associated with the traditional approach to classification can be overcome with deep learning, which allows faster learning by reducing the computational cost of the classification model even for large datasets, and whose built-in feature engineering reduces the need for domain expertise. The study shows that ANN and its variations can be used to solve complex problems of disease diagnosis or prognosis in bioinformatics, resulting in improved lifestyle and environment.

       References

      1. https://en.wikipedia.org/wiki/Bioinformatics.

      2. https://microbenotes.com/bioinformatics-introduction-and-applications/.

      3. https://en.wikipedia.org/wiki/Structural_biology.

      4. Mehmood, M.A., Sehar, U., Ahmad, N., Use of Bioinformatics Tools in Different Spheres of Life Sciences. J. Data Mining Genomics Proteomics, 5, 1–13, 2014.

      5. Singh, H., Bioinformatics: Benefits to Mankind. Int. J. Pharm. Tech. Res., 9, 4, 242–248, 2016.

      6. https://www.biotecharticles.com/Bioinformatics-Article/Applications-of-Bioinformatics-3270.html.

      7. Rhee, S.Y., Dickerson, J., Xu, D., Bioinformatics and its applications in plant biology.
