Data Analytics in Bioinformatics. Group of authors

      SOMs are highly effective at mapping high-dimensional data. Representing the data as a map allows quick visualization and interpretation [24].

      Figure 2.5 Self-Organizing Map (SOM).

       2.3.7 Grid-Based Clustering

       2.3.7.1 STING (Statistical Information Grid-Based Algorithm)

      This algorithm works from a user-supplied density value for the cluster areas: regions of the space with low density are treated as non-relevant and discarded as noisy data. STING requires comparatively few computational resources. The grid structure, together with the statistical information associated with its cells, gives a graphical representation of the clusters formed [26].
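
      The density-threshold idea can be illustrated with a minimal sketch. It is not the full STING algorithm: it assumes a single-level grid over 2-D points and keeps only a point count per cell, whereas STING stores richer statistics per cell. The names `grid_clusters`, `cell_size` and `min_count` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Map a 2-D point to the integer index of the grid cell containing it."""
    x, y = point
    return (int(x // cell_size), int(y // cell_size))


def grid_clusters(points, cell_size, min_count):
    """Label each point with a cluster id, or -1 if its cell is too sparse.

    Assumption: a cell is 'dense' when it holds at least min_count points,
    and dense cells that touch (including diagonally) form one cluster.
    """
    counts = Counter(cell_of(p, cell_size) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Flood-fill over neighbouring dense cells.
        labels[start] = next_label
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in labels:
                        labels[neighbour] = next_label
                        stack.append(neighbour)
        next_label += 1

    # Points in low-density cells are treated as noise (-1).
    return [labels.get(cell_of(p, cell_size), -1) for p in points]
```

      Calling `grid_clusters(points, 1.0, 2)` on two tight groups of points plus one isolated point would give the groups two different labels and mark the isolated point as noise.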

       2.3.8 Soft Clustering

      In soft clustering, a data point in the dataset can belong to more than one cluster; the approach is also described as a probabilistic model. In simpler terms, a single data point can appear in several clusters with which it shares similarities. Among the soft clustering approaches, FCM is the most popular.

       2.3.8.1 FCM (Fuzzy C-Means)

      This algorithm is widely applied in microarray data analysis, because microarrays contain tens of thousands of genes that are analysed concurrently. It uses a membership function from which a membership matrix is built for the dataset. The matrix is updated each time the similarity between the data points and the clusters is re-evaluated. The degree of membership is given by the weights of the matrix [25], which specify how similar each data point is to the mean of a cluster. Membership values range from 0 to 1.
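
      The membership matrix and its update can be sketched as follows. This is a minimal illustration that assumes the commonly used fuzzy c-means update rules with a fuzzifier `m` (a parameter the text above does not mention) and Euclidean distance; the function names are hypothetical, and the sketch is not tuned for real microarray data.

```python
import math


def memberships(points, centers, m=2.0):
    """Build the membership matrix: one row per point, one column per cluster."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a centre: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
            continue
        # Closer centres get larger weights; each row sums to 1.
        rows.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return rows


def update_centers(points, u, m=2.0):
    """Recompute each cluster mean, weighting points by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)
```

      Each row of the returned matrix holds the weights of one data point across all clusters, so every value lies between 0 and 1 and each row sums to 1.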

       References

      1. Simeone, O., A Very Brief Introduction to Machine Learning With Applications to Communication Systems. IEEE Trans. Cognit. Commun. Networking, 4, 4, 648–664, 2018.

      2. Dixit, P. and Prajapati, G.I., Machine Learning in Bioinformatics: A Novel Approach for DNA Sequencing. 2015 Fifth International Conference on Advanced Computing & Communication Technologies, Haryana, pp. 41–47, 2015.

      3. https://en.wikipedia.org/wiki/Unsupervised_learning.

      4. Jain, A.K., Data clustering: 50 years beyond k-means. Pattern Recognit. Lett., 31, 8, 1, 651–666, 2010, https://doi.org/10.1016/j.patrec.2009.09.011.

      5. Oyelade, J. et al., Data Clustering: Algorithms and Its Applications. 2019 19th International Conference on Computational Science and Its Applications (ICCSA), Saint Petersburg, Russia, pp. 71–81, 2019.

      6. Larrañaga, P., Calvo, B., Santana, R., Bielza, C., Galdiano, J., Inza, I., Lozano, J.A., Armañanzas, R., Santafé, G., Pérez, A., Robles, V., Machine learning in bioinformatics. Briefings Bioinf., 7, 1, 86–112, March 2006, https://doi.org/10.1093/bib/bbk007.

      7. National Research Council (US) Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, Merrill, S.A. and Mazza, A.M. (Eds.), Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health, National Academies Press (US), Washington (DC), 2006, 2, Genomics, Proteomics, and the Changing Research Environment, Available from: https://www.ncbi.nlm.nih.gov/books/NBK19861/.

      8. Boundless.com. License: CC BY-SA: Attribution-ShareAlike.

      9. Oyelade, J., Isewon, I., Oladipupo, F. et al., Clustering Algorithms: Their Application to Gene Expression Data. Bioinform. Biol. Insights, 10, 237–253, 2016.

      10. Kerr, G., Ruskin, H.J., Crane, M., Doolan, P., Techniques for clustering gene expression data. Comput. Biol. Med., 38, 3, 283–293, 2008.

      11. Jain, A.K., Murty, M.N., Flynn, P.J., Data clustering: A review. ACM Comput. Surv., 31, 3, 264–323, 1999.

      12. ©Nature Education, CC-BY-NC-SA.

      14. Chandrasekhar, T., Thangavel, K., Elayaraja, E., Effective clustering algorithms for gene expression data. Int. J. Comput. Appl., 32, 4, 25–9, 2011.

      15. Khan, S.S. and Ahmad, A., Cluster Center Initialization Algorithm for K-Means Clustering. Pattern Recognit. Lett., 25, 11, 1293–1302, 2004.

      16. Handhayani, T. and Hiryanto, L., Intelligent Kernel K-Means for Clustering Gene Expression. Procedia Comput. Sci., 59, 171–7, 2015.

      17. Kaufman, L. and Rousseeuw, P.J., Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344, John Wiley & Sons, New York, 1990.

      18. Sokal, R.R. and Michener, C.D., A statistical method for evaluating systematic relationships. Univ. Kansas Sci. Bull., 28, 1409–38, 1958.

      19. Domany, E., Superparamagnetic clustering of data—The definitive solution of an ill-posed problem. Physica A Stat. Mech. Appl., 263, 1, 158–69, 1999.

      20. Guha, S., Rastogi, R., Shim, K., CURE: an efficient clustering algorithm for large databases, in: ACM SIGMOD Record, vol. 27, pp. 73–84, ACM, New York, NY, USA, 1998.

      21. Karypis, G., Han, E.H., Kumar, V., Chameleon: Hierarchical clustering using dynamic modeling. Computer (Long Beach Calif.), 32, 8, 68–75, 1999.
