Machine Learning Techniques and Analytics for Cloud Security. Group of authors

      3

      Selection of Certain Cancer Mediating Genes Using a Hybrid Model Logistic Regression Supported by Principal Component Analysis (PC-LR)

       Subir Hazra*, Alia Nikhat Khurshid and Akriti

       Meghnad Saha Institute of Technology, Kolkata, India

       Abstract

      Selecting genes whose mutations are associated with particular cancers has recently become a promising research area. An important tool for this research is the analysis of microarray gene expression data. The literature shows that various Machine Learning algorithms are effective for cancer classification and gene selection. The selected genes play a significant role in clinical decision support: they help diagnose cancer by identifying genes whose expression levels change significantly. Because microarray gene expression data contain a very large number of features, developing a gene selection algorithm with a Machine Learning approach incurs high computational complexity. Too many features can also cause overfitting and degrade the algorithm's performance. In the present article, we develop a hybrid approach in which we first reduce the number of features using Principal Component Analysis (PCA) and then apply a Logistic Regression model for prediction of genes. After the Logistic Regression model is fitted, it is evaluated on test data with an accuracy score. Based on the accuracy score, the final set of candidate genes is selected whose expression levels are manifested disproportionately. The generated sets of genes are identified as being correlated with certain cancers. The proposed method is demonstrated on two datasets, viz., colon and lung cancer. The results have been validated biologically using the NCBI database. The efficacy and robustness of the method have also been evaluated.

      Keywords: Gene expression, PCA, Logistic Regression, dimensionality reduction, accuracy score, classification, F-score
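The PCA-then-Logistic-Regression idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, every function name and parameter (make_data, top_components, fit_logistic, the shift of 3.0, the choice of two components) is hypothetical, and it assumes the PCA axes are estimated on the training split only. It uses only the Python standard library; in practice a library such as scikit-learn would typically be used for both steps.

```python
import math
import random

def make_data(n_samples=80, n_genes=20, n_informative=5, shift=3.0, seed=0):
    """Synthetic expression matrix (hypothetical data, not a real microarray)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate classes so a contiguous split stays balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += shift * label  # only the first few genes differ by class
        X.append(row)
        y.append(label)
    return X, y

def column_means(X):
    return [sum(col) / len(X) for col in zip(*X)]

def center(X, means):
    return [[v - m for v, m in zip(row, means)] for row in X]

def top_components(Xc, k, iters=200, seed=1):
    """Top-k principal axes of centered data via power iteration with deflation."""
    n, d = len(Xc), len(Xc[0])
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for this axis
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        for i in range(d):  # deflate: remove the variance already explained
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps

def project(Xc, comps):
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in Xc]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent."""
    k, n = len(Z[0]), len(Z)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            for j in range(k):
                gw[j] += err * z[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(Z, w, b):
    return [1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5 else 0
            for z in Z]

X, y = make_data()
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

means = column_means(X_train)                    # statistics from training data only
comps = top_components(center(X_train, means), k=2)
Z_train = project(center(X_train, means), comps)
Z_test = project(center(X_test, means), comps)   # reuse training means and axes

w, b = fit_logistic(Z_train, y_train)
pred = predict(Z_test, w, b)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Reducing 20 synthetic genes to two components before fitting keeps the classifier small, which mirrors the motivation given in the abstract: fewer features lower the computational cost and the risk of overfitting. How the candidate genes are then read off from the fitted model is specific to the chapter's method and is not shown here.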

      3.1 Introduction

      Cancer classification by analyzing microarray gene expression data is now a conventional method. The biological relevance of the genes substantially influences the accuracy of cancer classification. Thus, gene selection plays a pivotal role and may be regarded as the main factor in classifying cancer on the basis of microarray data. Gene selection is the task of selecting a few significant genes that better characterize the variations [5]. It is always effective to focus on a few important genes, which are necessarily small in number and may differ in their expression levels between the non-cancerous and the cancerous state. Thus, from the whole genome, only a small number of dominant genes should be identified by an effective gene selection method [6]. However, extracting information from the vast amount of biological data and understanding its patterns is the most appealing task. The correlation among genes is more pronounced when the genes lie on the same biological pathway. In this situation, the procedures traditionally used for feature selection often overlook the relationships between genes and select only a few of the genes that are strongly linked. Irrelevant genes not only lower the quality of the classification but also make it more difficult to locate the genes that are descriptive in nature [7].

      Analyzing microarray data and selecting informative genes is always a demanding task. The diversity and complexity of the different types of cancer make the task even more challenging. With advances in biotechnology, a large amount of data is being generated using high-density oligonucleotide chips and cDNA arrays [8, 9]. Researchers can now measure the expression levels of thousands of genes simultaneously. But there is a lack of suitable
