Machine Learning Techniques and Analytics for Cloud Security. Group of authors
In 2015, a framework was proposed that analyzed the genetics of the new strain and identified its nearest relatives in swine using cluster analysis approaches such as PCA and the k-means clustering algorithm; the results were consistent with a reassortment of Eurasian and North American swine viruses [5, 20]. Glycoproteins are key elements of human pathogenic viruses and play important roles in infection and immunity. The influenza A virus carries two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), which dominate the virion exterior and elicit antibodies. Glycans are one of the major components of the outermost layer of viruses. The interaction between viral pathogens and their hosts is affected by glycan patterns and glycan-binding receptors. Because carbohydrates branch extensively, glycans are complex biomolecules, and various glycoproteins are involved in the recognition of human pathogens (viruses). Infectious glycans can be either virus-encoded or host-derived, and they are usually associated with a high humoral immune response within the human body. Both HA and NA, the envelope glycoproteins of the influenza virus, interact with the sialyl residues of oligosaccharides. HA binds to the terminal sialyl residues of oligosaccharides, which attaches the virion to the cell surface. NA, in turn, is needed to remove sialyl residues from the oligosaccharides contained in cell and virus components; it is a receptor-destroying enzyme that prevents aggregation of virus particles [7, 25].
In this paper, our goal is to identify differentially expressed glycans. Clustering algorithms are applied to H1N1-infected and non-infected human datasets. We then compare the infected dataset with the non-infected one and identify the differentially expressed glycans.
2.2 Proposed Methodology
Input: Let the dataset D consist of n glycans, each with m parameter values such as RFU (relative fluorescence units), STDEV (standard deviation), and SEM (standard error of the mean). Each glycan is a vector, and the glycans are denoted g1, g2, g3, …, gi, …, gn. The dataset D has two states: normal (denoted DN) and diseased, or H1N1-infected (denoted DI).
Output: The set of differentially expressed glycans, G’
Step-1: Apply a clustering algorithm C to both the normal state (DN) and the diseased, or H1N1-infected, state (DI).
Step-2: Result for normal state =
Step-3: Find the identical, or matched, clusters between the normal state and the infected state.
Step-4: Compare the matched clusters and identify the set G of differentially expressed glycans, that is, the glycans that have changed significantly.
Step-5: For multiple glycan datasets D1, D2, …, Dt, the resulting glycan set is G’ = G1 ∩ G2 ∩ … ∩ Gt, where Gi is the differentially expressed glycan set obtained in Step 4 for dataset Di.
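Steps 3 to 5 can be illustrated with a minimal sketch. The text does not state how clusters are matched or when a glycan counts as changed, so the criterion below is an assumption: each normal-state cluster is matched to the infected-state cluster with the highest Jaccard overlap, and a glycan is flagged if it is absent from the matched cluster. The function names and the toy glycan identifiers are hypothetical and are not taken from the paper's data.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of glycan ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def differential_glycans(normal_clusters, infected_clusters):
    """Steps 3-4 under an ASSUMED criterion (not stated in the text):
    match each normal cluster to the infected cluster with the highest
    Jaccard overlap, then collect the glycans missing from that match."""
    changed = set()
    for members in normal_clusters:
        best = max(infected_clusters, key=lambda c: jaccard(members, c))
        changed |= members - best
    return changed


def common_differential_glycans(per_dataset_sets):
    """Step 5: intersect the differential sets G1, G2, ..., Gt."""
    sets = [set(s) for s in per_dataset_sets]
    return set.intersection(*sets) if sets else set()


# Toy clusterings for two hypothetical datasets D1 and D2.
normal_d1 = [{"g1", "g2", "g3"}, {"g4", "g5", "g6"}]
infected_d1 = [{"g1", "g2"}, {"g3", "g4", "g5", "g6"}]
normal_d2 = [{"g1", "g2", "g3", "g4"}, {"g5", "g6"}]
infected_d2 = [{"g1", "g2", "g4"}, {"g3", "g5", "g6"}]

diff_d1 = differential_glycans(normal_d1, infected_d1)
diff_d2 = differential_glycans(normal_d2, infected_d2)
g_prime = common_differential_glycans([diff_d1, diff_d2])
print(sorted(g_prime))  # prints ['g3']
```

In this toy example, g3 moves to a different cluster in both datasets, so it is the only glycan that survives the intersection in Step 5.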
The entire methodology is depicted in Figure 2.1. Three clustering algorithms are used in this paper:
The first algorithm applied is k-means clustering, which was proposed by J. B. MacQueen. The idea behind this algorithm is to identify k centroids, one for each cluster or group.
(1) First, choose k points to represent the initial cluster centroids.
(2) Second, assign each object to the cluster with the closest centroid.
(3) Third, once all objects are assigned, recalculate the positions of the k centroids. Steps (2) and (3) are repeated until the centroids no longer move. This separates the objects into clusters, from which the metric to be minimized can be calculated [23].
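The three steps above can be sketched in plain Python. This is a minimal illustration rather than the implementation used in the paper: Euclidean distance, random selection of the initial centroids from the data, and the stopping test (assignments no longer change, which implies the centroids no longer move) are assumptions, and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch using Euclidean distance (assumed)."""
    rng = random.Random(seed)
    # Step 1: pick k data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Unchanged assignments mean the centroids would not move either.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated toy groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The first three points and the last three points end up in different clusters; which cluster receives label 0 depends on the random initialization.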
Hierarchical clustering is the second algorithm. It groups similar objects into clusters. It starts by treating every observation as an individual cluster and then repeats the following steps:
(1) First, identify the two clusters that are closest together.
(2) Then, merge these two most similar clusters. This process continues until all the clusters have been merged [24].
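A minimal sketch of this bottom-up procedure follows. The text does not say how the distance between two clusters is measured, so single linkage (the smallest Euclidean distance between any two members) is assumed here. The `n_clusters` parameter is also an addition for illustration: with its default of 1 the loop merges everything, as described above, while a larger value stops early so that groups can be read off.

```python
import math
from itertools import combinations


def single_linkage(a, b, points):
    """Smallest distance between a member of a and a member of b (assumed linkage)."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    n_clusters = max(1, n_clusters)
    # Every observation starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > n_clusters:
        # Step 1: find the pair of clusters that are closest together.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        # Step 2: merge that pair and record the merge.
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Two well-separated toy groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
two_groups, _ = agglomerate(points, n_clusters=2)
print(sorted(sorted(c) for c in two_groups))  # prints [[0, 1, 2], [3, 4, 5]]
```

The recorded merge list plays the role of a dendrogram: it preserves the order in which clusters were combined.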
Fuzzy c-means clustering is the third and last algorithm. Its concept is very similar to that of k-means clustering. The algorithm is as follows:
(1) First, choose the number of clusters.
(2) Then, randomly assign to each data point coefficients for being in the clusters.
(3) Repeat the following two steps until the algorithm has converged: (i) compute the centroid of each cluster or group; (ii) for every data point, compute its coefficients of being in the clusters.
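The procedure can be sketched as follows, using the standard fuzzy c-means update rules. The fuzzifier m = 2, the Euclidean distance, and the convergence tolerance are assumptions, since the text does not specify them, and the toy points are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means sketch; m is the fuzzifier (assumed to be 2)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Step 2: random membership coefficients, normalised to sum to 1 per point.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centroids = []
    for _ in range(max_iter):
        # (i) Each centroid is the mean of all points weighted by u**m.
        centroids = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centroids.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # (ii) Recompute every point's coefficients from its distances.
        new_u = []
        for p in points:
            dists = [math.dist(p, cen) for cen in centroids]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centroids.
                zeros = [d == 0.0 for d in dists]
                row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)
        # Converged when no coefficient changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return u, centroids


# Two well-separated toy groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
memberships, centroids = fuzzy_c_means(points, c=2)
hard_labels = [row.index(max(row)) for row in memberships]
print(hard_labels)
```

Unlike k-means, every point keeps a coefficient for every cluster; taking the largest coefficient per point, as in `hard_labels`, recovers a hard assignment when one is needed.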
2.3 Result
The Result section consists of a description of the datasets, an analysis of the results, and a validation of the results.