Bioinformatics and Medical Applications. Group of authors
In this work, we used two main divisions: supervised algorithms and non-supervised algorithms. Both classes are discussed below.
3.4.1 Supervised Algorithms
A supervised algorithm is computational code that requires calibration or training to know what to look for. This makes such codes programmer-dependent: in a first stage the code is calibrated and, in a second stage, once calibrated, it can search for a particular profile.
In the proteomics and genomics fields, there are many different algorithms designed under this assumption.
3.4.2 Non-Supervised Algorithms
A non-supervised algorithm is computational code that does not require calibration or training to know what to look for; if it does require some, the calibration is only one part of the code, which modifies itself to adjust the search criteria. Running these codes does not depend on the programmer, as they are independent.
In the proteomics and genomics fields, there are also algorithms of this type and, although they are fewer, they are very useful.
In this chapter, we use an algorithm of this type, the Polarity Index Method®, to explore the SARS-CoV-2 structural proteins.
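The distinction drawn in Sections 3.4.1 and 3.4.2 can be illustrated with a deliberately small Python sketch. It is not taken from any of the algorithms mentioned in this chapter: the function names, the one-dimensional scores, and the assumption that the positive class has the higher scores are all hypothetical and chosen only for illustration.

```python
def calibrate_threshold(values, labels):
    """Supervised, stage 1: learn a cut-off from labeled examples."""
    positives = [v for v, y in zip(values, labels) if y]
    negatives = [v for v, y in zip(values, labels) if not y]
    # Midpoint between the two class means (assumes both classes are present).
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def classify(value, threshold):
    """Supervised, stage 2: apply the calibrated cut-off to new data."""
    return value > threshold


def two_means(values, iterations=20):
    """Non-supervised: split unlabeled values into two groups without labels."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values coincide, nothing left to split
            break
        # The code adjusts its own search criteria (the two centers).
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


scores = [1.0, 2.0, 8.0, 9.0]  # toy data, not measurements
threshold = calibrate_threshold(scores, [False, False, True, True])
print(classify(7.0, threshold))
print(two_means(scores))
```

The supervised pair needs the labels supplied by the programmer before it can search, whereas `two_means` receives only the raw values and adjusts its two centers by itself.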
3.5 Polarity Index Method®
The non-supervised algorithm named Polarity Index Method® (PIM®) is a system programmed in FORTRAN 77 and run under Linux. It calculates the PIM® profile of the target protein group and compares it with the profiles of other groups, modifying the PIM® profile of the target group to make it representative of that group and discriminant against the other protein groups it is compared with.
3.5.1 The PIM® Profile
The PIM® profile metric evaluates the 16 charge/polarity interactions identified by reading the sequence of a protein in pairs of residues, from left to right. The PIM® system has three stages:
1. The amino acid sequence is converted to the numeric charge/polarity-related annotations P+, P−, N, and NP, where P+ comprises H (His), K (Lys), and R (Arg); P− comprises D (Asp) and E (Glu); N comprises C (Cys), G (Gly), N (Asn), Q (Gln), S (Ser), T (Thr), and Y (Tyr); and NP comprises A (Ala), F (Phe), I (Ile), L (Leu), M (Met), P (Pro), V (Val), and W (Trp).
2. With the sequence expressed in FASTA format, all incidences of these pairs of amino acids are registered in a 4 × 4 matrix whose rows and columns are the four PIM® profile groups. Once all amino acid pairs are recorded, the incidence matrix is normalized.
3. A 16-element vector is created by listing, from left to right, the 16 possible positions of the incidence matrix in increasing or decreasing order of incidence. Two proteins are equal if their 16-element vectors are the same.
Proteins that are equal in this sense are taken to share the same preponderant function.
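The three stages can be sketched in Python as follows. This is only an illustrative reading of the description above, not the authors' FORTRAN 77 implementation. Several details are assumptions: pairs are read as overlapping adjacent residues, the matrix is normalized by the total number of pairs, the ranking is in decreasing order, ties keep their matrix order, and pairs containing a non-standard residue are skipped. The names `incidence_matrix` and `pim_vector` are hypothetical.

```python
from itertools import product

# Charge/polarity groups from stage 1, as one-letter residue codes.
# "P-" is written with an ASCII hyphen in the code.
GROUPS = {"P+": "HKR", "P-": "DE", "N": "CGNQSTY", "NP": "AFILMPVW"}
ORDER = ["P+", "P-", "N", "NP"]
GROUP_INDEX = {aa: ORDER.index(name)
               for name, members in GROUPS.items() for aa in members}


def incidence_matrix(sequence):
    """Stage 2: normalized 4 x 4 matrix of adjacent-pair group incidences."""
    counts = [[0] * 4 for _ in range(4)]
    seq = sequence.upper()
    # Assumption: overlapping pairs (residues 1-2, 2-3, ...), left to right.
    for left, right in zip(seq, seq[1:]):
        if left in GROUP_INDEX and right in GROUP_INDEX:
            counts[GROUP_INDEX[left]][GROUP_INDEX[right]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("need at least one pair of standard residues")
    # Assumption: normalize by the total number of recorded pairs.
    return [[c / total for c in row] for row in counts]


def pim_vector(sequence):
    """Stage 3: the 16 interactions ranked by decreasing relative incidence."""
    matrix = incidence_matrix(sequence)
    cells = list(product(range(4), repeat=2))
    # The sort is stable, so tied cells keep their row-by-row matrix order.
    cells.sort(key=lambda ij: matrix[ij[0]][ij[1]], reverse=True)
    return [(ORDER[i], ORDER[j]) for i, j in cells]


toy = "KDKD"  # toy sequence, not a real protein
print(pim_vector(toy)[:2])  # [('P+', 'P-'), ('P-', 'P+')]
print(pim_vector("KDKD") == pim_vector("RERE"))  # True: same group pattern
```

Under these assumptions, two sequences are equal in the sense of stage 3 exactly when `pim_vector` returns the same list for both, as in the last line of the example.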
3.5.2 Advantages
The main advantage of this method is that the metric acts on the linear representation of the protein rather than on its three-dimensional representation, which makes a simple analysis possible. In addition, only one physico-chemical property is evaluated: the polarity/charge of the protein.
The analysis is comprehensive because the full spectrum of PIM® profile incidences is examined; in other words, the PIM® profile is not a single number but a 16-element vector. Thus, two proteins have the same PIM® profile if their 16-element vectors are equal.
3.5.3 Disadvantages
Its implementation as part of a biochip is not yet complete, which restricts its use to determining the predominant function of a protein. Nowadays, however, it is not enough to identify the function/structure of a protein; the protein must also be identified in the blood of an organism and its amount determined.
Completing the biochip will enable the PIM® profile to serve as a rapid detection test.
3.5.4 SARS-CoV-2 Recognition Using PIM® Profile
The PIM® profile (Section 3.5.1) was determined for the four SARS-CoV-2 structural proteins: spike, membrane, envelope, and nucleocapsid (Section 3.3.3), and their smooth curves were plotted (Figure 3.1). These PIM® profiles are similar, except in the region between the polar interactions [P−, P−] and [N, P−] (see Figure 3.2).
Figure 3.1 Relative frequency distribution of the four SARS-CoV-2 structural protein groups, represented by “smooth curves”. Graphs were produced using Excel. The X-axis represents the 16 charge/polarity interactions. The ellipse marks the region where the curves do not match the trend.
Figure 3.2 shows that the PIM® profiles of the spike and envelope proteins behave in a distinctly different way, while the membrane curve is a translation of the nucleocapsid curve.
A review of the histograms (Figure 3.3) of the relative frequency distribution of the residues in the sequences of the SARS-CoV-2 structural proteins (spike, envelope, membrane, and nucleocapsid) shows that none of them are similar. In general, this behavior does not necessarily depend on the length of the sequence.
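A relative frequency distribution of residues of the kind summarized in these histograms can be computed with a few lines of Python. The sketch below is an assumption about how such a distribution might be obtained; it is not the procedure used to produce Figure 3.3. The function name `residue_frequencies` is hypothetical and the input is a toy string, not a SARS-CoV-2 sequence.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def residue_frequencies(sequence):
    """Relative frequency of each standard residue in one sequence."""
    counts = Counter(aa for aa in sequence.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    # Dividing by the total makes sequences of different length comparable.
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


toy = "AAKK"  # toy sequence, not a real protein
freqs = residue_frequencies(toy)
print(freqs["A"], freqs["K"], freqs["W"])  # 0.5 0.5 0.0
```

Because each count is divided by the number of residues, the resulting histogram describes composition rather than size, so a long and a short protein can be compared directly.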
Figure 3.2 Zoomed view of Figure 3.1. The X-axis represents the five polar interactions from [P−, P−] to [N, P−] (see Section 2.5.4).
Figure 3.3 Histograms of the SARS-CoV-2 structural proteins.
3.6 Future Implications