Bioinformatics and Medical Applications. Group of authors


function, such as the representation of proteins in three-dimensional space rather than in linear representation. Others use stochastic algorithms instead of deterministic algorithms that may or may not evaluate physico-chemical properties.

      In this work, we used two main divisions, supervised algorithms and non-supervised algorithms; both classifications are discussed below.

      3.4.1 Supervised Algorithms

      A supervised algorithm is a computational code that requires calibration or training to know what to look for. This makes such codes programmer-dependent: in a first stage they are calibrated, and in a second stage, once calibrated, they can search for a particular profile.

      3.4.2 Non-Supervised Algorithms

      A non-supervised algorithm is a computational code that does not require calibration or training to know what to look for; if it does require any, the calibration involves only part of the code, which modifies itself to adjust the search criteria. Running these codes does not depend on the programmer, as they are independent.

      Algorithms of this type also exist in the proteomics and genomics fields and, although they are fewer, they are very useful.

      In this chapter, we use an algorithm of this type, named Polarity Index Method®, to explore the SARS-CoV-2 structural proteins.

      3.5 Polarity Index Method®

      The non-supervised algorithm named Polarity Index Method® (PIM®) is a system programmed in FORTRAN 77 under Linux. It calculates the PIM® protein profile of the target group and compares it with those of other groups, modifying the PIM® profile of the target group so that it is representative of that group and discriminates it from the other protein groups it is compared with.

      The metric of the PIM® profile consists of evaluating the 16 charge/polarity interactions identified by reading the sequence of a protein in pairs of residues, from left to right. The PIM® system has three stages:

      1. The amino acid sequence is converted to the charge/polarity-related annotations P+, P−, N, and NP, where P+ are H, His; K, Lys; and R, Arg; P− are D, Asp; and E, Glu; N are C, Cys; G, Gly; N, Asn; Q, Gln; S, Ser; T, Thr; and Y, Tyr; and NP are A, Ala; F, Phe; I, Ile; L, Leu; M, Met; P, Pro; V, Val; and W, Trp.

      2. The sequence is expressed in FASTA format; all the incidences of these pairs of amino acids are registered in a 4 × 4 algebraic matrix whose rows and columns are the four PIM® profile groups. Once all amino acid pairs are recorded, the incidence matrix is normalized.

      3. A 16-element vector is created by placing, from left to right, the 16 possible positions of the incidence matrix in increasing or decreasing order of incidence. Two proteins are equal if their 16-element vectors are the same.

      That is, proteins whose 16-element vectors are the same are taken to share the same preponderant function.
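      The three stages above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' FORTRAN 77 implementation: the function names (`incidence_matrix`, `profile_vector`, `same_profile`) are hypothetical, and several details the text leaves open are assumptions here, namely that pairs are overlapping consecutive residues, that the matrix is normalized by the total number of pairs, that positions are numbered 1 to 16 in row-major order with rows and columns ordered P+, P−, N, NP, and that ties are broken by position number.

```python
from itertools import product

# Stage 1: charge/polarity groups, using one-letter amino acid codes.
GROUPS = {
    "P+": "HKR",
    "P-": "DE",
    "N": "CGNQSTY",
    "NP": "AFILMPVW",
}
# Row/column order of the 4 x 4 matrix (an assumption, not from the text).
ORDER = ["P+", "P-", "N", "NP"]
RESIDUE_TO_INDEX = {
    aa: ORDER.index(name) for name, letters in GROUPS.items() for aa in letters
}


def incidence_matrix(sequence):
    """Stage 2: count overlapping residue pairs, read left to right,
    in a 4 x 4 matrix, then normalize so that the entries sum to 1
    (the normalization by total pair count is an assumption)."""
    counts = [[0] * 4 for _ in range(4)]
    seq = sequence.upper()
    for first, second in zip(seq, seq[1:]):
        # Pairs containing a non-standard symbol (e.g. X) are skipped.
        if first in RESIDUE_TO_INDEX and second in RESIDUE_TO_INDEX:
            counts[RESIDUE_TO_INDEX[first]][RESIDUE_TO_INDEX[second]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("sequence contains no valid residue pairs")
    return [[c / total for c in row] for row in counts]


def profile_vector(sequence):
    """Stage 3: list the 16 matrix positions (1..16, row-major) in
    decreasing order of incidence; ties keep row-major order (assumed)."""
    matrix = incidence_matrix(sequence)
    flat = [matrix[r][c] for r, c in product(range(4), repeat=2)]
    return sorted(range(1, 17), key=lambda pos: (-flat[pos - 1], pos))


def same_profile(seq_a, seq_b):
    """Two sequences match if their 16-element vectors are identical."""
    return profile_vector(seq_a) == profile_vector(seq_b)
```

      For the toy sequence `KKDE`, the pairs are KK, KD, and DE, so the three most frequent positions under these assumptions are 1 (P+, P+), 2 (P+, P−), and 6 (P−, P−). The sequence `RRED` yields the same vector, so `same_profile("KKDE", "RRED")` is true.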

      3.5.2 Advantages

      The main advantage of this method is that the metric acts on the linear representation of the protein rather than on its three-dimensional representation, which makes a simple analysis possible. In addition, only one physico-chemical property is evaluated: the polarity/charge of the protein.

      The analysis is comprehensive because the full spectrum of PIM® profile incidences is examined; in other words, the PIM® profile is not a single number but a 16-element vector. Thus, two proteins have the same PIM® profile if their 16-element vectors are equal.

      3.5.3 Disadvantages

      Its implementation as part of a biochip is not yet complete, which restricts its use to determining the predominant function of a protein. Nowadays, however, it is not enough to identify the function/structure of a protein; it is also necessary to detect the protein in the blood of an organism and determine its quantity.

      This will enable the PIM® profile to be used as a rapid detection test.

      Figure 3.2 shows that the PIM® profiles of the spike and envelope proteins behave in a markedly different way, while the profile of the membrane protein is a translation of that of the nucleocapsid protein.
