Biomedical Data Mining for Information Retrieval. Group of authors
Figure 2.1 The different levels of organization of a protein.
Most models are inaccurate, and the structures they predict carry little useful information. Artificial intelligence programs are therefore trained on many numerically represented atomic features taken from the models (such as bond lengths, bond angles, residue-residue interactions, physicochemical properties, and potential energy properties). Comparing the output of the prediction models with known crystal structures then helps to assess model quality and identify the most accurate model. Prediction methods and prediction analyses are compared at one main gathering, the Critical Assessment of Structure Prediction (CASP). Every two years, researchers from around the world submit machine learning methods designed for protein structure prediction [50]; recent advances include protein contact distance prediction [51] and the addition of a quality assessment (QA) category in CASP7 (2006) [51, 52].
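As a rough illustration of comparing predicted models with a known structure, the sketch below ranks candidate models by root-mean-square deviation (RMSD) from a reference. It assumes the structures are already superimposed and that atoms correspond one to one; the function names and coordinates are hypothetical toy values, and CASP itself relies on its own assessment measures rather than this simplified score.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point i in one
    list corresponds to point i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))

def rank_models(models, reference):
    """Return (name, rmsd) pairs sorted from most to least similar to the reference."""
    scored = [(name, rmsd(coords, reference)) for name, coords in models.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Toy data: hypothetical coordinates, not taken from any real structure.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
models = {
    "model_a": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)],
    "model_b": [(0.0, 2.0, 0.0), (1.5, 2.0, 0.0), (3.0, 2.0, 0.0)],
}
ranking = rank_models(models, reference)
```

The first entry of `ranking` is the model closest to the reference under this measure.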
AI is time- and resource-efficient, and it can support more accurate prediction and analysis of structures because computers can process large amounts of data consistently and in fine detail. Its accuracy may be only slightly better than that of traditional approaches, but that margin adds confidence in the results. AI would also help reduce costs, and it would work in conjunction with researchers rather than replace them. Artificial intelligence is an exciting field that offers solutions to the problem of finding protein structures, which is crucial to drug development and to understanding biochemical effects. A protein’s function is determined by its structure [53–56], as many biochemical reactions show; elucidating a protein’s structure, using resources such as those listed in Table 2.1, is therefore key to understanding its function. Determining function is in turn essential for related biological, biotechnological, medical, and pharmaceutical applications, which are much needed at a time of increasing antimicrobial resistance and threats from unknown biological agents.
Table 2.1 Summary of database sources of protein structure classification.
Database sources | Websites | References |
---|---|---|
PDB | http://www.rcsb.org/pdb/ | [57] |
UniProt | http://www.uniprot.org/ | [58] |
DSSP | http://swift.cmbi.ru.nl/gv/dssp/ | [59] |
SCOP | http://scop.mrc-lmb.cam.ac.uk/ | [60] |
SCOP2 | http://scop2.mrc-lmb.cam.ac.uk/ | [61] |
CATH | http://www.cathdb.info/ | [62] |
The predictive models used for protein structure prediction include hidden Markov models, neural networks, support vector machines, Bayesian methods, and clustering methods.
Hidden Markov Model for Prediction

HMMs are among the most important techniques for protein fold recognition. In the HMM version of profile–profile methods, the HMM for the query is aligned with the prebuilt HMMs of the template library. This form of profile–profile alignment is also computed using standard dynamic programming methods. Earlier HMM approaches, such as SAM [63] and HMMer [64], built an HMM for a query from its homologous sequences and then used this HMM to score sequences with known structures in the PDB using the Viterbi algorithm, an instance of dynamic programming. This can be viewed as a form of profile–sequence alignment. More recently, profile–profile methods have been shown to significantly improve the sensitivity of fold recognition over profile–sequence or sequence–sequence methods [65].
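The Viterbi algorithm mentioned above can be sketched on a generic two-state HMM. This is a minimal illustration, not the profile HMM architecture used by SAM or HMMer: the state names, residue classes, and probabilities are hypothetical toy values, and the code assumes every probability is strictly positive so that logarithms are defined.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_log_prob, best_path) for an observation sequence.

    Works in log space; assumes all probabilities are strictly positive.
    """
    # best[t][s]: log-probability of the best state path ending in s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path

# Hypothetical toy model: two hidden states emitting coarse residue classes.
states = ("core", "loop")
start_p = {"core": 0.6, "loop": 0.4}
trans_p = {
    "core": {"core": 0.7, "loop": 0.3},
    "loop": {"core": 0.4, "loop": 0.6},
}
emit_p = {
    "core": {"hydrophobic": 0.5, "polar": 0.4, "charged": 0.1},
    "loop": {"hydrophobic": 0.1, "polar": 0.3, "charged": 0.6},
}
log_prob, path = viterbi(("hydrophobic", "polar", "charged"),
                         states, start_p, trans_p, emit_p)
```

For this toy input the most probable path is `core`, `core`, `loop`, with probability 0.01512. A real fold-recognition HMM would use match, insert, and delete states built from homologous sequences, but the dynamic programming recurrence is the same.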
Neural Networks (NNs)

Determining the structure of a protein from its sequence alone is very challenging, which in turn makes function determination more difficult. Because a functional protein involves many molecular interactions and several levels of folding, simply supplying the sequence as input will not produce the desired output. Deep learning is a rapidly evolving field that models complex relationships between input features and desired outputs, and it has been put to great use in structure prediction. Various deep neural network architectures, loosely inspired by the neural networks of the human brain, have been proposed, including deep feed-forward neural networks, recurrent neural networks, neural Turing machines, and memory networks. Such advances are making the field more competitive and more accurate. A comparison can be drawn with the human brain, which receives a great deal of information as input yet is able to analyze it and reach a logical conclusion.
Pattern recognition and classification are important applications of NNs. Examples of early NN methods that are still widely used today are PHD [66, 67], PSIPRED [68], and JPred [69]. The field has since advanced considerably: deep neural network (DNN) models have shown a performance advantage in image- and language-based problems [70], and this advantage has been seen to extend to specific CASP areas such as residue-residue contact prediction and the direct generation of accurate tertiary structures [71–75].
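To make the idea of a feed-forward network classifying sequence patterns concrete, here is a minimal sketch of a forward pass that maps a sliding window of residues to probabilities over three secondary-structure classes. It is not how PHD, PSIPRED, or JPred are implemented: the one-hot window encoding, layer sizes, and function names are assumptions chosen for illustration, and the weights are random and untrained, so the outputs carry no predictive meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("H", "E", "C")  # helix, strand, coil

def encode_window(sequence, center, half_width):
    """One-hot encode the residues around `center`; positions off the ends stay all-zero."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            one_hot[AMINO_ACIDS.index(sequence[pos])] = 1.0
        features.extend(one_hot)
    return features

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(values):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_out, n_in, rng):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict(sequence, half_width, hidden, output):
    """Return one probability vector over CLASSES for each residue."""
    per_residue = []
    for i in range(len(sequence)):
        x = encode_window(sequence, i, half_width)
        h = [math.tanh(v) for v in dense(x, *hidden)]
        per_residue.append(softmax(dense(h, *output)))
    return per_residue

rng = random.Random(0)
half_width = 3
n_inputs = (2 * half_width + 1) * len(AMINO_ACIDS)
hidden = random_layer(8, n_inputs, rng)
output = random_layer(len(CLASSES), 8, rng)
probabilities = predict("MKTAYIAKQR", half_width, hidden, output)  # made-up sequence
```

Training such a network would require labeled structures, for example from the databases in Table 2.1, and practical predictors typically use richer inputs than a one-hot window.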
Support Vector Machines (SVMs)

The support vector machine (SVM) is a supervised machine learning technique that has been used to rank protein models [76] and has been applied to pattern classification problems in biology. The SVM method is applied to a database derived from SCOP, in which protein domains are classified on the basis of:
1 Known structures of proteins in the data bank
2 Evolutionary relationships of the predicted protein
3 The various principles of bond formation governing the 3-D structure of a protein.
The advantages of SVMs include very effective avoidance of over-fitting, which is a weakness of several other methods, the ability to manage large feature spaces, and the condensation of large amounts of information.
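As a hedged sketch of how an SVM could be used to rank protein models, the example below trains a linear SVM by subgradient descent on the regularized hinge loss and orders candidates by their decision value. The two features per model, the labels, and all names are hypothetical toy values, and this simple trainer stands in for the optimized solvers that a practical study would use.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term always shrinks the weights slightly
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Sample is inside the margin or misclassified: push it outward
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def decision_value(w, b, x):
    """Signed score; larger means more confidently in the positive class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical features per model, e.g. two normalized quality scores in [0, 1].
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.75),   # labeled good (+1)
           (0.2, 0.3), (0.3, 0.1), (0.1, 0.2)]    # labeled poor (-1)
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)

# Rank unseen candidates from most to least promising.
candidates = {"candidate_1": (0.85, 0.85), "candidate_2": (0.15, 0.2)}
ranked = sorted(candidates.items(),
                key=lambda item: decision_value(w, b, item[1]),
                reverse=True)
```

The regularization strength `reg` controls the trade-off between a wide margin and fitting every training sample, which is the mechanism behind the resistance to over-fitting noted above.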
Bayesian