Biomedical Data Mining for Information Retrieval. Group of authors
The Bayesian model is based on the knob-socket model of protein packing in secondary structure. Protein packing can bring together residues that are close in space but distant in the primary sequence [77, 78], an effect that several other methods do not take into account. The Bayesian method considers the influence of residue packing on secondary structure determination. It therefore has the advantage of providing constructs for the direct inclusion and prediction of the coil and turn secondary states, whereas other secondary structure prediction methods are indirect in this respect and make no direct prediction of coil structure as they do for alpha helix and beta sheet. Secondary folding depends strongly on the surrounding environment (aqueous or non-aqueous), because extensive hydrogen bonding and hydrophobic interactions are involved. The method thus helps in understanding the environment responsible for secondary structure formation.
Clustering Methods
A protein rarely performs its function in isolation; various kinds of interaction are needed [79], as discussed earlier in this chapter in the context of quaternary structure. Protein–protein interactions are thus fundamental to almost all biological processes [80], and it is important to understand them. The increasing availability of large-scale protein–protein interaction data has made it possible to understand the basic components and organization of the cell machinery at the network level. Protein–protein interactions can be studied with advanced high-throughput technologies such as yeast two-hybrid, mass spectrometry and protein chip technologies, which have produced huge data sets of such interactions [81] that can be put to use in structure prediction. In computational analysis, protein–protein interaction data can be represented naturally as networks. The network representation provides an initial global picture of protein interactions on a genomic scale. In the clustering method, the protein interaction network is represented as an interaction graph in which proteins are vertices (or nodes) and interactions are edges. This representation has been used to study topological properties of protein interaction networks, including the network diameter, the distribution of vertex degree and the clustering coefficient, and these studies show that the networks are scale-free [82–85] and exhibit small-world effects [86, 87]. Clustering protein interaction networks has been shown to be an effective approach in systems biology for understanding the relationship between the organization of a network and its function [88].
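The graph representation and two of the topological properties named above (vertex degree distribution and clustering coefficient) can be sketched in a few lines. This is a minimal illustration, not the method of any cited study; the function names and the proteins P1 to P5 are hypothetical placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-interactions are ignored in this sketch
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_distribution(adj):
    """Map each degree k to the number of proteins with exactly k partners."""
    counts = defaultdict(int)
    for neighbours in adj.values():
        counts[len(neighbours)] += 1
    return dict(counts)


def clustering_coefficient(adj, node):
    """Fraction of a protein's partner pairs that also interact with each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical interaction list; the names are placeholders, not real proteins.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
adj = build_graph(edges)

print(degree_distribution(adj))
print({p: round(clustering_coefficient(adj, p), 3) for p in sorted(adj)})
```

In this toy network P1, P2 and P3 form a triangle, so P1 has a clustering coefficient of 1, while P3 also links to P4 and scores lower. On real data, a heavy-tailed degree distribution would be the signature of the scale-free behaviour mentioned above.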
Proteins are grouped into sets (clusters) such that proteins in the same cluster are more similar to each other than to proteins in different clusters. The clusters are of two types: protein complexes and functional modules. Protein complexes are groups of proteins that interact with each other at the same time and place to form a single multimolecular structure, as seen in the RNA splicing and polyadenylation machinery and in protein export and transport complexes, to name a few [89]. A functional module differs from a protein complex in that it consists of proteins that bind each other at different times and places while participating in the same cellular process. Examples of functional modules include the yeast pheromone response pathway and MAP signalling cascades [90], which begin with an extracellular signal and proceed through a signalling cascade that results in gene activation and other processes.
2.7 Role of Artificial Intelligence in Computer-Aided Drug Design
High-throughput screening (HTS) is a set of techniques capable of identifying biologically active molecules with desired properties from compound databases of billions of compounds. Predicting and identifying active compounds accurately is crucial for reducing the time taken to discover potent drugs. Medicinal chemistry companies use screening techniques to identify active compounds from drug databases in significantly less time. Reducing the search space, or searching in a targeted way, lowers the overall cost of the drug discovery process. The critical problem is how to establish a relationship between the 3D structure of a lead molecule and its biological activity. QSAR (quantitative structure–activity relationship) is a technique that predicts the activity of a set of compounds using equations derived from a set of known compounds [91]. In QSPR (quantitative structure–property relationships), by contrast, the response variable is a physicochemical property of the compounds rather than their biological activity. Accurate prediction of the activity of chemical molecules remains a persistent issue in drug discovery. In structural bioinformatics it is generally assumed that if two protein structures are similar, their functions may also be similar. This does not always hold for chemical structures, where minute structural differences between a pair of compounds can lead to a large change in their activity against the same target receptor. This is the activity cliff problem, which is a topic of active debate among computational and medicinal scientists [92, 93].
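Two of the ideas above lend themselves to a short sketch: deriving a QSAR equation from known compounds, and flagging activity cliffs. The example below makes several assumptions that are not in the source: a single hypothetical descriptor fitted by ordinary least squares, fingerprints represented as sets of feature indices compared with the Tanimoto coefficient, and illustrative thresholds (`min_similarity`, `min_gap`). All compound names and values are invented for illustration.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of feature indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(compounds, min_similarity=0.8, min_gap=2.0):
    """Return pairs that are structurally similar yet far apart in activity."""
    cliffs = []
    for (n1, fp1, act1), (n2, fp2, act2) in combinations(compounds, 2):
        if tanimoto(fp1, fp2) >= min_similarity and abs(act1 - act2) >= min_gap:
            cliffs.append((n1, n2))
    return cliffs


# Hypothetical training set: one descriptor value and one activity per compound.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]
slope, intercept = fit_line(descriptor, activity)
predicted = slope * 5.0 + intercept  # predicted activity of an untested compound

# Hypothetical compounds: (name, fingerprint feature set, activity).
compounds = [
    ("A", {1, 2, 3, 4, 5}, 8.5),
    ("B", {1, 2, 3, 4, 5, 6}, 5.0),
    ("C", {7, 8, 9}, 6.0),
]
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
print(activity_cliffs(compounds))
```

Compounds A and B differ by a single feature but by 3.5 activity units, so they are reported as a cliff pair. A linear model fitted on descriptors alone would tend to predict similar activities for such a pair, which is why cliffs are difficult for QSAR.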
The lock-and-key hypothesis and the induced-fit hypothesis describe the biochemistry of ligand binding at a receptor. In general, a ligand–receptor complex comprises a smaller ligand attached to the functional cavity of the receptor. The 3D structures of both the ligand and the receptor are essential for understanding their functional roles. The 3D conformation of the receptor protein changes when a ligand binds at the active site, which leads to a change in its functional state. X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy are the currently available experimental techniques for determining the 3D structure of proteins. Since there is a considerable gap between the number of available protein sequences and the number of solved 3D structures, bioinformatics techniques, namely molecular modeling, can be used to predict 3D structures in less time and with comparable accuracy. Molecular docking is a technique that predicts the binding mode of a ligand at a receptor when 3D information is available for both. It is most commonly used for pose prediction of the ligand at the active site of the receptor. The approach of identifying lead compounds using the 3D structure of the receptor protein is known as Structure-Based Drug Design (SBDD). Nowadays, the process of identifying, predicting and optimising the activity of small molecules against a biological target falls within the SBDD domain [94–96].
Ligand-based drug design (LBDD) is another approach to drug design, applicable when 3D structural information on the receptor is unavailable. LBDD relies mainly on existing knowledge of compounds that are known to bind to the receptor. The physicochemical properties of known ligands are used to predict their activity and to develop structure–activity relationships (SAR) for screening unknown compounds [97]. Although artificial intelligence can be applied in both SBDD and LBDD to automate the drug discovery process, its use in LBDD approaches is currently more common. Some recent methods, such as proteochemometric modeling (PCM), extract individual descriptor information from both the ligands and the receptors, together with combined interaction information [98]. Machine learning classifiers then use the individual descriptors as well as the cross-descriptor information to predict bioactivity relations.
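The PCM feature construction described above can be sketched as follows. The source does not say how the cross-descriptor information is computed; pairwise products of ligand and receptor descriptors are assumed here as one simple choice, and the function name and descriptor values are hypothetical.

```python
def pcm_features(ligand_desc, receptor_desc):
    """Concatenate ligand descriptors, receptor descriptors and their cross-terms.

    Cross-terms are assumed to be all pairwise ligand x receptor products;
    this is an illustrative choice, not one prescribed by the source.
    """
    cross = [l * r for l in ligand_desc for r in receptor_desc]
    return list(ligand_desc) + list(receptor_desc) + cross


# Hypothetical descriptor vectors for one ligand-receptor pair.
ligand = [1.0, 2.0]
receptor = [3.0, 4.0, 5.0]
features = pcm_features(ligand, receptor)
print(len(features), features)
```

Each ligand–receptor pair yields one such vector, which can then be passed, with its measured bioactivity as the label, to any standard classifier or regressor.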
Biological activity is a broad term that refers to the ability of a compound or target to achieve the desired effect [99]. Bioactivity may be divided into the activity of the receptor (its functionality) and the activity of compounds. In pharmacology, the term pharmacological activity is used instead, and usually denotes the beneficial or adverse effects of drugs on biological systems. To be an ideal drug candidate, a compound must possess both activity against the target and acceptable physicochemical properties. The absorption, distribution, metabolism, excretion and