Biomedical Data Mining for Information Retrieval. Группа авторов

Чтение книги онлайн.

Читать онлайн книгу Biomedical Data Mining for Information Retrieval - Группа авторов страница 23

Biomedical Data Mining for Information Retrieval - Группа авторов

Скачать книгу

(ADMET) profile of a compound is required to predict the bioavailability, biodegradability and toxicity of drugs. Initially, the simple descriptor-based statistical models were created for predicting the bioactivity of drug compounds. Later on, the target specificity and selectivity of compounds were increased many folds due to the inclusion of machine learning-based models [100]. The machine learning classifiers may be built and trained based on preexisting knowledge of either molecular descriptors or substructure mining in order to classify new compounds.

      Besides expert systems, there are also some other automated prediction methods like Bayesian methods, Neural Networks, Support Vector Machines. Bayesian Inference Networks (BIN) is among one of the crucial methods that allow a straightforward representation of uncertainties that are involved in the different medical domains involving diagnosis, treatment selection, prediction of prognosis and screening of compounds [105]. Nowadays, doctors are using these BIN models in the prognosis and diagnosis. Use of BIN models in the ligand-based virtual screening domain tells their successful implications in the field of drug discovery. A comparative study was done to find the efficiency of three models: Tanimoto Coefficient Networks (TAN), conventional BINs and BIN Reweighting Factor (BINRF) for screening billions of drug compounds based on structural similarity information [106]. All three models use MDL Drug Data Report (MMDR) database for training as well as testing purposes. The ligand-based virtual screening, which utilizes the BINRF model, not only significantly improved the search strategy, it also identified the active molecules with less structural similarity, compared to TAN and BIN-based approaches. Thus, this is an era of the integrative approaches to achieve higher accuracy in drug or drug target prediction.

      Support Vector Machine (SVM) is a supervised machine learning technique most often used in knowledge base drug designing [108]. The selection of appropriate kernel function and optimum parameters are the most challenging part in the problem modelling, as both parameters are problem-dependent. Later on, a more specific kernel function is designed that can control the complexity of subtrees by using parameter adjustments. The SVM model integrated with the newly designed kernel function successfully classifies and cross-validates small molecules having anti-cancer properties [109]. Graph kernels-based learning algorithms are widely in SVMs, and they can directly utilise graph information to classify compounds. The graph kernel-based SVMs are employed to classify diverse compounds, to predict their biological activity and to rank them in screening assays. Deep learning algorithms that mimic the human neural system, artificial neural network (ANN) also have applications in the drug discovery process. The comparable robustness of both SVM and ANN algorithms were checked in term of their ability to classify between drug/non-drug compounds [110]. The result is in support of SVM as it can classify the compounds with higher accuracy and robustness compared to ANN.

      Other machine learning algorithms: Decision tree, Random forest, logistic regression, recursive partitioning are also successfully applied to classify compounds using relationship criteria between their chemical structure and toxicity profiles [111]. The comparative study of ML algorithms shows that non-linear/ensemble-based classification algorithms are more successful in classifying the compounds using ADMET properties. Random Forest algorithms can also be used in ligand pose prediction, finding receptor-ligand interactions and predicting the efficiency of docking simulations [112]. Nowadays, Deep Learning (DL) methods are achieving remarkable success in the area of pharmaceutical research starting from biological-image analysis, de novo molecule design, ligand– receptor interaction to biological activity prediction [113]. So the continuous improvements in machine learning and deep learning algorithms will help to achieve desired results with higher prediction accuracy in the drug designing field.

      Later on, several substructure mining algorithms have been developed to accommodate the needs of an ever-changing drug discovery process [117]. The subgraph mining approach is unique as it is free from any kind of arbitrary assumption, compared to other approaches. In other words, the current subgraph mining techniques are capable of retrieving all frequent occurring subgraphs from a given database of chemical compounds in significantly less time with minimum support [118]. Furthermore, as described above, the idea behind these techniques is to enable us to find the most significant subgraph out of all possible subgraphs. Shortly, the use of Artificial intelligence-based techniques in medicinal chemistry will become more complex, due to the increasing availability of huge repositories containing chemical, biological, genetic, and structural data. The implementation of the complex algorithm on ever-increasing data volume for searching a new, safer and more effective drug candidates leads to the use of quantum computing and high-performance computing. In summary, we believe that these techniques will become a much more significant part of drug discovery endeavours within a very short time.

Скачать книгу