Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari

ACM Books


Database       Source                                                Reference
ComPPI         http://comppi.linkgroup.hu/                           [Veres et al. 2015]
GeneMANIA      http://www.genemania.org/                             [Warde-Farley et al. 2010]
HIPPIE         http://cbdm.mdc-berlin.de/tools/hippie/               [Schaefer et al. 2012]
HitPredict     http://hintdb.hgc.jp/htp/                             [Patil et al. 2011]
HumanNet       http://www.functionalnet.org/                         [Lee et al. 2011]
I2D/OPHID      http://ophid.utoronto.ca/ophidv2.204/                 [Brown and Jurisica 2005, Brown and Jurisica 2007, Kotlyar et al. 2015]
InnateDB       http://www.innatedb.com/                              [Lynn et al. 2008]
IntScore       http://intscore.molgen.mpg.de/                        [Kamburov et al. 2012]
InWeb          http://www.lagelab.org/resources/                     [Li et al. 2017]
iRefIndex      http://irefindex.org/wiki/index.php?title=iRefIndex   [Razick et al. 2008, Turner et al. 2010]
MyProteinNet   http://netbio.bgu.ac.il/myproteinnet/                 [Basha et al. 2015]
MatrixDB       http://matrixdb.univ-lyon1.fr/                        [Chautard et al. 2011]
IID/OPHID      http://ophid.utoronto.ca/iid/                         [Brown and Jurisica 2005, Kotlyar et al. 2015, Kotlyar et al. 2016]
PrePPI         http://bhapp.c2b2.columbia.edu/PrePPI/                [Zhang et al. 2012, Zhang et al. 2013]
PSICQUIC       http://psicquic.googlecode.com/                       [Aranda et al. 2011]
STRING         http://string-db.org/                                 [Von Mering et al. 2003, Szklarczyk et al. 2011]
UniHI          http://www.unihi.org/                                 [Kalathur et al. 2014]

       Computational Prediction of Protein Interactions

      Although high-throughput techniques produce large amounts of data, the mapped fraction of the interactome of most organisms is far from complete [Cusick et al. 2009, Hart et al. 2006, Huang et al. 2007]. For example, while ∼70% of the interactomes of model organisms including S. cerevisiae have been mapped, these interactomes still lack interactions among membrane proteins [Von Mering et al. 2002, Hart et al. 2006, Huang et al. 2007]. Likewise, estimates show that less than 50% of the interactomes of higher-order organisms, including human (∼10%) and other mammals, have been mapped [Hart et al. 2006, Stumpf et al. 2008, Vidal 2016]. Computational methods could partially compensate for this lack of coverage by predicting interactions between proteins in network regions with low coverage. Here, we present only a brief conceptual overview of computational methods developed for protein interaction prediction; for methodological details and a comprehensive list of these methods, readers are referred to the surveys by Valencia and Pazos [2002], Obenauer and Yaffe [2004], Zahiri et al. [2013], Ehrenberger et al. [2015], and Keskin et al. [2016].

      Gene Neighbors. A commonly used approach to predicting protein interactions in prokaryotes relies on co-transcribed or co-regulated sets of genes. It is based on the observation that, in prokaryotes, proteins encoded by genes that are transcribed or regulated as single units (e.g., as operons) are often involved in similar functions and tend to physically interact. Computational methods exist to predict operons in bacterial genomes using intergenic distances [Ermolaeva et al. 2011, Price et al. 2005]. Analysis of gene-order conservation in bacterial and archaeal genomes shows that the protein products of 63–75% of operonic genes physically interact [Dandekar et al. 1998]. In eukaryotes, evidence from yeast and worm [Teichmann and Babu 2002, Snel et al. 2004] shows that co-regulated sets of genes encode proteins that are functionally similar and highly likely to interact. These studies therefore provide the basis for predicting new interactions between proteins from sets of co-transcribed and co-regulated genes [Huynen et al. 2000, Bowers et al. 2004].
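
      The intergenic-distance idea can be illustrated with a minimal sketch. It is not the method of any of the cited papers: it simply groups adjacent genes on the same strand into a candidate operon when the gap between them is small, and then lists every pair within a candidate operon as a putative interaction. The Gene record, the function names, and the 50-base-pair threshold are assumptions made for illustration only.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene (1-based)
    end: int     # last base of the gene (inclusive)
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group adjacent same-strand genes separated by at most max_gap bases.

    max_gap is an illustrative threshold, not a value taken from the text.
    """
    operons: List[List[str]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            previous = current[-1]
            gap = gene.start - previous.end - 1  # bases between the two genes
            if gene.strand == previous.strand and gap <= max_gap:
                current.append(gene)
                continue
            operons.append([g.name for g in current])
        current = [gene]
    if current:
        operons.append([g.name for g in current])
    return operons


def candidate_interactions(operons: List[List[str]]) -> List[Tuple[str, str]]:
    """List all gene pairs that share a candidate operon."""
    pairs: List[Tuple[str, str]] = []
    for operon in operons:
        pairs.extend(combinations(operon, 2))
    return pairs


if __name__ == "__main__":
    toy_genome = [
        Gene("geneA", 100, 400, "+"),
        Gene("geneB", 420, 900, "+"),
        Gene("geneC", 1500, 2000, "+"),
        Gene("geneD", 2010, 2500, "-"),
    ]
    operons = predict_operons(toy_genome)
    print(operons)
    print(candidate_interactions(operons))
```

      Published operon predictors typically combine intergenic distance with further evidence, so a fixed threshold like this one should be read only as a starting point.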

      Phylogenetic Profiles. Similar phylogenetic profiles between proteins provide strong evidence for protein interactions [Pellegrini et al. 1999, Galperin and Koonin 2000, Pellegrini 2012]. For a given protein, a phylogenetic profile is constructed as a vector of N elements, where N is the number of genomes (species). The presence or absence of the protein in a genome is indicated as 1 or 0 at the corresponding position in the phylogenetic profile. The phylogenetic profiles of a collection of proteins can be clustered using a bit-distance measure to generate clusters of proteins with similar profiles. Proteins appearing in the same cluster are considered to be co-evolving and are inferred to be functionally related and physically interacting. This inference is based on the hypothesis that interacting sets of non-homologous proteins that co-evolve are under evolutionary pressure to conserve their interactions and to maintain their co-functioning ability [Shoemaker and Panchenko 2007, Sun et al. 2005].
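
      As a rough illustration of profile clustering, the sketch below takes the bit distance to be the Hamming distance between two 0/1 profiles and merges proteins whose profiles differ in at most max_distance positions (single-linkage grouping via union-find). The function names, the threshold, and the toy profiles are assumptions for illustration; the text does not prescribe a particular clustering algorithm.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def bit_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Hamming distance: number of genomes where presence/absence differs."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def cluster_profiles(
    profiles: Dict[str, Sequence[int]], max_distance: int = 0
) -> List[Set[str]]:
    """Single-linkage grouping of proteins with near-identical profiles."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if bit_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Each position is one genome: 1 = protein present, 0 = absent.
    toy_profiles = {
        "P1": [1, 0, 1, 1, 0],
        "P2": [1, 0, 1, 1, 0],
        "P3": [0, 1, 0, 0, 1],
        "P4": [1, 0, 1, 0, 0],
    }
    print(cluster_profiles(toy_profiles, max_distance=0))
    print(cluster_profiles(toy_profiles, max_distance=1))
```

      Proteins that end up in the same set would be the candidates for functional linkage; with a nonzero threshold, single linkage can chain together profiles that are individually quite different, which is one reason to treat the threshold with care.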

      Co-Evolution of Interacting Proteins. Interacting proteins often co-evolve, so that changes in one protein of a pair that would lead to loss of function or interaction are compensated by correlated changes in the other protein [Shoemaker and Panchenko 2007]. This co-evolution is reflected in the similarity between the phylogenetic protein trees (or simply, protein trees) of non-homologous interacting protein families. A protein tree represents the evolutionary history of a protein family, i.e., of proteins that diverged from a common ancestor. When protein trees are reconciled with their species trees, their internal nodes are annotated with speciation and duplication events [Vilella et al. 2009]. TreeSoft (http://treesoft.sourceforge.net/treebest.shtml) provides a suite of tools to build and visualize protein trees. The similarity between two protein trees can be computed by aligning the corresponding distance matrices so as to minimize the difference between the matrix elements: the smaller the difference between the matrices, the stronger the co-evolution between the two protein families. Interactions are predicted between proteins corresponding to the aligned columns of the two matrices. The similarity between two protein trees is also influenced by the speciation process, so there is a certain background similarity between any two protein trees, irrespective of whether the proteins interact. Statistical approaches exist to correct for this background (phylogenetic subtraction) [Harvey and Pagel 1991, Harvey et al. 1995]. It is also worth noting that a protein can have multiple partners, so taking into account its co-evolution with all of its partners further improves the accuracy of interaction prediction [Juan et al. 2008].
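
      A simplified way to score the similarity of two protein trees is sketched below. It assumes the two distance matrices are already aligned (row and column i refer to the same species in both) and, instead of the alignment-by-minimization step described above, it uses a swapped-in and simpler measure: the Pearson correlation between corresponding off-diagonal entries. The optional background argument subtracts a species-level distance matrix from both inputs first, as a crude stand-in for the background correction mentioned in the text. All function and parameter names are illustrative assumptions.

```python
from math import sqrt
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def tree_similarity(
    dist_a: Matrix, dist_b: Matrix, background: Optional[Matrix] = None
) -> float:
    """Correlate two aligned distance matrices; higher suggests co-evolution."""
    a, b = upper_triangle(dist_a), upper_triangle(dist_b)
    if len(a) != len(b):
        raise ValueError("matrices must cover the same species")
    if background is not None:
        bg = upper_triangle(background)
        if len(bg) != len(a):
            raise ValueError("background must cover the same species")
        # Remove the distance component shared through the species tree.
        a = [v - g for v, g in zip(a, bg)]
        b = [v - g for v, g in zip(b, bg)]
    return pearson(a, b)


if __name__ == "__main__":
    family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
    family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
    print(tree_similarity(family_a, family_b))
```

      A score near 1 indicates that the two families have diverged in step across species; deciding which score counts as evidence of interaction would still require a background distribution from non-interacting pairs.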

      Gene Fusion. Gene fusion is a common event in evolution, wherein two or more genes in one species fuse into a single gene in another species. Gene fusion is a result of duplication, translocation, or inversion events that affect coding sequences during the evolution of genomes. Gene fusions therefore play an important role in determining the gene (and genomic) architecture of species. Gene fusions may occur to optimize co-transcription of the genes involved: two or more fused genes may be easier to transcribe as a single entity, resulting in a single protein product. Typically, proteins encoded by these fused genes in a species carry multiple functional domains, which originate from different proteins (genes) in the ancestor species. Therefore, one may infer interactions between these individual proteins in the ancestor species: it is likely that these proteins are partners in performing a particular function
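
      The fusion-based inference can be sketched as follows, under strong simplifying assumptions: every protein is represented only by a set of domain labels, and a fused protein is taken as evidence for an interaction between two separate proteins when the domains of each are contained in the fused protein and the two do not share a domain. The protein names, domain labels, and the function name are hypothetical; in practice the correspondence between proteins would come from sequence comparison rather than from ready-made domain labels.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def predict_from_fusions(
    separate: Dict[str, Set[str]], fused: Dict[str, Set[str]]
) -> Set[Tuple[str, str]]:
    """Infer interactions between separate proteins from fused counterparts.

    separate maps proteins of one species to their domain sets; fused maps
    multi-domain proteins of another species to their domain sets.
    """
    predicted: Set[Tuple[str, str]] = set()
    for fused_domains in fused.values():
        # Separate proteins whose domains all occur in this fused protein.
        parts = sorted(
            name
            for name, domains in separate.items()
            if domains and domains <= fused_domains
        )
        for a, b in combinations(parts, 2):
            # Require distinct domains so two homologs are not paired.
            if separate[a].isdisjoint(separate[b]):
                predicted.add((a, b))
    return predicted


if __name__ == "__main__":
    separate_proteins = {
        "A1": {"kinase"},
        "A2": {"SH2"},
        "A3": {"zinc_finger"},
    }
    fused_proteins = {"B1": {"kinase", "SH2"}}
    print(predict_from_fusions(separate_proteins, fused_proteins))
```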
