Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari
Figure 2.4 PPI network visualization using the Cytoscape 3.4.0 tool [Shannon et al. 2003, Smoot et al. 2010]. A portion of the human PPI network (1,977 proteins and 5,679 interactions) downloaded from BioGrid [Stark et al. 2011] is visualized here using force-directed layout. Basic statistics—average number of neighbors, network diameter, etc.—are displayed for the network. Proteins (e.g., BRCA1), protein complexes (e.g., eukaryotic initiation factor 4F and nuclear pore complexes), pathways (e.g., Fanconi anaemia pathway), and cellular processes (e.g., DNA-damage repair and chromatin remodeling) are “pulled-out” and highlighted. Cytoscape provides “link-out” to external databases and tools—e.g., KEGG [Kanehisa and Goto 2000]—to enable further analysis.
Once the PPI network is laid out, a good visualization tool should allow at least some basic visual analysis of the network. The following aspects become important here (see Figure 2.4). The ease of navigation through the PPI network to explore individual proteins and interactions is of prime importance. In particular, the tool should be able to load large networks and let the user navigate them. Next is the provision to annotate the network using internal information (e.g., labeling nodes by serial numbers or by their network properties) or external information (see below). The tool should also be able to compute basic topological properties of the network, for example node degree, shortest path lengths, and clustering, closeness, and betweenness coefficients. These statistics give users at least a preliminary idea of the network. Another valuable feature of a good tool is link-out to external databases, for example to PubMed literature (http://www.ncbi.nlm.nih.gov/pubmed), National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/), UniProt or SwissProt (http://www.uniprot.org/) [Bairoch and Apweiler 1996, UniProt 2015], BioGrid (http://thebiogrid.org/) [Stark et al. 2011], Gene Ontology (GO) (http://www.geneontology.org/) [Ashburner et al. 2000], and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/pathway.html) [Kanehisa and Goto 2000]. These enable functional annotation of proteins and interactions. Finally, the tool should ideally support advanced analyses such as clustering of the network, comparison between networks (based on topological characteristics, for example), and enrichment analysis, e.g., using GO terms. Table 2.3 lists some of the popular tools available for PPI network visualization and (visual) analysis. OMICS Tools (http://omictools.com/network-visualization-category) maintains an exhaustive list of visualization tools for PPI and other biomolecular network analysis.
Table 2.3 Software tools for PPI network visualization and analysis
Visualization Tool | Source | Reference |
Arena3D | http://arena3d.org/ | [Pavlopoulos et al. 2011] |
AVIS | http://actin.pharm.mssm.edu/AVIS2/ | [Seth et al. 2007] |
BioLayout | http://www.biolayout.org/ | [Theocharidis et al. 2009] |
Cytoscape | http://www.cytoscape.org/download.html | [Shannon et al. 2003, Smoot et al. 2010] |
Medusa | http://coot.embl.de/medusa/ | [Hooper and Bork 2005] |
NAViGaTOR | http://ophid.utoronto.ca/navigator/download.html | [Brown et al. 2009] |
ONDEX | http://www.ondex.org/ | [Köhler et al. 2006] |
Osprey | http://biodata.mshri.on.ca/osprey/servlet/Index | [Breitkreutz et al. 2003] |
Pajek | http://vlado.fmf.uni-lj.si/pub/networks/pajek/ | [Vladimir and Andrej 2004] |
PIVOT | http://acgt.cs.tau.ac.il/pivot/ | [Orlev et al. 2004] |
ProViz | http://cbi.labri.fr/eng/proviz.htm | [Florian et al. 2005] |
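The basic topological properties mentioned above (node degree, shortest path lengths, clustering coefficient, and closeness) can be illustrated with a minimal Python sketch. This is not how Cytoscape or any tool in Table 2.3 computes these statistics. The protein names A to E, the toy edge list, and all function names are hypothetical and chosen only for illustration, and closeness is taken here as the number of reachable nodes divided by the sum of their hop distances, which is one common convention.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) interaction pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-interactions are ignored in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of interaction partners of a protein."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    """Reachable nodes (excluding `node`) divided by their total distance."""
    dist = shortest_path_lengths(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
TOY_EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = build_graph(TOY_EDGES)

print("degree(C) =", degree(adj, "C"))
print("clustering(C) =", clustering_coefficient(adj, "C"))
print("distances from A =", shortest_path_lengths(adj, "A"))
print("closeness(A) =", closeness(adj, "A"))
```

In this toy network, protein C has three neighbors of which only one pair (A and B) interacts, so its clustering coefficient is 1/3, whereas A sits inside the triangle and has a clustering coefficient of 1.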
2.6 Building High-Confidence PPI Networks
From our discussions on experimental protocols in earlier sections, we know that some protocols, including the AP/MS ones, offer only pulled-down complexes consisting of baits and their preys without specifying the binary interactions between these components. Therefore, binary interactions need to be specifically inferred between the bait and each of its preys within the pulled-down complexes. However, not all preys in a pulled-down complex interact directly with the bait; some are pulled down because of their interactions with other preys in the complex. Therefore, it is necessary to infer binary interactions not just between the bait and its preys but also between the interacting preys. Yet, care should be taken to avoid inferring spurious (false-positive) interactions between preys that do not interact. To handle these uncertainties, a balance is often sought between two kinds of models, spoke and matrix, which are used to transform pulled-down complexes into binary interactions between the proteins [Gavin et al. 2006, Krogan et al. 2006, Spirin and Mirny 2003, Zhang et al. 2008].
The spoke model assumes that the only interactions in the complex are between the bait and its preys, like the spokes of a wheel. This model is useful to reduce the complexity of the data, but misses all (true) prey–prey interactions. On the other hand, the matrix model assumes that every pair of proteins within a complex interacts. This model can cover all possible true interactions, but can also predict a large number of spurious interactions. An empirical evaluation using 1,993 baits and 2,760 preys from the dataset from Gavin et al. [2006] against 13,384 pairwise protein interactions between proteins within the expert-curated MIPS complexes [Mewes et al. 2006] revealed 80.2% false-negative (missing) interactions and 39% false-positive (spurious) interactions in the spoke model, and 31.2% false-negative interactions but 308.7% false-positive interactions in the matrix model [Zhang et al. 2008]. However, note that many of the missing interactions could be due to the lack of protein coverage in these experiments. A balance is struck between the two models that covers as many true interactions between the baits and preys as possible without allowing too many false interactions [Gavin et al. 2006]; see Figure 2.5.
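The difference between the two models can be made concrete with a short Python sketch. It assumes that a pull-down is given simply as a bait plus a list of preys. The protein identifiers (B1, P1, P2, P3), the function names, and the tiny reference set are hypothetical, and the comparison step only illustrates the idea of missing versus spurious interactions; it is not the evaluation procedure of Zhang et al. [2008].

```python
from itertools import combinations

def spoke_interactions(bait, preys):
    """Spoke model: connect the bait to each of its preys, and nothing else."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_interactions(bait, preys):
    """Matrix model: connect every pair of proteins in the pulled-down complex."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}

def compare_to_reference(predicted, reference):
    """Return (missing, spurious) interaction sets relative to a reference set."""
    missing = reference - predicted    # true interactions that were not inferred
    spurious = predicted - reference   # inferred interactions not in the reference
    return missing, spurious

# Hypothetical pull-down: bait B1 with three preys.
bait, preys = "B1", ["P1", "P2", "P3"]
spoke = spoke_interactions(bait, preys)
matrix = matrix_interactions(bait, preys)

# Hypothetical reference: B1 binds P1 directly, and P2 is pulled down via P1.
reference = {frozenset(("B1", "P1")), frozenset(("P1", "P2"))}

spoke_missing, spoke_spurious = compare_to_reference(spoke, reference)
matrix_missing, matrix_spurious = compare_to_reference(matrix, reference)

print("spoke:", len(spoke), "pairs,", len(spoke_missing), "missing,",
      len(spoke_spurious), "spurious")
print("matrix:", len(matrix), "pairs,", len(matrix_missing), "missing,",
      len(matrix_spurious), "spurious")
```

For a complex with one bait and n preys, the spoke model yields n pairs and the matrix model yields (n + 1)n/2 pairs, which is why the matrix model recovers prey–prey interactions at the cost of many more candidate false positives.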
Gaining Confidence in PPI Networks
Although high-throughput studies have been successful in mapping large fractions of interactomes from multiple organisms, the datasets generated from these studies are not free from errors. High-throughput PPI datasets often contain a considerable number of spurious interactions, while missing a substantial number of true interactions [Von Mering et al. 2002, Bader and Hogue 2002, Cusick et al. 2009]. Consequently, a crucial challenge in adopting these datasets for downstream analysis, including protein complex prediction, is overcoming these errors.
Figure 2.5 Inferring protein interactions from pull-down protein complexes. Bait–prey relationships from pull-down complexes are assembled using the spoke model, where the bait is connected to each of the preys (A); or using the matrix model, where every bait–prey and prey–prey pair is connected