Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari (ACM Books)

TAP/MS depends on the baits used: there is no way to identify all possible complexes unless all possible baits are tested. Proteins that do not interact directly with the chosen bait but interact with one or more of the preys might also get pulled down as part of the purified complex. In some cases these proteins are indeed part of the real complex, whereas in other cases they are not (i.e., they are contaminants). Multiple purifications are therefore required, possibly with each protein used both as a bait and as a prey, to identify the correct set of proteins within the complex. For the same reason, the TAP procedure performs two successive affinity purifications, which significantly reduces the chance of retaining contaminants. Conversely, a chosen bait might form a real complex with a set of proteins without directly interacting with every protein in the set, so some proteins might not get pulled down as part of the purified complex. In these cases, multiple baits need to be tested to assemble the complete complex. Moreover, since some proteins participate in more than one complex, multiple independent purifications are required to identify all the complexes that host these proteins.

      Binary interactions between the proteins in a pulled-down complex are inferred using two models: matrix and spoke. In the matrix model, a binary interaction is inferred between every pair of proteins within the complex, whereas in the spoke model interactions are inferred only between the bait and each of its preys. Since not all pairs of proteins within a complex necessarily interact, the matrix model usually overestimates the total number of binary interactions, whereas the spoke model underestimates it. In practice, therefore, a balance is usually struck between the two models so that the number of inferred interactions is close to the estimated total number of interactions for the organism.
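A minimal sketch of the two models applied to a single pull-down, assuming the purification result is given as a bait plus a list of co-purified preys. The protein names and the function names `spoke_interactions` and `matrix_interactions` are hypothetical. Interactions are stored as `frozenset` pairs so that (A, B) and (B, A) count once. For a complex of n proteins, the spoke model yields n - 1 interactions and the matrix model n(n - 1)/2, so the gap between the two grows quickly with complex size.

```python
from itertools import combinations


def spoke_interactions(bait, preys):
    """Spoke model: infer an interaction only between the bait and each prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_interactions(bait, preys):
    """Matrix model: infer an interaction between every pair of co-purified proteins."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical pull-down: bait "A" co-purified with preys "B", "C", "D", "E" (n = 5).
bait, preys = "A", ["B", "C", "D", "E"]
spoke = spoke_interactions(bait, preys)
matrix = matrix_interactions(bait, preys)

# Spoke gives n - 1 = 4 pairs; matrix gives n(n - 1)/2 = 10 pairs.
print(len(spoke), len(matrix))  # 4 10
```

How the balance between the two models is struck is not specified here, so the sketch stops at producing the two candidate interaction sets; every spoke pair is also a matrix pair.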

Organism          No. of Interactions   No. of Proteins
A. thaliana                    34,320             9,240
C. elegans                      5,783             3,269
D. rerio                          188               181
D. melanogaster                36,741             8,071
E. coli                            99               104
H. sapiens                    230,843            20,006
M. musculus                    18,465             8,611
R. norvegicus                   4,537             3,328
S. cerevisiae                  82,327             6,278
S. pombe                        9,492             2,944
X. laevis                         532               471

      Based on BioGrid version 3.4.130 (November 2015) [Stark et al. 2011, Chatr-Aryamontri et al. 2015].

      Despite differences in procedures and technologies, different experimental protocols can effectively complement one another in detecting interactions. While TAP can be more specific, detecting mainly stable (co-complexed) protein interactions, Y2H can be more exhaustive, detecting even transient and between-complex interactions. Based on BioGrid version 3.4.130 (November 2015) (http://thebiogrid.org/) [Stark et al. 2011, Chatr-Aryamontri et al. 2015], the number of mapped physical interactions ranges from 99 in E. coli to ~82,300 in S. cerevisiae and ~230,800 in H. sapiens (summarized in Table 1.2). It remains to be seen how many of these interactions actually occur in the physiological contexts of living cells or cell types, how many are subject to genetic and physiological variation, and how many remain to be mapped.

      The binary interactions inferred from the different experiments are assembled into a protein-protein interaction network, or simply, PPI network. The PPI network presents a global or “systems” view of the interactome and provides a mathematical (topological) framework for analyzing these interactions. Protein complexes are expected to be embedded as modular structures within the PPI network [Hartwell et al. 1999, Spirin and Mirny 2003]. Topologically, this modularity refers to densely connected subsets of proteins separated by less-dense regions of the network [Newman 2004, Newman 2010]. Biologically, it reflects a division of labor among the complexes and provides robustness against disruptions to the network from internal (e.g., mutations) and external (e.g., chemical attacks) agents. Computational methods for identifying protein complexes therefore mine the PPI network for modular subnetworks. While this strategy appears reasonable in general, limitations in PPI datasets, arising from the shortcomings of the experimental protocols highlighted above, severely restrict how accurately complexes can be predicted from the network. Specifically, the limitations in existing PPI datasets that directly impact protein complex prediction include:

      1. presence of a large number of spurious (noisy) interactions;

      2. relative paucity of interactions between “complexed” proteins; and

      3. missing contextual—e.g., temporal and spatial—information about the interactions.

      These limitations translate to the following three main challenges currently faced by computational methods for protein complex prediction:

      1. difficulty in detecting sparse complexes;

      2. difficulty in detecting small complexes (those containing fewer than four proteins) and sub-complexes; and

      3. difficulty in deconvoluting overlapping complexes (i.e., complexes that share many proteins), especially when these complexes occur under different cellular contexts.
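To make the notion of modularity concrete, here is a minimal sketch that assembles binary interactions into an undirected network (adjacency sets) and characterizes a candidate subset by two simple quantities: its internal density (the fraction of its possible protein pairs that interact) and the number of interactions leaving it. These measures are illustrative choices, not the scoring function of any particular prediction method, and the proteins and function names are hypothetical. The second subset also hints at why small and sparse complexes are hard to detect: in a three-protein set, a single missing interaction already lowers the density to 2/3, whereas in a six-protein set one missing interaction removes only 1 of 15 possible pairs.

```python
from collections import defaultdict
from itertools import combinations


def build_network(interactions):
    """Assemble an undirected PPI network as adjacency sets from binary interactions."""
    adj = defaultdict(set)
    for u, v in interactions:
        if u != v:  # this sketch ignores self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return adj


def internal_density(adj, subset):
    """Fraction of the possible protein pairs inside `subset` that interact."""
    nodes = sorted(subset)
    if len(nodes) < 2:
        return 0.0
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / possible


def boundary_edges(adj, subset):
    """Number of interactions linking proteins in `subset` to proteins outside it."""
    return sum(1 for u in subset for v in adj[u] if v not in subset)


# Hypothetical interactions: {A, B, C, D} is fully connected; E and F hang off D.
interactions = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"),
]
net = build_network(interactions)

for subset in ({"A", "B", "C", "D"}, {"D", "E", "F"}):
    print(sorted(subset), round(internal_density(net, subset), 3), boundary_edges(net, subset))
# ['A', 'B', 'C', 'D'] 1.0 1
# ['D', 'E', 'F'] 0.667 3
```

The dense four-protein set is well separated (one outgoing interaction), while the three-protein set is both less dense and more connected to the rest of the network.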

      While interactome coverage can be improved by integrating multiple PPI datasets, the lack of agreement between datasets from different experimental protocols [Von Mering et al. 2002, Bader et al. 2004] and the multifold increase in accompanying noise (spurious interactions) tend to cancel out the advantage gained from the increased coverage. Consequently, the confidence of each interaction has to be assessed (confidence scoring), and low-confidence interactions have to be removed from the datasets (filtering) before any downstream analysis is performed. To summarize, computational identification of protein complexes from interaction datasets follows these steps (Figure 1.1):

      1. integrating interactions from multiple experiments and stringently assessing the confidence (reliability) of these interactions;

      2. constructing a reliable PPI network using only the high-confidence interactions;
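The two steps listed above can be sketched as follows. The confidence score used here is a deliberately simple, hypothetical stand-in: the fraction of input datasets that report a pair, which rewards only reproducibility across experiments. It is not one of the published confidence-scoring schemes, and the datasets, the 0.5 threshold, and the function names are illustrative assumptions.

```python
from collections import Counter


def integrate(datasets):
    """Step 1a: count, for each unordered protein pair, how many datasets report it."""
    support = Counter()
    for dataset in datasets:
        # Deduplicate within a dataset so each dataset votes at most once per pair.
        pairs = {frozenset(pair) for pair in dataset if pair[0] != pair[1]}
        support.update(pairs)
    return support


def score_and_filter(support, n_datasets, threshold):
    """Step 1b: hypothetical confidence score = fraction of datasets reporting the pair;
    keep only pairs whose score reaches `threshold`."""
    scores = {pair: count / n_datasets for pair, count in support.items()}
    return {pair: score for pair, score in scores.items() if score >= threshold}


def build_network(high_confidence_pairs):
    """Step 2: build an undirected network (adjacency sets) from the retained pairs only."""
    network = {}
    for pair in high_confidence_pairs:
        u, v = tuple(pair)
        network.setdefault(u, set()).add(v)
        network.setdefault(v, set()).add(u)
    return network


# Three hypothetical datasets, e.g., produced by different experimental protocols.
datasets = [
    [("A", "B"), ("B", "C"), ("C", "X")],
    [("B", "A"), ("B", "C"), ("D", "Y")],
    [("A", "B"), ("A", "C")],
]
high_conf = score_and_filter(integrate(datasets), len(datasets), threshold=0.5)
network = build_network(high_conf)
print(sorted(tuple(sorted(pair)) for pair in high_conf))  # [('A', 'B'), ('B', 'C')]
```

Pairs reported by only one of the three datasets (C and X, D and Y, A and C) are discarded even if some of them are real, which illustrates how stringent filtering trades coverage for reliability.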

      Figure 1.1 Identification of protein complexes from protein interaction data. (a) A high-confidence PPI network is assembled from physical interactions between proteins after
