Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari

Чтение книги онлайн.

Читать онлайн книгу Computational Prediction of Protein Complexes from Protein Interaction Networks - Sriganesh Srihari страница 11

Автор:
Жанр:
Серия:
Издательство:
Computational Prediction of Protein Complexes from Protein Interaction Networks - Sriganesh Srihari ACM Books

Скачать книгу

constitute the local (immediate) neighborhood subnetwork of v.

      PPI networks, like most real-world networks, have characteristic topological properties which are distinct from that of random networks. But, to understand this distinction we need to first understand what are random networks. Traditionally, random networks have been described using the Erdös-Rényi (ER) model, in which G(n, p) is a random network with |V| = n nodes and each possible edge connecting pairs of these nodes has probability p of existing [Erdös 1960, Bollobás 1985]. The expected number of edges in the network is Image, and the expected mean degree is np. Alternatively, a random network is defined as a network chosen uniformly at random from the collection Image of all possible networks with n nodes and m edges. If p is the probability for the existence of an edge, the probability for each network in this collection is,

Image

      where the closer p gets to 1, the more skewed the collection is toward networks with higher number of edges. When p = 1/2, the collection contains all possible Image networks with equal probability, Image.

      A marked feature of PPI networks that makes them non-random is the extremely broad distribution of degrees of proteins: While a majority of proteins have small number of immediate binding partners, there exist some proteins, referred to as “hubs,” with unusually large number of binding partners. Moreover, the degrees of most proteins are larger than the average node degree of the network. This degree distribution P(k) can be approximated by a power law P(k) ∼ k−γ, where k is the node degree and γ is a small constant. Such networks are called scale-free networks [Barabási 1999, Albert and Barabási 2002].

      Proteins within PPI networks exhibit an inherent tendency to cluster, which can be quantified by the local clustering coefficient for these proteins [Watts and Strogatz 1998]. For a selected protein v that is connected to |N(v)| other nodes, if these immediate neighbors were part of a clique (a complete subgraph), there will be Image interactions between them. The ratio of the actual number of interactions that exist between these nodes and the total number of possible interactions gives the local clustering coefficient CC(v) for the protein v:

Image

      The average clustering coefficient of the entire network is therefore given by

Image

      This clustering property gives rise to groups (subnetworks) of proteins in the PPI network that exhibit dense interactions within the groups, but sparse or less-dense interactions between the groups. The interactions between the groups occur via a few “central” proteins through which pass most paths in the network. The average shortest path length Image is given by

Image

      where (u, v) is the length of the shortest path between two connected nodes u and v. This average shortest path length of the PPI network is significantly shorter than that of an ER random network; in an ER network, the average shortest path length is proportional to ln n. However, both PPI networks as well as ER networks fall under “small-world” networks, because the average path length between any two nodes is still significantly smaller than the network size: n >> k >> ln n >> 1, where k is the average node degree. The average clustering coefficient of a PPI network is significantly higher than a random network constructed on the same protein set and with the same average shortest path length.

      This small-world property can also be quantified using two coefficients: closeness and betweenness [Hormozdiari et al. 2007]. The closeness of v is defined based on its average shortest path length to all other proteins reachable from v (that is, within the same connected component as v) in the network,

Image

      where R(G, v) is the set of proteins reachable from v. Closeness is thus the inverse of the average distance of v to all other nodes in the network. The betweenness of the node v measures the extent to which v lies “between” any pair of proteins connected to v in the network. Let Sxy be the number of shortest paths between all pairs x, yR(G, v), and let Sxy(v) be the number of these shortest paths that pass through v. The betweenness Bet(v) of v is defined as:

Image

      The distributions of closeness and betweenness coefficients for PPI networks are significantly different from that of random networks [Hormozdiari et al. 2007]. In particular, PPI networks contain central proteins which have high betweenness and these connect and hold together different groups of proteins or regions of the network.

      Studying the topological properties of PPI networks has gained considerable attention in the last several years, and in particular various network models have been proposed to describe PPI networks. As new PPI data became available, these theoretical models have been improved to accurately model the data. These include the earliest models such as the Erdös-Rényi model [Erdös 1960, Bollobás 1985], and more recent ones such as the Barabási-Albert [Barabási 1999, Albert and Barabási 2002], Watts-Strogatz [Watts and Strogatz 1998], and hierarchical [Ravasz and Barabási 2003] network models.

      With a fixed probability for each edge, the Erdös-Rényi (ER) networks (discussed above) look to be intuitive and seemingly correct, and were once commonly used to model real-world networks. However, in reality, ER networks are rarely found in real-world examples. Most real-world networks—including road and airline routes, social contacts, webpage links, and also PPI networks—do not have evenly distributed degrees. Moreover, although ER networks have the small-world property, they have almost no clustering effects. For these reasons, using ER networks as null models for comparison against real-world networks including PPI networks is usually inappropriate.

      In the Barabási-Albert (BA) model [Barabási 1999, Albert and Barabási 2002], nodes are added one at a time to simulate network growth. The probability of an edge forming between an incoming new node and existing nodes is based on the principle of preferential attachment—that is, existing nodes with higher degrees have a higher likelihood of forming an edge with the incoming node. Hence, the degree distribution follows a power-law distribution (scale-free) P(k) ∼ k−γ and exhibits small-world behavior, but like ER, lacks modular organization (low clustering behavior). The BA model is particularly important as one of the earliest instances of network models proposed as a suitable null model, instead of ER, for comparisons against biological networks [Barabási 1999, Albert and Barabási 2002]. This led to a panoply of works describing the properties of various biological network types including metabolic [Wagner and Fell 2001] and PPI networks [Yook et al. 2004] using BA. At the same time, the deficiencies of the

Скачать книгу