Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari

Computational Prediction of Protein Complexes from Protein Interaction Networks - Sriganesh Srihari ACM Books


is primarily concerned with the problem of protein complex prediction, the book also covers several other aspects of PPI networks. We would therefore like to dedicate this book to the students (Honors, Masters, and Ph.D.) who worked on these different aspects of PPI networks as part of the computational biology group at the Department of Computer Science, National University of Singapore, over the years. Several of the methods covered in this book are a result of the extensive research conducted by these students. Sriganesh would like to thank Hon Wai Leong (Professor of Computer Science, National University of Singapore), under whom he conducted his Ph.D. research on protein complex prediction; Mark Ragan (Head of Division of Genomics of Development and Disease at Institute for Molecular Bioscience, The University of Queensland), under whom he conducted his postdoctoral research, a substantial portion of which was on identifying protein complexes in diseases; and Kum Kum Khanna (Senior Principal Research Fellow and Group Leader at QIMR-Berghofer Medical Research Institute), whose guidance played a significant part in his understanding of biological aspects of protein complexes. Sriganesh is grateful to Mark for passing him an original copy of a 1977 volume of Progress in Biophysics and Molecular Biology in which G. Rickey Welch makes a consistent, principled argument that “multienzyme clusters” are advantageous to the cell and organism because they enable metabolites to be channeled within the clusters and protein expression to be co-regulated [Welch 1977]; it is a possession that Sriganesh will deeply cherish. Chern Han would like to thank his co-authors: Sriganesh for doing the heavy lifting in writing, editing, and driving this project, and Limsoon Wong for guiding him through his Ph.D. journey on protein complexes.
He would also like to acknowledge the support of Bin Tean Teh (Professor with Program in Cancer and Stem Cell Biology, Duke-NUS Medical School), who currently oversees his postdoctoral research. Limsoon would like to acknowledge Chern Han and Sriganesh for doing the bulk of the writing for this book, and especially thank Sriganesh for taking the overall lead on the project. When he suggested the book to Chern Han and Sriganesh, he had not imagined that he would eventually be a co-author.

      We are also indebted to the Editor-in-Chief of ACM Books, Tamer Özsu, Executive Editor Diane Cerra, Production Manager Paul C. Anagnostopoulos, and the entire team at ACM Books and Morgan & Claypool Publishers for their encouragement and for producing this book so beautifully.

      Sriganesh Srihari

      Chern Han Yong

      Limsoon Wong

      May 2017

      1

      Introduction to Protein Complex Prediction

      Unfortunately, the proteome is much more complicated than the genome.

      —Carol Ezzell [Ezzel et al. 2002]

      In an early survey, American biochemist Bruce Alberts termed large assemblies of proteins the protein machines of cells [Alberts et al. 1998]. Protein assemblies are composed of highly specialized parts that coordinate to execute almost all of the biochemical, signaling, and functional processes in cells [Alberts et al. 1998]. It is not hard to see why protein assemblies are more advantageous to cells than individual proteins working in an uncoordinated manner. Compare, for example, the speed and elegance of the DNA replication machinery, which simultaneously replicates both strands of the DNA double helix, with what could ensue if each of the individual components (DNA helicases for separating the double-stranded DNA into single strands, DNA polymerases for assembling nucleotides, DNA primase for generating the primers, and the sliding clamp to hold these enzymes onto the DNA) acted in an uncoordinated manner [Alberts et al. 1998]. Although they might seem like individual parts brought together to perform arbitrary functions, protein assemblies can be very specific and enormously complicated. For example, the spliceosome is composed of 5 small nuclear RNAs (snRNAs or “snurps”) and more than 50 proteins, and is thought to catalyze an ordered sequence of more than 10 RNA rearrangements at a time as it removes an intron from an RNA transcript [Alberts et al. 1998, Baker et al. 1998]. The discovery of this intron-splicing process won Phillip A. Sharp and Richard J. Roberts the 1993 Nobel Prize in Physiology or Medicine.1

      Protein assemblies number in the hundreds even in the simplest of eukaryotic cells. For example, more than 400 protein assemblies have been identified in the single-celled eukaryote Saccharomyces cerevisiae (budding yeast) [Pu et al. 2009]. However, our knowledge of these protein assemblies is still fragmentary, as is our conception of how these assemblies work together to constitute the “higher level” functional architecture of cells. A faithful attempt toward identification and characterization of all protein assemblies is therefore crucial to elucidate the functioning of the cellular machinery.

      To identify the entire complement of protein assemblies, it is important to first crack the proteome, a concept so novel that the word “proteome” first appeared only around 20 years ago [Wilkins et al. 1996, Bryson 2003, Cox and Mann 2007]. The proteome, as defined in the UniProt Knowledgebase, is the entire complement of proteins expressed or derived from protein-coding genes in an organism [Bairoch and Apweiler 1996, UniProt 2015]. With the introduction of high-throughput experimental (proteomics) techniques, including mass spectrometric [Cox and Mann 2007, Aebersold and Mann 2003] and protein quantitative trait locus (QTL) technologies [Foss et al. 2007], mapping of proteins on a large scale has become feasible. Just as genomics techniques (including genome sequencing) were first demonstrated in model organisms, proteome mapping has progressed initially and most rapidly for model prokaryotes, including Escherichia coli (a bacterium), and model eukaryotes, including Saccharomyces cerevisiae (budding or baker’s yeast), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (a nematode), and Arabidopsis thaliana (a flowering plant). Table 1.1 summarizes the numbers of proteins or protein-coding genes identified from these organisms. Among these organisms, the proportion of protein-coding genes that are essential (genes that are thought to be critical for the survival of the cell or organism; “fitness genes”) ranges from ∼2% in Drosophila to ∼6.5% in Caenorhabditis and ∼18% in Saccharomyces [Cherry et al. 2012, Chen et al. 2012]. Recent landmark studies using large-scale proteomics [Wilhelm et al. 2014, Kim et al. 2014, Uhlén et al. 2010, Uhlén et al. 2015] on Homo sapiens (human) cells have characterized >17,000 (or >90%) putative protein-coding genes from ≥40 tissues and organs in the human body.
Encyclopedic resources on these proteins, covering their levels of expression and abundance in different human tissues, are available from the ProteomicsDB (http://www.proteomicsdb.org/) [Wilhelm et al. 2014], The Human Proteome Map (http://humanproteomemap.org/) [Kim et al. 2014], and The Human Protein Atlas (http://www.proteinatlas.org/) [Uhlén et al. 2010, Uhlén et al. 2015] projects. GeneCards (http://www.genecards.org/) [Safran et al. 2002, Safran et al. 2010] aggregates information on human protein-coding genes from >125 Web sources and presents it in an integrated, user-friendly manner. The expression levels of nearly 200 proteins that are essential for driving different human cancers are available from The Cancer Proteome Atlas (TCPA) project (http://app1.bioinformatics.mdanderson.org/tcpa/_design/basic/index.html) [Li et al. 2013], measured from more than 3,000 tissue samples across 11 cancer types studied as part of The Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov/). Short-hairpin RNA (shRNA)-mediated knockdown [Paddison et al. 2002, Lambeth and Smith 2013], clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-based gene editing [Sanjana et al. 2014, Baltimore et al. 2015, Shalem et al. 2015], and disruptive mutagenesis [Bökel 2008] screening using MCF-10A (near-normal mammary), MDA-MB-435 (breast cancer), KBM7 (chronic myeloid leukemia), HAP1 (haploid), A375 (melanoma), HCT116
