Bacterial Pathogenesis. Brenda A. Wilson


enable overall functional assessment of the microbiota in the context of the host environment.

      Before delving into the microbial populations of different parts of the body, it is worth reviewing some of the powerful new culture-independent approaches and analytical tools that are available for exploring the scope, depth, and variety of microbes that make up mammalian microbiotas.

      Some of the first questions that arise when scientists seek to characterize a complex microbial population are these: Which microbial species are present, and how many of each? How much does the composition vary from person to person and from site to site? How does the composition change with conditions and over time? To answer these questions, it is first necessary to identify the species to which the microbes belong and to determine their phylogenetic relationship (or evolutionary similarity) to each other. Although most studies surveying the microbiota to date have focused on the bacterial content of the community, some researchers are beginning to explore other microbes, such as archaea, fungi, protozoa, and viruses (including bacteriophages). For now, we will likewise focus on characterization of the bacterial content of microbiotas, but will return to the other microbes later in the chapter.

      16S rRNA Gene-Based Taxonomical Identification of Bacteria. To complete a census of the species present in a bacterial community, researchers must first perform sequence analysis of all, or at least the more abundant, species present in a sample. For this, they need to choose a gene that is common to all bacteria of interest. The most widely used approach to determining the bacterial content of a community is to isolate the total genomic DNA from the microbial population and then employ polymerase chain reaction (PCR) to specifically amplify the bacterial 16S rRNA genes (Figure 5-1), which are then sequenced. The use of 16S rRNA genes is advantageous because these genes are large enough (1,542 nucleotides for the Escherichia coli gene) to contain adequate sequence information for identification and discrimination among close relatives but small enough to be sequenced easily. The 16S rRNA gene, which is present in all bacteria, is a mosaic of regions (Figure 5-1A), some that are highly conserved among all bacterial species and some that are less conserved and consequently contain sequence signatures for different bacterial species, acquired through slow accumulation of mutations over time.
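
The mosaic of conserved and variable regions can be illustrated with a minimal Python sketch. It assumes a set of already aligned sequences and labels each alignment column as conserved or variable by its Shannon entropy. The sequences, the function names, and the zero-entropy threshold are invented for illustration; they are not real 16S rRNA data and not a method described in the text.

```python
from collections import Counter
from math import log2

# Hypothetical, pre-aligned toy sequences (invented; not real 16S rRNA data).
ALIGNED = [
    "ACGTACGGTTAC",
    "ACGTTCGATTAC",
    "ACGTGCGCTTAC",
    "ACGTACGTTTAC",
]

def column_entropy(column):
    """Shannon entropy (in bits) of the bases found in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def classify_columns(aligned, threshold=0.0):
    """Label each column 'C' (conserved) or 'V' (variable).

    A column counts as conserved when its entropy does not exceed the
    threshold; the default of 0.0 means every sequence has the same base.
    """
    labels = []
    for column in zip(*aligned):
        labels.append("C" if column_entropy(column) <= threshold else "V")
    return "".join(labels)

labels = classify_columns(ALIGNED)
print(labels)
```

In this toy alignment, the columns labeled `V` play the role of the sequence signatures, and the runs of `C` columns on either side play the role of the conserved regions.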

      Figure 5-1. Detection of bacteria in a clinical specimen based on 16S rRNA gene amplification by PCR. The 16S rRNA gene is used as the standard for bacterial taxonomic identification and phylogenetic relationship studies because it is highly conserved among different taxa of bacteria. (A) The 16S rRNA gene (∼1,542 nucleotides in the Escherichia coli gene shown here) has highly conserved regions of sequence that can serve as primer binding sites for PCR amplification, but also has hypervariable regions (labeled V1 through V9) that can be used as signatures for distinguishing among different bacterial taxa and establishing phylogenetic relationships. (B) For detection of bacteria in a clinical sample, PCR primers (solid dark bars) recognize conserved segments of DNA on either side of the variable region to be amplified. The PCR uses a thermostable DNA polymerase, such as Pfu or Vent polymerase, that exhibits maximal catalytic activity at 75–80°C and possesses 3′ to 5′ exonuclease activity, which reduces incorporation of the wrong nucleotide. The amplified segment (amplicon) can then be sequenced and compared to rRNA databases of known bacteria for taxonomic identification.
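
The primer arrangement in panel B can be mimicked in software. The following Python sketch is a simplified "in silico PCR" under stated assumptions: primers must match exactly, only one strand of the template is searched, and all sequences and names (`LEFT_CONSERVED`, `VARIABLE`, `RIGHT_CONSERVED`, `in_silico_amplicon`) are hypothetical toy values rather than real 16S rRNA primers.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the segment a primer pair would amplify, or None.

    Both primers are written 5' to 3'. The forward primer matches the
    template strand directly. The reverse primer anneals to that strand,
    so its reverse complement is what appears in the template sequence.
    Only exact matches are considered (a simplifying assumption).
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]

# Hypothetical toy sequences (invented; not real 16S rRNA data).
LEFT_CONSERVED = "GGATCCAAGT"
VARIABLE = "TTTACGCA"
RIGHT_CONSERVED = "CCGGTTAACG"
TEMPLATE = "AT" + LEFT_CONSERVED + VARIABLE + RIGHT_CONSERVED + "GC"

FORWARD_PRIMER = LEFT_CONSERVED
REVERSE_PRIMER = reverse_complement(RIGHT_CONSERVED)

amplicon = in_silico_amplicon(TEMPLATE, FORWARD_PRIMER, REVERSE_PRIMER)
print(amplicon)
```

The returned amplicon spans both primer sites and the variable stretch between them, which is the part that would be sequenced and compared against a database.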

      The basic procedure is relatively simple. DNA primers that recognize highly conserved regions at the beginnings and ends of the bacterial 16S rRNA genes are used to amplify most of the gene, including the variable regions containing the identification signatures, by PCR using a thermostable, high-fidelity DNA polymerase, such as Vent or Pfu (Figure 5-1B). The resulting PCR products, called amplicons, are sequenced directly using new DNA sequencing technologies, described in detail later in the chapter. Using bioinformatics (computer software programs capable of handling and analyzing massive amounts of data), the sequences of the rRNA genes present in the original sample (called output sequence reads) are compared to those available in the ever-growing, publicly available DNA sequence databases (Box 5-1) to identify the nearest bacterial relatives and provide an immediate identification of the taxon (plural taxa; group of one or more populations of related organisms) from which the sequence originated. It is now possible to identify an unknown bacterial isolate within 24 hours by this approach, and with automation, high-capacity supercomputers, and current bioinformatics tools, it can happen even sooner.
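
The comparison step can be sketched as a nearest-match lookup. This minimal Python example assumes a tiny, hypothetical reference table of equal-length signature sequences and scores a read by simple percent identity. Real analyses rely on sequence alignment tools and curated rRNA databases; the taxon names, sequences, and function names below are invented for illustration.

```python
# Hypothetical reference "database" (invented; equal-length toy signatures).
REFERENCE = {
    "Taxon A (hypothetical)": "ACGTTGCAAGCT",
    "Taxon B (hypothetical)": "ACGTAGCTAGCT",
    "Taxon C (hypothetical)": "TCGTTGGAAGGT",
}

def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length in this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def nearest_relative(read, reference):
    """Return (taxon name, percent identity) for the best-scoring reference."""
    scored = ((name, percent_identity(read, seq)) for name, seq in reference.items())
    return max(scored, key=lambda pair: pair[1])

# A toy sequence read that differs from Taxon A at a single position.
read = "ACGTTGCAAGCA"
taxon, identity = nearest_relative(read, REFERENCE)
print(taxon, round(identity, 1))
```

A read that matches no reference closely would still be assigned its best hit here, so a practical pipeline would also apply a minimum identity cutoff before reporting a taxon.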

      Data, Data, Data—What To Do with All That Data?

      How does one go about storing and sorting through the massive amounts of sequencing data and information that have been generated over the years? Because researchers critically need to access these data and use them readily, a number of centralized, publicly available databases have been established around the world. These databases, most of which are Web-based and freely available online, consist of libraries of life sciences information: DNA sequencing data, protein structure data, gene expression data, and other computational or scientific data from genomics, transcriptomics, proteomics, metabolomics, and phylogenetics. The need to compile and analyze these massive amounts of data from various sources gave rise to an entirely new field, bioinformatics, which involves the design, development, management, utilization, and maintenance of these life sciences databases. Databases have become an important tool and resource for scientists studying complex biological systems. Whenever a researcher publishes a nucleotide sequence or other data in a scientific journal, the researcher is required to deposit that sequence and/or information in one of the databases, where it receives an accession number, a tracking number that helps the databases maintain and cross-reference the information.

      The largest primary nucleotide sequence databases are the three members of the International Nucleotide Sequence Database (INSD) collaboration: GenBank (National Center for Biotechnology Information [NCBI]), the United States’ centralized library of various biological data, including nucleotide sequences; EMBL ENA (European Molecular Biology Laboratory European Nucleotide Archive), Europe’s library of nucleotide sequence data; and DDBJ (DNA Data Bank of Japan), Japan’s nucleotide database. The major protein databases, which are not part of the INSD, include UniProtKB (Universal Protein Resource Knowledgebase), a database that provides protein translations of nucleotide sequences from the nucleotide sequence databases; Swiss-Prot (Swiss Institute of Bioinformatics), a protein sequence database that forms the manually reviewed section of UniProtKB; and RCSB PDB (Research Collaboratory for Structural Bioinformatics Protein Data Bank), a protein structure model database.

      There are public genome databases that collect libraries of genome sequences and provide annotation (assigning identification and possibly function to the genes), curation (literature citations supporting the annotation), and analysis tools to aid researchers in comparative genomics studies. For example, JGI Genomes (Department of Energy Joint Genome Institute) is a database for many eukaryotic and microbial genomes, the NMPDR (National Microbial Pathogen Data Resource) is a curated database of annotated genomic data for a number of bacterial pathogens, and the SEED is a database developed to annotate and curate 1,000 genomes using a subsystems approach based on comparative analysis of sets of genes with related functional roles. Some databases, such as KEGG Orthology (Kyoto Encyclopedia of Genes and Genomes), COGs (Clusters of Orthologous Groups), eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups), and Pfam (Protein Families and Domains), map DNA and RNA sequence reads and functional information into evolutionarily related families of similar biological activities, metabolic pathways, and/or protein structure and function. There are also databases that integrate information from multiple databases. For example, Entrez is the integrated search and retrieval system used by NCBI for assembling data from major
