Bacterial Pathogenesis. Brenda A. Wilson


databases, including literature sources (such as PubMed), nucleotide and protein sequences, protein structure, taxonomy, genome, expression, chemical, and other databases, and it makes the resulting combined information available to the public through a single platform (URL: http://www.ncbi.nlm.nih.gov). Similarly, UniProt is a European database that comprises four core databases for protein sequence and function.

      One of the greatest challenges of having so much data available is that it is difficult for the databases to verify what is deposited. While some database resources try to maintain oversight, annotation and curation are often left to the researchers who deposit the data. This is not always a reliable way to ensure that the data are correct, so the end user must also be wary and take care not to rely on erroneous data. Often, researchers deposit large quantities of sequencing data that have been neither annotated nor curated. This is becoming ever more prevalent with the massive metagenomic sequencing efforts currently underway, such that the number of 16S rRNA gene sequences from uncultured microorganisms has already far surpassed the number from cultured microorganisms. Defining taxonomic thresholds for separating cultured bacteria and archaea based on 16S rRNA gene sequence comparison is already challenging given the diversity observed, even when nearly the entire gene is used. Taxonomic classification of uncultured bacteria and archaea, particularly from the partial gene sequences (short reads) typically produced by large-scale metasequencing platforms, is currently one of the most fundamental problems in microbiology.

      To sort microorganisms into defined taxa at the species, genus, or higher taxonomic ranks using strictly molecular genetic criteria, some databases can match a query sequence against a sequence library obtained from known, well-characterized bacteria or archaea (so-called type strains), which allows one to link taxonomy with phylogeny. One such database is the Ribosomal Database Project (RDP, URL: http://rdp.cme.msu.edu), which provides online data analysis, alignment, and annotation of bacterial and archaeal small-subunit 16S rRNA gene sequences and fungal 18S rRNA gene sequences, drawing either on a sequence library of only type strains or on the entire collection of sequences regardless of annotation (type and non-type strains). In addition, the RDP provides alignments for sequence comparisons and phylogenetic analysis that incorporate information from the conserved secondary structure of 16S rRNAs, which improves comparisons of short partial sequences and handles some artifacts that might arise from large-scale sequencing.

      But the best way to experience the amazing power of bioinformatics is to try it for yourself. Go to the Entrez site or another database and type the name of your favorite protein. You will be amazed at the depth of information that is available about this protein: which species produce it; phylogenetic relationships of the protein in different species; its structure (often determined by different methods and perhaps even bound to ligands); the possible functions of its domains and how these domains relate to similar domains in other proteins; how its expression is regulated at the gene and activity levels; where it fits in metabolism or cellular processes; signal transduction pathways that impinge on it; and on and on. The total amount of new biological information may seem daunting, but you can best appreciate the depth of the current biological revolution by plunging in and looking for yourself. Besides, the structures and relationships are truly beautiful, and it is all free!

      The realization that sequencing the 16S rRNA genes can be used to rapidly and accurately identify bacterial strains introduced a new era in bacterial detection and identification. Because the PCR primers used to amplify the gene recognize conserved regions of the 16S rRNA gene, which are universal in bacteria, this approach can be used to rapidly identify bacteria that are not amenable to cultivation. The revolution first hit in environmental microbiology, for which nothing equivalent to the detailed identification protocols of clinical microbiology existed and in which the vast majority of the bacteria could not be cultured using conventional media. One of the first successes of this approach in clinical microbiology was the identification of the bacterium that causes a rare form of intestinal disease called Whipple’s disease. A bacterium-like form could be seen in tissues of infected people, but attempts at cultivation had been unsuccessful. Finally, using this technique, the Gram-positive bacterium associated with Whipple’s disease was identified as Tropheryma whipplei.

      The 16S rRNA gene sequence profile of the microbial community present in a sample can be represented as a phylogenetic tree, such as that illustrated in Figure 5-2, which shows the evolutionary relationship of the bacteria to each other based on sequence similarity of the reads. Each branch point (or node) is a taxonomic unit that represents the most recent common ancestor of the descendants. The lengths of the branches represent the estimated distances between sequences, from which estimates of evolutionary time can be inferred. An operational taxonomic unit (OTU) is a term used to group closely related organisms based on the sequence similarity of a specific taxonomic marker gene, such as the 16S rRNA gene. Usually when analyzing microbiotas, the researcher sets a threshold sequence similarity of 97%, 98%, or 99% to define an OTU: sequences at least that similar to each other are clustered into the same OTU. The threshold can vary, and the resulting clusters may be influenced by errors in DNA sequencing.
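The effect of the similarity threshold on OTU assignment can be illustrated with a minimal sketch. It assumes the reads are already aligned to the same length and uses a simple greedy scheme in which each read is compared only with the first member (the "seed") of each existing OTU; this is one of several possible clustering strategies, not necessarily the one used by any particular analysis pipeline. The function names, the `mutate` helper, and the toy reads are hypothetical.

```python
def percent_identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def cluster_otus(seqs, threshold=0.97):
    """Greedy clustering: a read joins the first OTU whose seed it matches at or
    above the threshold; otherwise it becomes the seed of a new OTU."""
    otus = []  # list of (seed sequence, [member read names])
    for name, seq in seqs.items():
        for seed, members in otus:
            if percent_identity(seq, seed) >= threshold:
                members.append(name)
                break
        else:
            otus.append((seq, [name]))
    return [members for _, members in otus]


def mutate(seq, n):
    """Change the first n bases so that each one is a guaranteed mismatch."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] for b in seq[:n]) + seq[n:]


# Toy 100-base "reads" (hypothetical data, not real 16S rRNA gene sequences).
base = "ACGT" * 25
reads = {
    "read1": base,
    "read2": mutate(base, 1),  # 99% identical to read1
    "read3": mutate(base, 5),  # 95% identical to read1
    "read4": mutate(base, 6),  # 94% identical to read1, 99% identical to read3
}

print(cluster_otus(reads, threshold=0.97))
# → [['read1', 'read2'], ['read3', 'read4']]
```

Lowering the threshold to 0.90 places all four toy reads in a single OTU, which shows how strongly the choice of cutoff shapes the apparent diversity of a sample.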

      Figure 5-2. Phylogenetic trees to show relationships among microbial communities. Phylogenetic trees are used to illustrate ways rRNA gene sequence data can be displayed to show inferred evolutionary relationships among microbial communities based upon sequence similarities of the microbes with each other. (A) Shown are the steps used to generate a phylogenetic tree based on PCR amplification and sequencing of the nearly full-length 16S rRNA gene sequences (8–1,492) using the commonly used bacterial primer pair (27F + 1492R). (B) The phylogenetic tree shown here displays the phylogenetic relationships among the bacteria found in the vaginas of healthy women. The scale bar represents 0.02 nucleotide substitutions per site in the 16S rRNA gene sequences. (C) Shown is an example of a dendrogram (tree) of the phylogenetic relationships of the bacterial communities from seven different human vaginal microbiota samples, based on their 16S rRNA gene sequence profiles (similar to the one shown in panel B). The lines denote the phylogenetic distance between each of the samples, which is a measure of the relationship or similarity of one sample to the other. For example, samples #4 and #5 are about 10% different from each other (the lines, or branches of the tree, converge at around 5% on the bar index at the bottom), whereas samples #6 and #7 are about 40% different from all of the other samples (#1–#5).
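The way distances are read from a dendrogram such as the one in panel C can be sketched with a small average-linkage (UPGMA) clustering routine. This is an illustrative assumption about how such a tree might be built, not a description of how the figure was actually generated; the sample names and distances are hypothetical. In this scheme two clusters that differ by a distance d are joined at height d/2, which is why branches for samples that are about 10% different converge at about 5% on the scale bar.

```python
from itertools import combinations


def upgma(dist, labels):
    """Average-linkage clustering of samples.

    dist maps frozenset({a, b}) to the distance between samples a and b.
    Returns a list of merges (cluster_a, cluster_b, height), where height is
    half the average distance between the two clusters being joined.
    """
    clusters = [(name,) for name in labels]

    def avg(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b, avg(a, b) / 2))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical distances (fraction of difference between sample profiles).
labels = ["s4", "s5", "s6"]
dist = {
    frozenset(("s4", "s5")): 0.10,
    frozenset(("s4", "s6")): 0.40,
    frozenset(("s5", "s6")): 0.40,
}

merges = upgma(dist, labels)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

With these toy values, s4 and s5 join first at a height of 0.05, and s6 joins the pair at a height of 0.20.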

      Another common method for depicting phylogenetic relationships or similarity (in terms of distance from each other) among microbial profiles is to use mathematical ordination methods, which are multivariate algorithms in which species units (taxa) or distance profiles of communities are clustered (or ordered) along gradients. One such method, called principal component analysis (PCA), generates a PCA plot from a microbial community data matrix, in which rows are taxa (or relative abundance or similarity distance) and columns are samples (Figure 5-3A). In a PCA plot, the observed data are mathematically transformed into a new coordinate system (x,y-plot or x,y,z-plot), in which each axis (coordinate) represents a variable called a principal component (PC). The x axis (PC1) represents the component with the greatest variance (i.e., the factor that accounts for most of the variability in the data), and the orthogonal (perpendicular) y axis (PC2) represents the component with the second greatest variance (i.e., a second factor that accounts for the next highest share of the variability). As with a phylogenetic tree, the closer the points are clustered together, the more similar the sample profiles are to each other (Figure 5-3B).
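The transformation described above can be sketched in a few lines using NumPy. This is a minimal illustration that assumes a small taxa-by-sample matrix of relative abundances; the `pca` function name and the abundance values are hypothetical, and real analyses typically involve further normalization steps that are omitted here. The singular value decomposition of the centered data yields the principal components in order of decreasing variance.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA of a taxa-by-sample matrix (rows = taxa, columns = samples).

    Returns (coords, explained): coords has one row per sample and one column
    per principal component; explained is the fraction of the total variance
    captured by each retained component.
    """
    X = np.asarray(matrix, dtype=float).T  # transpose so that samples are rows
    X = X - X.mean(axis=0)                 # center each taxon on its mean
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]  # sample positions on PC1, PC2
    explained = (S ** 2) / np.sum(S ** 2)            # variance share per component
    return coords, explained[:n_components]


# Hypothetical relative abundances: 3 taxa (rows) in 4 samples (columns).
abundance = [
    [0.90, 0.85, 0.10, 0.15],  # taxon A
    [0.05, 0.10, 0.80, 0.70],  # taxon B
    [0.05, 0.05, 0.10, 0.15],  # taxon C
]

coords, explained = pca(abundance)
for i, (pc1, pc2) in enumerate(coords, start=1):
    print(f"sample {i}: PC1={pc1:+.3f}  PC2={pc2:+.3f}")
print("fraction of variance explained:", np.round(explained, 3))
```

In this toy data set, samples 1 and 2 (dominated by taxon A) fall close together on PC1, far from samples 3 and 4 (dominated by taxon B), so nearly all of the variance lies along the first axis.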

      Figure 5-3. Principal component analysis to show relationships among microbial communities.
