Principles of Microbial Diversity. James W. Brown


base pairs in a helix change without disrupting the structure of the RNA? Does this explain (at least in part) why base-pair changes that keep the purines and pyrimidines in the same positions (transitions) are more common than those that switch them (transversions)?

      7. Although RNA three-dimensional structures are scarce, there are hundreds of protein three-dimensional structures, determined by X-ray crystallography. Can you imagine a way to use these structures, analogous to the use of RNA secondary structures, to align protein sequences more meaningfully?

      4

      Constructing a Phylogenetic Tree

      In chapter 3, we covered the first three steps of a phylogenetic analysis, leaving the final step toward which the others build. The steps in a phylogenetic analysis are as follows:

      1. Decide which gene and species to analyze (small-subunit ribosomal RNA [SSU rRNA])

      2. Determine the gene sequences (polymerase chain reaction [PCR] and DNA sequencing, database “mining”)

      3. Identify homologous residues (sequence alignment)

      4. Perform the phylogenetic analysis

      The most common type of phylogenetic analysis is tree construction. A tree is nothing more than a graph representing the similarity relationships between the sequences in an alignment. We go through the process in detail to show that tree construction is not rocket science; it involves only straightforward mathematical transformations of sequence data.

      There are several methods for building trees. In this chapter, we cover the neighbor-joining method in some detail as an example, because it is conceptually straightforward and commonly used. In the next chapter, we briefly cover some other approaches.

      Tree construction starts with an alignment. Neighbor joining is a distance matrix method, meaning that the alignment is first reduced to a table of evolutionary distances, a distance matrix. The distance matrix cannot be generated directly from the alignment, however, because actual evolutionary distance cannot be directly measured. Instead, the alignment is reduced to a table of observed (measurable) similarity, the similarity matrix. The distance matrix is calculated from the similarity matrix, and then the tree is generated from the distance matrix.

       Generating a similarity matrix

      The similarity matrix is just a table of fractional similarities. Consider, for example, this alignment of six sequences with 20 positions.

[Image: example alignment of six sequences with 20 positions]

      Just count the fraction of identical bases in every pair of sequences in the alignment.

[Image: similarity calculation for a pair of sequences]

      The similarity values for all pairs of sequences are calculated in the same way and assembled into a table:

[Image: similarity matrix for the example alignment]

      In this example, sequences A and B are 0.90 (90%) similar, A and C are 0.75 similar, B and C are 0.75 similar, and so forth. Note that values on the diagonal (A:A, B:B, …) do not need to be calculated; they are always 1. Likewise, there is no reason to calculate both above and below the diagonal; the value for X:Y is the same as that for Y:X, so the second calculation would be redundant.
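      The counting procedure described above can be sketched in a few lines of Python. This is only an illustration: the three sequences below are hypothetical (they are not the alignment used in this chapter) and were chosen so that the pairwise values match the ones quoted in the text (A:B = 0.90, A:C = 0.75, B:C = 0.75). The function names are likewise made up for this sketch.

```python
from itertools import combinations


def fractional_similarity(seq_a, seq_b):
    """Fraction of aligned positions at which two sequences are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / len(seq_a)


def similarity_matrix(alignment):
    """Return {(x, y): similarity} for every unordered pair of sequences.

    Only one triangle of the table is computed: the diagonal is always 1,
    and the value for X:Y is the same as that for Y:X.
    """
    return {
        (x, y): fractional_similarity(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Hypothetical 20-position alignment (NOT the data from the book).
alignment = {
    "A": "GGCUAAGCUUAGCCGAUACG",
    "B": "GGCUAAGCUUAGCCGAUAGC",  # differs from A at 2 positions
    "C": "AAUUAAGCUUAGCCGAUAUA",  # differs from A and from B at 5 positions
}

for (x, y), s in similarity_matrix(alignment).items():
    print(f"{x}:{y} {s:.2f}")
# A:B 0.90
# A:C 0.75
# B:C 0.75
```

      Storing only the unordered pairs mirrors the point made above: nothing is gained by filling in the diagonal or the second triangle of the table.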

      Next is the estimation of evolutionary distances from the sequence similarities. You might think that the distance would just be 1 − similarity (i.e., “difference”), and you would be right except that the number of differences you count between any two sequences misses some of the changes that probably have occurred between them. More than one evolutionary change at a single position (e.g., A to G to U, or A to G in one sequence and the same A to U in another) counts as only one difference between the two sequences, and in the case of reversion or convergence it counts as no change at all (e.g., A to G to A, or A to G in one organism and the same A to G in another). As a result, the observed difference between two sequences (1 − similarity) underestimates the evolutionary distance that separates them.

      One common way to estimate evolutionary distances from similarity is the Jukes and Cantor method, which uses the following equation:

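$$ d = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,(1 - s)\right) $$

      Here s is the observed fractional similarity between two sequences and d is the estimated evolutionary distance.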

      As shown graphically in Fig. 4.1, similarity and distance are very closely related initially (e.g., 0.90 similarity ≈ 0.10 distance), but the curve levels off at 0.25 similarity, where evolutionary distance is infinite. This makes sense; for two sequences that are very similar, the probable frequency of more than one change at a single site is low, requiring only a small correction, whereas two sequences that have changed beyond all recognition (infinite evolutionary distance) are still approximately 25% similar just because there are only four bases and so approximately one of the four will match entirely by chance.


      Figure 4.1 The Jukes and Cantor equation plotted as observed sequence similarity (from the similarity matrix) versus estimated evolutionary distance. doi:10.1128/9781555818517.ch4.f4.1
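      As a rough illustration of the correction, the standard form of the Jukes and Cantor estimate, d = −(3/4) ln(1 − (4/3)(1 − s)), can be written as a small Python function. The function name and the choice to return infinity at or below 0.25 similarity are assumptions of this sketch, made to mirror the behavior of the curve in Fig. 4.1.

```python
import math


def jukes_cantor_distance(similarity):
    """Estimate evolutionary distance from observed fractional similarity.

    Uses the Jukes and Cantor correction:
        d = -(3/4) * ln(1 - (4/3) * (1 - s))
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    # At 25% similarity (or less) the sequences look random with respect
    # to each other, and the estimated distance is infinite.
    if similarity <= 0.25:
        return math.inf
    difference = 1.0 - similarity
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * difference)


for s in (0.90, 0.75, 0.50, 0.25):
    print(f"similarity {s:.2f} -> distance {jukes_cantor_distance(s):.3f}")
```

      For 0.90 similarity this gives a distance of about 0.107, only slightly more than the raw difference of 0.10, while for 0.75 similarity it gives about 0.304, a noticeably larger correction over the raw difference of 0.25.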


       Generating a tree from a distance matrix

      In the neighbor-joining method, the structure of the tree is determined first and then the branch lengths are fit to this skeleton.

       Solving the tree structure

      The tree starts out with a single internal node and a branch out to each sequence: an n-pointed star, where n is the number of sequences in the alignment. The pair of sequences with the smallest evolutionary distance separating them is joined onto a single branch (i.e., the neighbors are joined, hence the name of the method), and then the process is repeated after merging these two sequences in the distance matrix by averaging their distances from every other sequence in the matrix.
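      One round of this join-and-merge procedure might be sketched as follows. The code follows the simplified description given here (join the pair with the smallest distance, then average their distances to every other sequence); the neighbor-joining algorithm as usually implemented selects the pair with a criterion that also accounts for each sequence's distances to all the others, so this should be read as a sketch of the idea, not a complete implementation. The function name, the data layout, and the three-sequence distance table are assumptions of the example; the distances are rounded Jukes and Cantor values for the similarities quoted earlier (0.90 for A:B, 0.75 for A:C and B:C).

```python
def join_closest_pair(names, dist):
    """Perform one joining step as described in the text.

    names: list of current node labels (strings).
    dist:  dict mapping frozenset({x, y}) -> evolutionary distance.
    Returns (merged_label, new_names, new_dist).
    """
    # The pair separated by the smallest distance is joined.
    pair = min(dist, key=dist.get)
    x, y = sorted(pair)
    merged = f"({x},{y})"

    remaining = [n for n in names if n not in pair]

    # Keep the distances that involve neither of the joined nodes.
    new_dist = {k: d for k, d in dist.items() if not (k & pair)}

    # The merged node's distance to each other node is the average of
    # the two joined nodes' distances to that node.
    for other in remaining:
        d_x = dist[frozenset((x, other))]
        d_y = dist[frozenset((y, other))]
        new_dist[frozenset((merged, other))] = (d_x + d_y) / 2

    return merged, remaining + [merged], new_dist


# Illustrative three-sequence distance table (see the lead-in above).
names = ["A", "B", "C"]
dist = {
    frozenset(("A", "B")): 0.11,
    frozenset(("A", "C")): 0.30,
    frozenset(("B", "C")): 0.30,
}

merged, names, dist = join_closest_pair(names, dist)
print(merged)  # (A,B)
```

      Repeating the call until only two nodes remain yields a nested label that records the order in which branches were joined, which is the tree structure without branch lengths.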

      Using our distance matrix, the tree starts out like this (remember that we are sorting out the structure of the tree, not yet the branch lengths).

[Image: starting star tree, with a single internal node and one branch to each sequence]

      The closest neighbors in the distance matrix are A and B (0.11 evolutionary distance), so these branches are joined:

[Image: tree with A and B joined onto a single branch]
