Principles of Microbial Diversity. James W. Brown
Some of the tricks to aligning sequences by hand are the following.
Sequences are often aligned sequentially; start by aligning the two most similar sequences, then add sequences to the alignment one at a time, starting with the sequences most similar to those already aligned and finishing with the most distantly related sequences. Likewise, if you are adding a single sequence to an existing alignment, start by identifying the most similar sequence in the alignment and use that sequence as a guide.
Alternatively, you can identify conserved blocks of sequence in all of the sequences and align these. You have now broken the alignment problem into smaller, easier chunks. Add gaps as needed to align the space between prealigned chunks according to the criteria below.
Start by finding patches of very similar sequences and align these, then work out in both directions from these, adding gaps sparingly when needed. Everything after this is about rearranging (and potentially adding or removing) these gaps.
Where there are sequence differences, slide the gaps around to keep purines (G, A) aligned with purines, and pyrimidines (C, U/T) aligned with pyrimidines.
Try also to keep differences together in variable sequence positions, and align gaps together in columns wherever possible. A single gap of two positions is a lot better than two separate gaps of one position each.
Try to keep what look like conserved positions (columns) conserved, and, all else being equal, put differences into positions already known to be variable.
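Two of these rules of thumb (keep purines with purines and pyrimidines with pyrimidines, and prefer one longer gap to several short ones) can be tallied mechanically when comparing candidate placements of a gap. The following is a minimal sketch in Python, not a method from this chapter: the function names, the dictionary keys, and the example sequences are all made up for illustration, and the tally is only an aid to judgment, not a scoring scheme.

```python
# Illustrative helper for comparing two hand-made gap placements.
# All names and example sequences here are hypothetical.
PURINES = set("AG")
PYRIMIDINES = set("CU")  # T is converted to U before comparison


def base_class(base):
    """Return 'R' for a purine, 'Y' for a pyrimidine, None otherwise."""
    if base in PURINES:
        return "R"
    if base in PYRIMIDINES:
        return "Y"
    return None


def count_gap_runs(row, gap="-"):
    """Count separate runs of gap characters in one aligned row."""
    runs = 0
    in_gap = False
    for ch in row:
        if ch == gap and not in_gap:
            runs += 1
        in_gap = ch == gap
    return runs


def pair_summary(row_a, row_b, gap="-"):
    """Tally matches, transitions, transversions, and gap runs for two rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    a_norm = row_a.upper().replace("T", "U")
    b_norm = row_b.upper().replace("T", "U")
    summary = {"match": 0, "transition": 0, "transversion": 0}
    for a, b in zip(a_norm, b_norm):
        if a == gap or b == gap:
            continue  # gapped columns are counted separately below
        if a == b:
            summary["match"] += 1
        elif base_class(a) is not None and base_class(a) == base_class(b):
            summary["transition"] += 1  # purine-purine or pyrimidine-pyrimidine
        else:
            summary["transversion"] += 1
    summary["gap_runs"] = count_gap_runs(a_norm, gap) + count_gap_runs(b_norm, gap)
    return summary


# The same made-up sequence aligned to a reference in two different ways.
reference = "GGACUUCGGUCC"
one_gap = "GGAC--CGGUCC"   # a single gap of two positions
two_gaps = "GGAC-C-GGUCC"  # two separate gaps of one position each

print(pair_summary(reference, one_gap))
print(pair_summary(reference, two_gaps))
```

In this made-up case the single two-position gap gives more matches and fewer gap runs than the two separate gaps, which is the outcome the guidelines above would favor.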
Figure 3.7A Comparison of two RNase P RNAs with very different sequences and very similar secondary structures. RNase P RNAs are the catalytic subunits, associated with one or more accessory proteins, that remove the 5′ leaders from tRNA and other RNA precursors. (Adapted from Harris JK, Haas ES, Williams D, Frank DN, Brown JW, RNA 7:220–232, 2001, with permission.) doi:10.1128/9781555818517.ch3.f3.7A
Figure 3.7B doi:10.1128/9781555818517.ch3.f3.7B
Alignment based on conserved structure
In the case of RNAs, however, advanced alignment algorithms (e.g., infeRNAl) can use the secondary structures of the RNAs to align sequences. The ability to use well-defined secondary structures to identify homologous residues (i.e., to align sequences) is one of the key advantages of RNA over protein for phylogenetic analysis. In other words, you can use the secondary structure of the RNA to identify homologous parts of the RNA, rather than relying only on sequence similarity (Fig. 3.7).
Figure 3.8 An RNA alignment based on secondary structure: stem-loop P3 of RNase P RNA. If residue n (e.g., 24, highlighted) of any sequence pairs to residue m (e.g., 29, also highlighted), then so should the corresponding homologous residues in all sequences. In this example, the first six rows are not sequences; they are annotations. The first three are just a reference numbering; in this case, the Methanothermobacter thermautotrophicus (Mthermo) sequence is the reference sequence. The row marked “helices” indicates the secondary structure: the 5′ strand of P3 followed by the loop and then the 3′ strand. Each base pair in this stem-loop is indicated by matching right- and left-facing parentheses in the following row and is labeled alphabetically (for human readability) in the subsequent row. doi:10.1128/9781555818517.ch3.f3.8
This works because it usually does not matter to the RNA which bases are in the helices; what matters is that opposing bases are complementary so that they can form the helix. As a result, the secondary structure of an RNA is much more highly conserved than its sequence, because coevolution of bases that form base pairs maintains the secondary structure as the sequence changes. Variation in the length of the RNA usually takes the form of hairpins lengthening or shortening. Therefore, it is usually possible to keep track of homologous parts of RNA structures even if the sequences are quite different.
In this type of alignment, the secondary structures of all of the RNAs are directly encoded in the alignment (Fig. 3.8). If residue n (e.g., 24 in Fig. 3.8) of any sequence pairs to residue m (e.g., 29), then so should the corresponding homologous residues in all sequences (Fig. 3.9).
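The idea that paired columns should pair in every sequence can be checked mechanically. The sketch below is an illustration only: it assumes the structure row is written with matching parentheses and dots for unpaired columns, that G-U wobble pairs count as acceptable, and that a pair missing from both strands (gaps in both columns) just means a shorter helix. The function names and the short example rows are hypothetical, not taken from Fig. 3.8.

```python
# Illustrative check that every aligned row can form the annotated base pairs.
# Assumption: Watson-Crick pairs plus G-U wobble pairs are accepted.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_pairs(structure):
    """Return sorted (i, j) column pairs from matching parentheses."""
    stack = []
    pairs = []
    for col, ch in enumerate(structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def unpaired_columns(row, pairs, gap="-"):
    """List the annotated column pairs that this row cannot form."""
    seq = row.upper().replace("T", "U")
    bad = []
    for i, j in pairs:
        a, b = seq[i], seq[j]
        if a == gap and b == gap:
            continue  # the helix is simply shorter in this sequence
        if (a, b) not in ALLOWED_PAIRS:
            bad.append((i, j))
    return bad


# Made-up example: one structure row and four aligned rows.
structure = "((((....))))"
alignment = {
    "seqA": "GGACUUCGGUCC",
    "seqB": "CCUGGAAACAGG",  # different sequence, same pairing pattern
    "seqC": "GGA-UUCG-UCC",  # helix one base pair shorter
    "seqD": "GGACUUCGAUCC",  # C opposite A cannot pair
}

pairs = parse_pairs(structure)
for name, row in alignment.items():
    print(name, unpaired_columns(row, pairs))
```

Note that seqA and seqB share almost no sequence identity in the helix, yet both satisfy every annotated pair; a row that reports unpairable columns is a hint that it may be misaligned, although real helices do occasionally contain mismatches.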
Figure 3.9 RNase P RNA helix “P3” in a variety of Archaea. The base pairs corresponding to the highlighted bases in the sequence alignment in Fig. 3.8 are highlighted. P3 is present in all archaeal (and bacterial) RNase P RNAs, but both the sequence and structure of this helix are highly variable. doi:10.1128/9781555818517.ch3.f3.9
Given this type of alignment, a computer can readily derive the secondary structure of any of the RNAs. Conversely, given a preexisting alignment and an RNA sequence with the same secondary structure, a computer algorithm can add this sequence correctly to the alignment. This is what infeRNAl does; it takes a sequence and tries to fold it into the correct secondary structure. If it can do so, it then threads this sequence into the alignment based on this structure.
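The first direction, reading the secondary structure of one RNA out of the alignment, amounts to dropping that row's gap columns and carrying the pairing annotation along. The following is a minimal sketch under stated assumptions: the structure row uses matching parentheses with dots for unpaired columns, and a pair is dropped when either partner is a gap in the row. The function name and example rows are hypothetical. It does not illustrate the second direction; adding a new sequence to an alignment, as infeRNAl does, requires a far more elaborate model.

```python
# Illustrative projection of an alignment-wide structure row onto one sequence.
def project_structure(row, structure, gap="-"):
    """Return (ungapped_sequence, dot_bracket_structure) for one aligned row."""
    if len(row) != len(structure):
        raise ValueError("row and structure must have the same length")

    # Map every paired column to its partner column.
    partner = {}
    stack = []
    for col, ch in enumerate(structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            open_col = stack.pop()
            partner[open_col] = col
            partner[col] = open_col
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")

    seq_chars = []
    struct_chars = []
    for col, base in enumerate(row):
        if base == gap:
            continue  # this column does not exist in this RNA
        mate = partner.get(col)
        if mate is not None and row[mate] != gap:
            struct_chars.append("(" if col < mate else ")")
        else:
            struct_chars.append(".")  # unpaired, or partner is missing
        seq_chars.append(base)
    return "".join(seq_chars), "".join(struct_chars)


# Made-up example: a row whose helix is one base pair shorter than the annotation.
sequence, folded = project_structure("GGA-UUCG-UCC", "((((....))))")
print(sequence)
print(folded)
```

The result is an ungapped sequence with a dot-bracket string of the same length, which is enough for a drawing program to lay out the stem-loop.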
PROBLEMS
1. Align the following two sequences: Now add the following sequence to this alignment: Now add the following sequence to this alignment:
2. Align the following sequences:
3. Align the following sequences:
4. Align the following sequences (note that these are in Fasta format, commonly used for the electronic transfer of sequence data):
5. Draw the secondary structures of the sequences in this alignment:
6. Create an alignment of the following RNA structures:
7. Add the following Seq V RNA structure to the preexisting alignment:
Questions for thought
1. What are some DNA sequences that would not be useful for phylogenetic analysis? Why?
2. What are some other sequences that would be useful for phylogenetic analysis, and in what situations would they be useful?
3. How did people get large amounts of a specific DNA for sequencing before PCR was invented?
4. In an episode of The X-Files (an old TV show), FBI Agent Dana Scully sequences some extraterrestrial DNA and finds “missing bands” in the sequences that she interprets to correspond to bases that are unique to aliens (not found in Earthling DNA). Why is this not technically reasonable?
5. Given the variation of sequences in the context of the same secondary structure, how do scientists solve these secondary structures by comparative sequence analysis?
6. Mutations occur one at a time.