Principles of Microbial Diversity. James W. Brown
Чтение книги онлайн.
Читать онлайн книгу Principles of Microbial Diversity - James W. Brown страница 21
In this case, the averages are trivial; the average of 0.30 and 0.30 is, of course, 0.30, and so forth. This is not usually the case. Then the process is repeated. The closest neighbors in the reduced matrix are D with C (0.17):
Once again, the distance matrix is reduced by averaging. But be sure to average from the original distance matrix, not the previously reduced matrix:
Note that the value listed for A/B with C/D (0.265) is the average of four values in the original matrix: A to C (0.30), A to D (0.23), B to C (0.30), and B to D (0.23). If averaged AB/C (0.30) and AB/D (0.23) were averaged from the reduced matrix, the same number would be obtained in this case, but usually this is a more complex average and the numbers do not come out the same.
Once again, the smallest number in the matrix represents the nearest neighbors, in this case A/B with C/D (0.265), so these two branches are joined:
Each node on this tree has only three branches connecting to it; all of the nodes are completely resolved. This means that the structure of the tree has been determined. If there were more sequences, it would be necessary to reduce the matrix (joining A/B with C/D) and repeat the process until all of the nodes were resolved. The internal nodes have been arbitrarily labeled w, x, y, and z for reference when sorting out branch lengths (below).
Determining branch length
The next step is to determine the lengths of the branches on this tree. Basically, this is done by going through each node and finding where along the branch it is by figuring out the average difference in length along each of two branches. By choosing various sets of three sequences in a tree, the branch lengths can be sorted out just like a puzzle.
Branches w/A and w/B (w is the common node between A and B)
In our example, the distance between A and B is 0.11, and so the lengths of the two branches connecting them (A/w and w/B) must add up to 0.11. But where along this branch is the node (w)? If you look at the distance from any other sequence (C, D, or F) to A and to B, it is always the same. For example, C/A is 0.30 and so is C/B. This means that node w must be midway between A and B; each branch, then, has a length of 0.055. For example, with C used as reference:
Branches x/C and x/D
C and D are also simple neighbors, so we can easily solve these two connecting branches as well. The distance between C and D is 0.17. However, the distance to either C or D from elsewhere in the tree differs; this means that the node connecting C and D is not equidistant between them. In fact, this difference in distance between C and D varies a bit depending on which reference we use. These differences occur because we can only estimate evolutionary distance. As a result, we use the average (although most computer algorithms would use a least-squares average):
Therefore, 0.07 is the difference in branch length between A/C and A/D
This value, 0.08, is the average amount by which x/C is longer than x/D. Because the total length of C/D (think of this as C/x/D) is 0.170 and x/C is 0.08 longer than x/D, then:
x/C is the same 0.045 plus the 0.08 by which we already determined it was longer = 0.125:
Branches z/E and z/F
The distance between E and F can be calculated in the same way:
The length of E/z/F (E/F) is 0.30, and the average difference between anywhere else in the tree and E or F is −0.08. The negative sign means that the branch to F is longer than the branch to E. So z/E = (0.30 − 0.08)/2 = 0.11, and z/F − 0.11 + 0.08 = 0.1.
Internal branches w/x, x/y, and x/z
The tree so far is:
To solve the lengths of internal branches, the same process is used for collections of branches and then the average length outside the internal nodes is subtracted. To determine the lengths of w/y, x/y, and z/y, we need to determine the total lengths of w/x, x/z, and z/w and then figure out where along these to place y.
To solve for the length of w/x, the average distance between AB and CD in the reduced matrix (0.265) is used, and then the average w/A and w/B distance (AB/2 = 0.11/2 = 0.055) and the average x/C and x/D distance (CD/2 = 0.17/2 = 0.085) are subtracted to leave the w/x distance: 0.265 − 0.055 − 0.085 = 0.125.
To solve the length of x/z, the average CD/EF distance is calculated (this was not done in the reduced matrices, but it is the same process; it is 0.38) and then x/CD (CD/2 = 0.085) and x/EF (EF/2 = 0.30/2 = 0.150) are subtracted to give x/z = 0.145. Likewise, z/w = EF/AB (0.34) − w/AB (0.055) − z/EF (0.15) = 0.135.
Thus, w/x = 0.125, x/z = 0.145, and z/w = 0.135.
To determine where on any of these line segments node y is, we need to pick any of the line segments and calculate the difference to each end of this segment from the other branch. We do this just like we did for solving the lengths of the branches between A/B, C/D, or E/F. For example, for branch w/x:
Now we know all of the line segment lengths but one, y/z. We can get this by subtracting all of the known lengths of