Principles of Microbial Diversity. James W. Brown

Чтение книги онлайн.

Читать онлайн книгу Principles of Microbial Diversity - James W. Brown страница 21

Principles of Microbial Diversity - James W. Brown

Скачать книгу

distances to A and B from each of the other sequences are then averaged to reduce the distance matrix:

images

      In this case, the averages are trivial; the average of 0.30 and 0.30 is, of course, 0.30, and so forth. This is not usually the case. Then the process is repeated. The closest neighbors in the reduced matrix are D with C (0.17):

images

      Once again, the distance matrix is reduced by averaging. But be sure to average from the original distance matrix, not the previously reduced matrix:

images

      Note that the value listed for A/B with C/D (0.265) is the average of four values in the original matrix: A to C (0.30), A to D (0.23), B to C (0.30), and B to D (0.23). If averaged AB/C (0.30) and AB/D (0.23) were averaged from the reduced matrix, the same number would be obtained in this case, but usually this is a more complex average and the numbers do not come out the same.

      Once again, the smallest number in the matrix represents the nearest neighbors, in this case A/B with C/D (0.265), so these two branches are joined:

images

       Determining branch length

      The next step is to determine the lengths of the branches on this tree. Basically, this is done by going through each node and finding where along the branch it is by figuring out the average difference in length along each of two branches. By choosing various sets of three sequences in a tree, the branch lengths can be sorted out just like a puzzle.

      Branches w/A and w/B (w is the common node between A and B)

      In our example, the distance between A and B is 0.11, and so the lengths of the two branches connecting them (A/w and w/B) must add up to 0.11. But where along this branch is the node (w)? If you look at the distance from any other sequence (C, D, or F) to A and to B, it is always the same. For example, C/A is 0.30 and so is C/B. This means that node w must be midway between A and B; each branch, then, has a length of 0.055. For example, with C used as reference:

images

      Branches x/C and x/D

      C and D are also simple neighbors, so we can easily solve these two connecting branches as well. The distance between C and D is 0.17. However, the distance to either C or D from elsewhere in the tree differs; this means that the node connecting C and D is not equidistant between them. In fact, this difference in distance between C and D varies a bit depending on which reference we use. These differences occur because we can only estimate evolutionary distance. As a result, we use the average (although most computer algorithms would use a least-squares average):

images images

      This value, 0.08, is the average amount by which x/C is longer than x/D. Because the total length of C/D (think of this as C/x/D) is 0.170 and x/C is 0.08 longer than x/D, then:

images

      x/C is the same 0.045 plus the 0.08 by which we already determined it was longer = 0.125:

images

      Branches z/E and z/F

      The distance between E and F can be calculated in the same way:

images

      The length of E/z/F (E/F) is 0.30, and the average difference between anywhere else in the tree and E or F is −0.08. The negative sign means that the branch to F is longer than the branch to E. So z/E = (0.30 − 0.08)/2 = 0.11, and z/F − 0.11 + 0.08 = 0.1.

images

      The tree so far is:

images

      To solve the lengths of internal branches, the same process is used for collections of branches and then the average length outside the internal nodes is subtracted. To determine the lengths of w/y, x/y, and z/y, we need to determine the total lengths of w/x, x/z, and z/w and then figure out where along these to place y.

      To solve for the length of w/x, the average distance between AB and CD in the reduced matrix (0.265) is used, and then the average w/A and w/B distance (AB/2 = 0.11/2 = 0.055) and the average x/C and x/D distance (CD/2 = 0.17/2 = 0.085) are subtracted to leave the w/x distance: 0.265 − 0.055 − 0.085 = 0.125.

images

      To solve the length of x/z, the average CD/EF distance is calculated (this was not done in the reduced matrices, but it is the same process; it is 0.38) and then x/CD (CD/2 = 0.085) and x/EF (EF/2 = 0.30/2 = 0.150) are subtracted to give x/z = 0.145. Likewise, z/w = EF/AB (0.34) − w/AB (0.055) − z/EF (0.15) = 0.135.

images

      To determine where on any of these line segments node y is, we need to pick any of the line segments and calculate the difference to each end of this segment from the other branch. We do this just like we did for solving the lengths of the branches between A/B, C/D, or E/F. For example, for branch w/x:

images images

      Now we know all of the line segment lengths but one, y/z. We can get this by subtracting all of the known lengths of

Скачать книгу