Bioinformatics. Group of authors

and gaps, the cumulative score will increase. As soon as the cumulative score exceeds the score threshold S, the alignment is reported in the BLAST output. Simply clearing S does not automatically mean that the alignment is biologically significant; this very important point will be addressed later in this discussion.

Figure: The BLAST search extension. The length of the extension is the number of characters that have been aligned in a pairwise sequence comparison, and the cumulative score is the sum of the position-by-position scores, as determined by the scoring matrix used for the search. T is the neighborhood score threshold, S is the minimum score required to return a hit in the BLAST output, and X is the significance decay.

      As the extension continues, mismatches and gaps will at some point begin to outweigh the exact matches and conservative substitutions, accruing negative scores from the scoring matrix. As soon as the curve begins to turn downward, BLAST measures whether the drop-off exceeds a threshold called X. If the curve decays more than X allows, the extension is terminated and the alignment is trimmed back to the length corresponding to the preceding maximum in the curve. The resulting alignment is called a high-scoring segment pair, or HSP. Because the BLAST algorithm systematically marches across the query sequence using all possible query words, more than one HSP may be found for any given sequence pair.
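The extension and trimming steps described above can be sketched in a few lines of Python. This is a minimal, ungapped illustration under stated assumptions. The function names, the +5/-4 identity/mismatch scoring rule (a stand-in for a real scoring matrix such as BLOSUM62), and the values chosen for X and S are all hypothetical. Real BLAST implementations also perform gapped extension, which this sketch omits.

```python
from typing import Optional, Tuple


def score_pair(a: str, b: str) -> int:
    """Hypothetical scoring rule: +5 for an identity, -4 for a mismatch."""
    return 5 if a == b else -4


def extend_one_direction(query: str, subject: str, q: int, s: int,
                         step: int, x_drop: int) -> Tuple[int, int]:
    """Extend from (q, s) one residue at a time in direction `step` (+1 or -1).

    Returns (best_gain, best_len): the highest cumulative score reached and
    the number of aligned characters at that peak, so any tail after the
    peak is trimmed off.
    """
    running = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        running += score_pair(query[q], subject[s])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # fell more than X below the preceding maximum: stop
        q += step
        s += step
    return best, best_len


def ungapped_hsp(query: str, subject: str, q_seed: int, s_seed: int,
                 word_len: int, x_drop: int,
                 s_threshold: int) -> Optional[Tuple[int, str, str]]:
    """Grow a word hit into an ungapped HSP; return None if it never reaches S."""
    seed = sum(score_pair(query[q_seed + i], subject[s_seed + i])
               for i in range(word_len))
    right_gain, right_len = extend_one_direction(
        query, subject, q_seed + word_len, s_seed + word_len, +1, x_drop)
    left_gain, left_len = extend_one_direction(
        query, subject, q_seed - 1, s_seed - 1, -1, x_drop)
    total = seed + left_gain + right_gain
    if total < s_threshold:
        return None  # below S: not reported
    span = left_len + word_len + right_len
    q_begin, s_begin = q_seed - left_len, s_seed - left_len
    return total, query[q_begin:q_begin + span], subject[s_begin:s_begin + span]


# A generous X lets the extension survive two mismatches and pick up the
# downstream identities. A strict X stops it, and it is trimmed back to the seed.
print(ungapped_hsp("ACGTTTGGGG", "ACGTAAGGGG", 0, 0, 4, x_drop=10, s_threshold=15))
# (32, 'ACGTTTGGGG', 'ACGTAAGGGG')
print(ungapped_hsp("ACGTTTGGGG", "ACGTAAGGGG", 0, 0, 4, x_drop=5, s_threshold=15))
# (20, 'ACGT', 'ACGT')
```

The two calls differ only in X. This shows how the drop-off threshold decides whether a run of negative scores ends the extension or is bridged on the way to a higher maximum.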

      As one might imagine, assessing the putative biological significance of any given BLAST hit based simply on raw scores is difficult, since the scores depend on the composition of the query and target sequences, the length of the sequences, the scoring matrix used to compute the raw scores, and numerous other factors. In one of the most important papers on the theory of local sequence alignment statistics, Karlin and Altschul (1990) presented a formula that directly addresses this problem. The formula, which has come to be known as the Karlin–Altschul equation, uses search-specific parameters to calculate an expectation value (E). This value represents the number of HSPs with a score of at least S that would be expected purely by chance. The equation and the parameters used to calculate E are as follows:

$$E = kmN\,e^{-\lambda S}$$

      where k is a minor constant; m is the number of letters in the query; N is the total number of letters in the target database; λ is a constant used to normalize the raw score of the high-scoring segment pair, whose value varies depending on the scoring matrix used; and S is the score of the high-scoring segment pair.
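Evaluating the equation directly shows how its terms interact. The sketch below is a minimal Python illustration. The function name and the values used for λ, k, the query length and the database size are hypothetical placeholders chosen for easy arithmetic. They are not parameters for any real scoring matrix, whose λ and k depend on the matrix (and gap costs) in use.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int,
                           lam: float, k: float) -> float:
    """Expected number of chance HSPs scoring at least `score`: E = k*m*N*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
LAM, K = 0.3, 0.1
QUERY_LEN, DB_LEN = 250, 1_000_000

e_base = karlin_altschul_evalue(60, QUERY_LEN, DB_LEN, LAM, K)
e_bigger_db = karlin_altschul_evalue(60, QUERY_LEN, 2 * DB_LEN, LAM, K)
e_higher_score = karlin_altschul_evalue(70, QUERY_LEN, DB_LEN, LAM, K)

print(f"S=60, N=1e6: E = {e_base:.3g}")
print(f"S=60, N=2e6: E = {e_bigger_db:.3g}")     # twice as large
print(f"S=70, N=1e6: E = {e_higher_score:.3g}")  # smaller by a factor of e^(10*lambda)
```

Two consequences follow directly from the equation. First, E scales linearly with the search space m × N, so the same raw score is less significant against a larger database. Second, each additional unit of raw score multiplies E by e^(-λ).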

      Performing a BLAST Search

Figure: The National Center for Biotechnology Information (NCBI) BLAST landing page.

Figure: The upper portion of the BLASTP query page. The first section of the window is used to specify the sequence of interest, whether only a portion of that sequence should be used in performing the search, which database should be searched, and which protein-based BLAST algorithm should be used to execute the query.
