Bioinformatics. Group of authors
By default, the Common SNPs(150) track is displayed in dense mode, with all variants in the region compressed onto a single line. Variants in this track are color-coded by function. To modify the display, open the Track Settings for this track (Figure 4.8). Set the Display mode to pack to show each variant separately. At the same time, modify the Coloring Options so that SNPs in the UTRs of transcripts are blue, SNPs in coding regions are green if they are synonymous (no change to the protein sequence) or red if they are non-synonymous (altering the protein sequence), and all remaining classes of SNPs are black. In the resulting browser window, the green synonymous and blue untranslated SNPs are now clearly visible (Figure 4.9).
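The synonymous/non-synonymous distinction behind these Coloring Options comes down to whether the alternate codon encodes the same amino acid as the reference codon. The following is a minimal Python sketch of that rule together with the color scheme chosen above. The function names and class labels are hypothetical illustrations, not the Genome Browser's own code or setting names, and the sketch assumes a single-base substitution in a codon already written on the coding strand.

```python
# Minimal sketch (hypothetical helpers, not UCSC code): classify a coding SNV
# and pick a display color matching the Coloring Options described above.

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_effect(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    Assumes ref_codon is on the coding strand and pos_in_codon is 0, 1 or 2.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    same_amino_acid = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_amino_acid else "non-synonymous"


# Colors chosen in the Track Settings; every other functional class is black.
# The class labels are illustrative, not the exact UCSC option names.
COLOR_BY_CLASS = {
    "untranslated": "blue",
    "synonymous": "green",
    "non-synonymous": "red",
}


def display_color(snp_class: str) -> str:
    """Look up the display color for a SNP class, defaulting to black."""
    return COLOR_BY_CLASS.get(snp_class, "black")


print(display_color(coding_effect("CTT", 2, "C")))  # CTT -> CTC, both leucine
print(display_color(coding_effect("GAG", 1, "T")))  # GAG -> GTG, glutamate -> valine
print(display_color("intron"))
```

For example, changing the third base of CTT to C gives CTC, which still encodes leucine (synonymous, green), while changing the second base of GAG to T gives GTG, replacing glutamate with valine (non-synonymous, red). Any class not listed, such as an intronic variant, falls through to black.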
Figure 4.7 The genomic context of the human HIF1A gene, after changing the display of the H3K4Me3 peaks from hide to full. The H3K4Me3 track is part of the ENCODE Regulation super-track. Below the graphic display window shown in Figure 4.5, open the ENCODE Regulation super-track in the Regulation menu, and change the display of the H3K4Me3 track from hide to full to reproduce the page shown here. Note that the H3K4Me3 peaks, which can indicate promoter regions (Box 4.3), overlap with the transcription start sites of the SNAPC1 and HIF1A genes (light blue highlight). These regions also overlap with DNase hypersensitive sites in the DNase HS track, indicating open chromatin that should be accessible to transcription factors. The highlights were added within the Genome Browser using the Drag-and-select tool, which is accessed by clicking anywhere in the Scale track at the top of the Genome Browser display and dragging the selection window across a region of interest. The Drag-and-select tool provides options to Highlight the selected region or Zoom directly to it.
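The visual check in this caption, whether an H3K4Me3 peak covers a gene's transcription start site, is an interval-overlap test. Below is a minimal Python sketch, assuming BED-style 0-based, half-open coordinates. The gene names and coordinates are hypothetical placeholders, not values read from the browser, and the helper names are illustrative only.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def peaks_at_tss(peaks, tss_by_gene):
    """Map each gene name to the peaks that contain its transcription start site.

    tss_by_gene: {gene: (chrom, tss_position)}, positions 0-based.
    """
    hits = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        tss_interval = Interval(chrom, tss, tss + 1)  # the single TSS base
        hits[gene] = [p for p in peaks if overlaps(p, tss_interval)]
    return hits


# Hypothetical peaks and TSS positions, for illustration only.
peaks = [Interval("chr14", 100, 600), Interval("chr14", 5000, 5400)]
tss = {"GENE_A": ("chr14", 250), "GENE_B": ("chr14", 9000)}
# GENE_A's TSS falls inside the first peak; GENE_B has no overlapping peak.
print(peaks_at_tss(peaks, tss))
```

The same `overlaps` test could be applied to DNase HS intervals to ask whether a candidate promoter also lies in accessible chromatin. A linear scan is adequate for a handful of peaks; genome-wide data would call for sorted intervals or an interval tree.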
Figure 4.8 Configuring the track settings for the Common SNPs(150) track. Set the Coloring Options so that all SNPs are black, except for untranslated SNPs (blue), coding-synonymous SNPs (green), and coding-non-synonymous SNPs (red). In addition, change the Display mode of the track from dense to pack so that the individual SNPs can be seen. By default, the function of each variant is defined by its position within transcripts in the GENCODE track. However, the track used for annotation can be changed with the setting Use Gene Tracks for Functional Annotation.
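Defining a variant's function by its position within transcripts means locating it relative to each transcript's exons and coding region. Here is a simplified Python sketch of that idea for a single transcript, with fields modeled on the UCSC genePred table (0-based, half-open coordinates). The class and function names are hypothetical, the sketch treats `cds_start == cds_end` as a non-coding transcript, and it ignores details a real annotator must handle, such as splice-site variants and overlapping transcripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    # Fields modeled on the UCSC genePred table; 0-based, half-open coordinates.
    strand: str            # "+" or "-"
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def classify_position(pos: int, tx: Transcript) -> str:
    """Classify a 0-based genomic position relative to one transcript."""
    tx_start, tx_end = tx.exon_starts[0], tx.exon_ends[-1]
    if not (tx_start <= pos < tx_end):
        return "outside transcript"
    if not any(s <= pos < e for s, e in zip(tx.exon_starts, tx.exon_ends)):
        return "intron"
    if tx.cds_start == tx.cds_end:
        return "non-coding exon"
    if tx.cds_start <= pos < tx.cds_end:
        return "coding"
    # Exonic but outside the CDS: which UTR it is depends on the strand.
    upstream = pos < tx.cds_start
    if tx.strand == "+":
        return "5' UTR" if upstream else "3' UTR"
    return "3' UTR" if upstream else "5' UTR"


# Hypothetical two-exon transcript on the plus strand.
tx = Transcript("+", cds_start=150, cds_end=450,
                exon_starts=[100, 300], exon_ends=[200, 500])
for pos in (120, 160, 250, 470, 600):
    print(pos, classify_position(pos, tx))
```

A variant classified as "coding" here would then go through a codon-level check to decide whether it is synonymous or non-synonymous. Switching the setting Use Gene Tracks for Functional Annotation amounts to feeding a different set of transcripts into a step like this one.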
Figure 4.9 The genomic context of the human HIF1A gene, after changing the colors and display mode of the Common SNPs(150) track as shown in Figure 4.8. The SNPs in the 5′ and 3′ untranslated regions of the HIF1A GENCODE transcripts are now colored blue, while the coding-synonymous SNP is colored green.
Two types of Expression tracks display data from the NIH Genotype-Tissue Expression (GTEx) project (GTEx Consortium 2015). The GTEx Gene track displays gene expression levels in 51 tissues and two cell lines, based on RNA-seq data from 8555 samples. The GTEx Transcript track provides additional analysis of the same data and displays median transcript expression levels. By default, the GTEx Gene track is shown in pack mode, while the GTEx Transcript track is hidden. Figure 4.10 shows the Gene track in pack display mode in the region of the phenylalanine hydroxylase (PAH) gene. The height of each bar in the bar graph represents the median expression level of the gene across all samples from a tissue, and the bar color indicates the tissue. The PAH gene is highly expressed in kidney and liver (the two brown bars). This expression pattern is more clearly visible on the details page for the GTEx Gene track (Figure 4.10, inset, purple box). The GTEx Transcript track is similar, but depicts expression for individual transcripts rather than a single summary value for the gene.
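Each bar in the GTEx Gene track summarizes many samples from one tissue with a median. A minimal Python sketch of that summary step, assuming the input is a list of (tissue, expression value) pairs; the function names are hypothetical and the values in the example are invented for illustration, not GTEx data.

```python
from collections import defaultdict
from statistics import median


def median_by_tissue(samples):
    """Group (tissue, expression) pairs by tissue and return each tissue's median."""
    grouped = defaultdict(list)
    for tissue, value in samples:
        grouped[tissue].append(value)
    return {tissue: median(values) for tissue, values in grouped.items()}


def top_tissues(medians, n=2):
    """Return the n tissues with the highest median expression."""
    return sorted(medians, key=medians.get, reverse=True)[:n]


# Invented values for illustration only.
samples = [
    ("Liver", 40.0), ("Liver", 60.0), ("Liver", 55.0),
    ("Kidney", 20.0), ("Kidney", 30.0),
    ("Brain", 0.5), ("Brain", 1.5),
]
medians = median_by_tissue(samples)
print(medians)               # one bar height per tissue
print(top_tissues(medians))  # the tallest bars
```

Using a median rather than a mean keeps a few outlier samples from dominating the bar height. Applying the same grouping per transcript instead of per gene gives the kind of values shown in the GTEx Transcript track.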
An alternate entry point to the UCSC Genome Browser is a BLAT search (see Chapter 3), in which a user inputs a nucleotide or protein sequence to find an aligned region in a selected genome. BLAT excels at quickly identifying a matching sequence in the same or a highly similar organism. We will attempt to use BLAT to find a lizard homolog of the human gene disintegrin and metalloproteinase domain-containing protein 18 (ADAM18). The ADAM18 protein sequence is copied in FASTA format from the NCBI view of accession number NP_001307242.1 and pasted into the BLAT Search box, which can be accessed from the Tools pull-down menu; the method for retrieving this sequence in the correct format is described in Chapter 2. Select the lizard genome and assembly AnoCar2.0/anoCar2. BLAT will automatically determine that the query sequence is a protein and will compare it with the lizard genome translated in all six reading frames. A single result is returned (Figure 4.11a). The alignment between the ADAM18 protein sequence and lizard chromosome Un_GL343418 runs from amino acid 368 to amino acid 383, with 81.3% identity. The browser link depicts the genomic context of this 48 nt hit (Figure 4.11b). Although the ADAM18 protein sequence aligns to a region in which other human ADAM genes have also been aligned, the other human genes are represented by a thin line, indicating a gap in their alignment. The details link shown in Figure 4.11a produces the alignment between the ADAM18 protein and lizard chromosome Un_GL343418 (Figure 4.11c). The top section of the results shows the protein query sequence, with the blue letters indicating the short region of alignment with the genome. The bottom section shows the pairwise alignment between the protein and the genomic sequence translated in six frames. Vertical black lines indicate identical residues. Taken together, the BLAT results show that only 16 amino acids of the 715 amino