Bioinformatics. Group of authors
Ensembl makes many annotation tracks available through the Configure this page link on the left sidebar. There are over 500 tracks available for display on GRCh38, with the majority falling in the categories of Variation, Regulation, and Comparative Genomics. The Ensembl Regulatory Build includes regions that are likely to be involved in gene regulation, including promoters, promoter flanking regions, enhancers, CCCTC-binding factor (CTCF) binding sites, transcription factor binding sites (TFBS), and open chromatin regions (Zerbino et al. 2016). A summary Regulatory Build track is turned on by default in the Location tab, and the display of individual features can be adjusted in the Configure this page menu.
In the UCSC Genome Browser, the GTEx track shows that the PAH gene is highly expressed in liver and kidney (Figure 4.10); the epigenetic factors that may be controlling this activity can be viewed in the Ensembl Regulatory Build. To view these factors, navigate to Regulation → Histones & polymerases on the Configure this page menu, mouse over the HepG2 human liver carcinoma line, and select All features for HepG2 (Figure 4.19a). In addition, navigate to Regulation → Open chromatin & TFBS and confirm that the DNase1 track is in its default state for HepG2; the dark blue indicates that the track is shown. Close the Configure this page menu by clicking on the check mark in the upper right corner of the pop-up window. Notice that the Regulatory Build track has now expanded to include the selected gene regulatory marks in the HepG2 cell line. Zoom in on the first exon of transcript PAH-215 to see the promoter region of this gene, being mindful of the orientation of the gene (Figure 4.19b). The solid red rectangle in the Regulatory Build track shows the location of the PAH promoter.
The presence of a DNaseI hypersensitive site along with the activating histone marks of H3K27Ac, H3K4me1, H3K4me2, H3K4me3, H3K79me2, and H3K9Ac may help to explain why this gene is highly expressed in liver cells (Box 4.3). Detailed information about features in the Regulatory Build track, such as the source of the data, is available under the Regulation tab. Click on the feature and select its identifier (the letters ENSR, followed by numbers) to open this tab.
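The Regulatory Build features shown in the browser can also be retrieved programmatically. The following is a minimal sketch, assuming the public Ensembl REST server at rest.ensembl.org and its overlap/region endpoint with the feature=regulatory filter. The helper names are hypothetical, and the `description` key used for grouping is an assumption about the response fields that should be checked against the current REST documentation.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def regulatory_overlap_url(species, chrom, start, end):
    """Build an overlap/region query restricted to regulatory features."""
    return (f"{ENSEMBL_REST}/overlap/region/{species}/"
            f"{chrom}:{start}-{end}?feature=regulatory")


def fetch_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def features_by_type(features, key="description"):
    """Group feature IDs by an annotation key.

    The default key name is an assumption, not a confirmed field name.
    """
    grouped = {}
    for feature in features:
        label = feature.get(key, "unknown")
        grouped.setdefault(label, []).append(feature.get("id"))
    return grouped
```

To use the sketch, pass the chromosome and the start and end coordinates displayed in the Location tab to `regulatory_overlap_url`, hand the resulting URL to `fetch_json`, and summarize the returned list with `features_by_type`.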
Figure 4.17 Zooming in on the bottom section of the Location tab from Figure 4.16. (a) Highlight a region of interest, the final exon of PAH transcript PAH-203, by clicking the mouse and then scrolling to the left or right. In order to highlight the region, the Drag/Select toggle in the blue bar at the top of the section must first be set to Select. (b) To zoom in to the highlighted region, select Jump to region. It may take a few iterations to create the view in this figure. At the bottom of the window is a track labeled All phenotype-associated – short variants (SNPs and indels). In this track, the SNP rs76296470 has been manually highlighted in red.
Figure 4.18 The Ensembl Variant tab. (a) To get more details about SNP rs76296470, click on the dark green SNP that is highlighted in red in the All phenotype-associated – short variants (SNPs and indels) track in Figure 4.17b. On the pop-up menu, click on more about rs76296470. The Phenotype Data section of the Variant tab is available from the link in the blue sidebar. This variant is pathogenic for phenylketonuria. (b) The Genes and regulation section of the Variant tab shows the location and function of the variant in the transcripts that overlap it. Depending on the transcript, the SNP can change a codon to a stop codon (stop gained), map downstream of a gene, or map to a non-coding transcript. The transcripts in this view represent alternatively spliced forms of the gene PAH.
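The per-transcript consequences described in Figure 4.18b can be summarized in a few lines of code. The sketch below assumes a record shaped like the output of the Ensembl REST VEP endpoint (GET /vep/human/id/rs76296470), which lists `transcript_consequences` entries carrying a `transcript_id` and a list of `consequence_terms`. The transcript identifiers in the example are hypothetical placeholders, not real Ensembl IDs, and the function name is illustrative only.

```python
from collections import defaultdict


def consequences_by_term(vep_record):
    """Map each consequence term to the transcripts that carry it."""
    grouped = defaultdict(list)
    for entry in vep_record.get("transcript_consequences", []):
        for term in entry.get("consequence_terms", []):
            grouped[term].append(entry["transcript_id"])
    return dict(grouped)


# Hypothetical record: placeholder transcript IDs, real Sequence Ontology terms.
example = {
    "id": "rs76296470",
    "transcript_consequences": [
        {"transcript_id": "TRANSCRIPT_A",
         "consequence_terms": ["stop_gained"]},
        {"transcript_id": "TRANSCRIPT_B",
         "consequence_terms": ["downstream_gene_variant"]},
        {"transcript_id": "TRANSCRIPT_C",
         "consequence_terms": ["non_coding_transcript_exon_variant"]},
        {"transcript_id": "TRANSCRIPT_D",
         "consequence_terms": ["stop_gained"]},
    ],
}

summary = consequences_by_term(example)
for term, transcripts in summary.items():
    print(term, transcripts)
```

Grouping by term makes it easy to see which alternatively spliced transcripts of a gene are predicted to be truncated and which are unaffected at the protein level.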
Figure 4.19 The Ensembl Regulatory Build track. (a) Go to Configure this page on the left side of the Location tab and select Regulation → Histones & polymerases. Scroll to the right to find the HepG2 (human liver cancer) cell type. Mouse over the text HepG2 and turn on all features. Clicking on the box under the cell type will change the track style; leave that set to the default of Peaks. Click on the black check mark on the upper right corner of the configuration window to save the settings and exit the setup. To turn on the DNase1 (DNaseI hypersensitive sites track), select Regulation → Open chromatin & TFBS and ensure that the DNase1 box in the HepG2 column is colored dark blue so that it is in the Shown configuration. Click on the black check mark on the upper right corner of the configuration window to save the settings again. (b) Back on the Region in detail section of the Location tab, zoom in to the first exon of transcript PAH-215. Note that the first exon is on the right end of the transcript, as the gene is transcribed from right to left. The resulting display shows the details of the Regulatory Build track. The figure legend (not shown) explains that the solid red box is a promoter. The DNaseI hypersensitive site and histone marks are also shown as colored boxes.
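Because PAH is transcribed from right to left, its first exon is the one with the highest genomic coordinates, and the promoter lies to the right of it. The sketch below illustrates that strand logic. The function names, the 1/-1 strand encoding, and the coordinates used in the example are all assumptions chosen for illustration; they are not the real PAH exon positions.

```python
def first_exon(exons, strand):
    """Return the first transcribed exon from (start, end) genomic intervals.

    strand is 1 for genes read left to right and -1 for genes read
    right to left.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if not exons:
        raise ValueError("at least one exon is required")
    if strand == 1:
        return min(exons, key=lambda exon: exon[0])
    return max(exons, key=lambda exon: exon[1])


def upstream_window(exons, strand, size):
    """Return the (start, end) interval of `size` bases upstream of the TSS."""
    start, end = first_exon(exons, strand)
    if strand == 1:
        return (start - size, start - 1)
    return (end + 1, end + size)


# Made-up coordinates for a three-exon transcript on the reverse strand.
toy_exons = [(100, 200), (500, 600), (900, 1000)]
print(first_exon(toy_exons, -1))            # rightmost exon
print(upstream_window(toy_exons, -1, 50))   # region to the right of it
```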
The left sidebar of the Location tab links to a number of additional useful resources. One of those, Comparative Genomics → Synteny, displays blocks of synteny between the human chromosome featured in the Location tab and chromosomes from about 30 different organisms. In these syntenic blocks, the order of genes and other sequence features is conserved across the genomes being compared. Figure 4.20a shows the synteny between human chromosome 12 and the mouse genome. A cartoon of human chromosome 12 is shown in the center of the display as a thick white rectangle, and mouse chromosomes are drawn on the sides as thinner white rectangles. Colored rectangles indicate regions of synteny between human and mouse. For example, the light blue region on human chromosome 12 is syntenic to the light blue region on mouse chromosome 10. The region surrounding the PAH gene is outlined in red on both human chromosome 12 and mouse chromosome 10. Below the cartoon is a list of the human genes and corresponding mouse orthologs in the region of PAH.
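The idea of conserved gene order can be made concrete with a toy example. The sketch below is a deliberately simplified illustration, not the method Ensembl uses to compute synteny: it walks along human genes in chromosomal order and groups consecutive genes whose mouse orthologs lie on the same mouse chromosome in a consistently increasing or decreasing order. The gene names, chromosomes, and positions are hypothetical.

```python
def syntenic_blocks(orthologs, min_genes=2):
    """Group consecutive genes whose orthologs keep a conserved order.

    orthologs: list of (human_gene, mouse_chrom, mouse_position) tuples,
    already sorted by position on the human chromosome.
    Returns a list of blocks, each a list of human gene names.
    """
    blocks = []
    current = []
    direction = 0  # 0 = not yet known, 1 = increasing, -1 = decreasing
    for gene in orthologs:
        if current:
            prev = current[-1]
            step = (gene[2] > prev[2]) - (gene[2] < prev[2])
            extends = (gene[1] == prev[1] and step != 0
                       and direction in (0, step))
            if extends:
                direction = step
            else:
                # Close the running block and start a new one.
                if len(current) >= min_genes:
                    blocks.append([g[0] for g in current])
                current, direction = [], 0
        current.append(gene)
    if len(current) >= min_genes:
        blocks.append([g[0] for g in current])
    return blocks


# Hypothetical orthologs listed in human chromosomal order.
toy_orthologs = [
    ("GENE1", "10", 500), ("GENE2", "10", 400), ("GENE3", "10", 300),
    ("GENE4", "15", 100), ("GENE5", "15", 200), ("GENE6", "6", 50),
]
blocks = syntenic_blocks(toy_orthologs)
print(blocks)
```

In this toy data the first three genes form a block on mouse chromosome 10 in reversed order, the next two form a block on chromosome 15, and the last gene stands alone and is therefore not reported.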