Bioinformatics. Group of authors
Navigation in JBrowse is fairly straightforward, especially for those already accustomed to using the UCSC or Ensembl Genome Browsers. Tracks can be added to or removed from the display by using the checkboxes on the left side of the window. In the display window, click on a track name and drag it to move the track up or down. To shift the focus of the display window upstream or downstream, click on the display and drag it to the left or right. The left and right arrows at the top of the page also move the display window. JBrowse provides multiple ways to zoom in and out. One option is to use the plus and minus magnifying glasses at the top of the page. Alternatively, place the mouse in the sequence coordinates above the top track and click and drag to highlight a region and zoom in on it. Double-clicking on a region also zooms in. Clicking on a track feature opens a window with additional information about that feature. For example, on the MGP Portal, clicking on a gene model in the 2.2 track opens the Gene Wiki for that model, a detailed page that includes nucleotide and protein sequences, pre-computed BLAST searches, and annotated Pfam domains. Note that although the general look and feel of JBrowse will remain similar across different genomes, individual JBrowse developers will create tracks and customizations that are specific to their genome project.
Figure 4.23 JBrowse display of a predicted Mnemiopsis gene (ML05372a) from the Mnemiopsis Genome Project Portal at the National Human Genome Research Institute. Seven tracks are shown on this display: SCF, assembled genomic regions are solid black and intermittent gaps are shaded bright pink; 2.2, consensus Mnemiopsis gene models; PFAM2.2, non-redundant Mnemiopsis protein domains derived from Pfam; CL2, RNA-seq reads derived from Mnemiopsis embryos, assembled into transcripts using Cufflinks (Trapnell et al. 2010); MASK, genomic regions that have been repeat-masked using VMatch are shaded in light blue; EST, Mnemiopsis expressed sequence tags (ESTs) from GenBank; GBNT, Mnemiopsis mRNAs and other non-EST RNAs from GenBank.
Summary
The UCSC and Ensembl Genome Browsers are sophisticated tools that provide free, web-based access to genome assemblies and annotations. This chapter has focused on examples from the human genome and a subset of the annotation tracks available for it. By adding tracks to the default view, users are able to view annotated genes, sequence variants, gene regulatory regions, gene expression data, and much more. The displays are highly customizable, and users can choose which data to view, the display style, and, in some cases, even change the colors of the annotated features. Both browsers can be accessed not only by text-based queries, such as gene symbol or chromosomal position, but also by searches with either nucleotide or protein sequences. The UCSC Genome Browser supports the BLAT search engine, while Ensembl supports both BLAT and BLAST, depending on the analysis type. Furthermore, the UCSC Table Browser and Ensembl's BioMart provide alternate entry points into the underlying data at each site, in which queries can be constructed using a web-based interface and data returned as text that can be downloaded and further manipulated. Although the examples illustrated in this chapter all derive from the GRCh38 assembly of the human genome, both UCSC and Ensembl host assemblies from many other organisms. The genomes may be assembled in shorter scaffolds, rather than chromosomes, and the variety of annotation types may be much smaller, but the basic look and feel of the genome browser will remain the same across different species.
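As an illustration of how downloaded text can be "further manipulated," the sketch below parses a small tab-delimited table and reports the genomic span of each gene on one chromosome. It assumes a header line prefixed with `#` and hypothetical column names (`name`, `chrom`, `strand`, `txStart`, `txEnd`); the sample rows are invented for the example and are not real annotations. Actual exports may use different columns depending on the table and output format selected.

```python
import csv
import io

# Hypothetical tab-delimited text, shaped like a gene table export.
# Column names and coordinates are illustrative assumptions only.
SAMPLE = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\n"
    "geneA\tchr1\t+\t1000\t5000\n"
    "geneB\tchr1\t-\t7000\t7600\n"
    "geneC\tchr2\t+\t200\t900\n"
)


def parse_table(text):
    """Parse tab-delimited text with a '#'-prefixed header into a list of dicts."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(
        io.StringIO("\n".join(lines[1:])), fieldnames=header, delimiter="\t"
    )
    rows = []
    for row in reader:
        # Coordinates arrive as strings; convert them for arithmetic.
        row["txStart"] = int(row["txStart"])
        row["txEnd"] = int(row["txEnd"])
        rows.append(row)
    return rows


def genes_on(rows, chrom):
    """Return (name, span) pairs for genes on one chromosome.

    The span is txEnd - txStart, which assumes a 0-based start
    and an exclusive end.
    """
    return [(r["name"], r["txEnd"] - r["txStart"]) for r in rows if r["chrom"] == chrom]


rows = parse_table(SAMPLE)
chr1_genes = genes_on(rows, "chr1")
print(chr1_genes)
```

The same pattern applies to a file saved from either site: read the text, convert the coordinate columns to integers, and filter or summarize as needed.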
With new developments in sequencing technology, even smaller laboratories are now able to generate large-scale sequencing data, including ChIP-seq and RNA-seq data, exome and whole-genome sequences, and even novel genome assemblies. Since 2015, genomic data sharing policies have required that large-scale genomic data generated by NIH-funded research be submitted to a public database in a timely manner. While human data must be submitted to an NIH-designated data repository, as of this writing, non-human data may be made available through any widely used data repository. Viewing and sharing these data with the larger community of biologists may best be done with a genome browser. Both the UCSC and Ensembl Genome Browsers provide the option for users to upload their own annotations and view them in the context of the public genome data. Using Sessions or Track Hubs, users can share these data with colleagues. The Assembly Hubs feature at UCSC now allows users to share novel genomes using the Genome Browser framework. Furthermore, the source code for the UCSC Genome Browser is publicly available, so others are free to set up their own browsers to host their own annotations, or even their own genomes. Alternatively, researchers who want to host their own genome browser should consider JBrowse. This freely available software tool can be easily installed on a web server and used to host custom genomes and annotations.
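A user's own annotations are commonly prepared as a plain-text file before upload. The following is a minimal sketch, assuming the six-column BED layout (chromosome, 0-based start, exclusive end, name, score, strand) preceded by a `track` line that names the track. The helper functions, track name, and peak coordinates are hypothetical; consult each browser's documentation for the full set of supported formats and track-line attributes.

```python
def bed_line(chrom, start, end, name, score=0, strand="+"):
    """Format one six-column BED record.

    BED starts are 0-based and ends are exclusive, so a feature
    covering the first 100 bases of a chromosome is written as 0..100.
    """
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


def custom_track(track_name, description, features):
    """Build custom-track text: one 'track' line followed by BED records."""
    header = f'track name="{track_name}" description="{description}"'
    return "\n".join([header] + [bed_line(*f) for f in features]) + "\n"


# Hypothetical features, invented for illustration.
features = [
    ("chr7", 1000, 1500, "peak1", 500, "+"),
    ("chr7", 2000, 2300, "peak2", 900, "-"),
]

text = custom_track("myPeaks", "Hypothetical ChIP-seq peaks", features)
print(text, end="")
```

The resulting text can be saved to a file or pasted into a custom-track upload form. Validating coordinates before writing catches the most common source of rejected uploads, an end position that does not exceed the start.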
The UCSC and Ensembl teams start with the same source of data, a genome assembly, often provided by the Genome Reference Consortium (GRC). Each team then layers on its own annotations from different sources, including the locations of genes, from GENCODE, RefSeq, and other gene prediction pipelines, and variants, from NCBI's dbSNP. Both browsers also include the locations of experimentally determined epigenetic marks, including histone modifications, as well as DNase I hypersensitive sites, both of which can inform predictions of gene regulatory regions. The regulatory tracks at UCSC come from the ENCODE project, while Ensembl provides a Regulatory Build, which includes data from ENCODE as well as other sources. Although individual researchers may have personal preferences about which interface is easier to use, or which site provides information that is more relevant to the biological question they are studying, most members of the bioinformatics community will undoubtedly use a genome browser at some point in their research career.
Internet Resources
UCSC Genome Browser | |
Main page | genome.ucsc.edu |
Genome Browser User's Guide | genome.ucsc.edu/goldenPath/help/hgTracksHelp.html |
Table Browser User's Guide | genome.ucsc.edu/goldenPath/help/hgTablesHelp.html |
Displaying custom annotation data | genome.ucsc.edu/goldenPath/help/customTrack.html |
Data file formats for custom annotation | genome.ucsc.edu/FAQ/FAQformat.html |
Sessions | |