Bioinformatics. Group of authors

labeled 2.2, displays the exons of the predicted gene models. The next track, called PFAM2.2, highlights Pfam domains found in the gene model. The Mnemiopsis RNA-seq reads were assembled into transcripts using the Cufflinks program (Trapnell et al. 2010), and the CL2 track shows the alignment of those transcripts to the genomic scaffold. The MASK track highlights repetitive regions. The EST and GBNT tracks show, respectively, the alignment of publicly available Mnemiopsis EST and other RNA sequences from GenBank. These two tracks are empty in this region, so the gene in the gene model track is a novel gene prediction. The overlap between the exons on the Pfam and gene model tracks shows that the predicted gene contains known protein domains. The CL2 track lends further support to the gene prediction, as the exons of the experimentally derived Mnemiopsis transcripts overlap the exons on the gene model track.

Snapshot of the JBrowse display of a predicted Mnemiopsis gene from the Mnemiopsis Genome Project Portal at the National Human Genome Research Institute. Seven tracks are shown in this display.
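The reasoning used to read this display (exons on one track overlapping features on another) can be sketched in a few lines of Python. This is only an illustration: the coordinates, the function names `overlapping_pairs` and `fraction_supported`, and the choice of 0-based, half-open intervals are assumptions made for the example, not values taken from the Mnemiopsis portal.

```python
from typing import List, Tuple

# An interval is (start, end) on one scaffold: 0-based, end-exclusive.
Interval = Tuple[int, int]


def overlapping_pairs(
    exons: List[Interval], features: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return every (exon, feature) pair that shares at least one base."""
    pairs = []
    for exon_start, exon_end in exons:
        for feat_start, feat_end in features:
            # Two half-open intervals overlap when each starts before the other ends.
            if exon_start < feat_end and feat_start < exon_end:
                pairs.append(((exon_start, exon_end), (feat_start, feat_end)))
    return pairs


def fraction_supported(exons: List[Interval], features: List[Interval]) -> float:
    """Fraction of exons overlapped by at least one feature on another track."""
    if not exons:
        return 0.0
    supported = {exon for exon, _ in overlapping_pairs(exons, features)}
    return len(supported) / len(exons)


# Hypothetical coordinates standing in for three tracks of the display.
gene_model_exons = [(100, 250), (400, 520), (800, 950)]
pfam_domains = [(120, 500)]
transcript_exons = [(90, 250), (400, 520), (805, 960)]

pfam_support = fraction_supported(gene_model_exons, pfam_domains)
transcript_support = fraction_supported(gene_model_exons, transcript_exons)
```

With these made-up coordinates, every predicted exon is overlapped by an assembled transcript, while only the first two overlap a Pfam domain. The nested loop is adequate for a single gene; a genome-wide comparison would normally use sorted intervals or an interval tree instead.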

With new developments in sequencing technology, even smaller laboratories can now generate genome-scale sequencing data, including ChIP-seq, RNA-seq, exome and whole-genome sequencing data, and even novel genome assemblies. Since 2015, genomic data sharing policies have required that large-scale genomic data generated by NIH-funded research be submitted to a public database in a timely manner. While human data must be submitted to an NIH-designated data repository, as of this writing, non-human data may be made available through any widely used data repository.

Viewing and sharing these data with the larger community of biologists may best be done with a genome browser. Both the UCSC and Ensembl Genome Browsers allow users to upload their own annotations and view them in the context of the public genome data. Using Sessions or Track Hubs, users can share these data with colleagues. The Assembly Hubs feature at UCSC now allows users to share novel genomes using the Genome Browser framework. Furthermore, the source code for the UCSC Genome Browser is publicly available, so others are free to set up their own browsers to host their own annotations, or even their own genomes. Alternatively, researchers who want to host their own genome browser should consider JBrowse. This freely available software tool can be easily installed on a web server and used to host custom genomes and annotations.
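As a concrete illustration of what uploading one's own annotations involves, the sketch below builds a small custom track in BED format, a text format that the UCSC Genome Browser accepts for custom annotation. The `Feature` type, the `format_custom_track` helper, the track name, and the coordinates are all hypothetical; only the general layout (a `track` line followed by tab-separated chromosome, start, end, and name columns, with 0-based starts and exclusive ends) follows the BED convention.

```python
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    """One annotated region (hypothetical helper type for this sketch)."""
    chrom: str
    start: int  # 0-based start, as BED expects
    end: int    # end-exclusive
    name: str


def format_custom_track(
    features: Iterable[Feature], track_name: str, description: str
) -> str:
    """Render features as BED text preceded by a custom-track header line."""
    lines = [f'track name="{track_name}" description="{description}"']
    for feature in features:
        if feature.start < 0 or feature.end <= feature.start:
            raise ValueError(f"invalid interval for {feature.name}")
        # BED4: chromosome, start, end, name, separated by tabs.
        lines.append(
            f"{feature.chrom}\t{feature.start}\t{feature.end}\t{feature.name}"
        )
    return "\n".join(lines) + "\n"


# Made-up regions; real coordinates would come from the user's own analysis.
example_features = [
    Feature("chr1", 999, 2000, "peak1"),
    Feature("chr1", 4999, 5600, "peak2"),
]
example_track = format_custom_track(example_features, "my_peaks", "Example peaks")
```

The resulting text could be saved to a file and supplied through the browser's custom track upload page. Note the coordinate convention: a region that a browser displays as bases 1000 to 2000 is written in BED as start 999, end 2000.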

The UCSC and Ensembl teams start with the same source of data, a genome assembly, often provided by the GRC. Each team then layers on its own annotations from different sources, including gene locations from GENCODE, RefSeq, and other gene prediction pipelines, and variants from NCBI's dbSNP. Both browsers also include the locations of experimentally determined epigenetic marks, such as histone modifications, as well as DNaseI hypersensitive sites; both kinds of data can inform predictions of gene regulatory regions. The regulatory tracks at UCSC come from the ENCODE project, while Ensembl provides a Regulatory Build, which includes data from ENCODE as well as other sources. Although individual researchers may have personal preferences about which interface is easier to use, or which site provides information that is more relevant to the biological question they are studying, most members of the bioinformatics community will undoubtedly use a genome browser at some point in their research careers.

Internet Resources

UCSC Genome Browser
Main page: genome.ucsc.edu
Genome Browser User's Guide: genome.ucsc.edu/goldenPath/help/hgTracksHelp.html
Table Browser User's Guide: genome.ucsc.edu/goldenPath/help/hgTablesHelp.html
Displaying custom annotation data: genome.ucsc.edu/goldenPath/help/customTrack.html
Data file formats for custom annotation: genome.ucsc.edu/FAQ/FAQformat.html
Sessions
