Bioinformatics. Group of authors

(Figure 4.22c). After the Filters and Attributes have been set, click on the Results button in the upper left to return the BioMart output (Figure 4.22d). Data can be returned as a text file or as a formatted page in the web browser, with hyperlinks to Ensembl resources. Because of differences in gene annotation strategies, the mapping of NCBI RefSeq accession numbers to Ensembl gene and transcript identifiers is not one-to-one: some RefSeq accessions map to more than one Ensembl gene and/or transcript, and some Ensembl genes map to more than one RefSeq identifier.
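Because the mapping is not one-to-one, it can help to flag ambiguous identifiers in the BioMart output before using it further. The following is a minimal sketch, assuming the results were saved as tab-separated text. The column headers `RefSeq mRNA ID` and `Gene stable ID` are assumptions about the export, and the identifiers in `EXAMPLE` are placeholders rather than real records.

```python
import csv
import io
from collections import defaultdict

# Placeholder data standing in for a BioMart text export (not real identifiers).
EXAMPLE = (
    "RefSeq mRNA ID\tGene stable ID\n"
    "NM_A\tENSG_1\n"
    "NM_A\tENSG_2\n"
    "NM_B\tENSG_3\n"
    "NM_C\tENSG_3\n"
    "NM_D\tENSG_4\n"
)


def find_ambiguous(tsv_text, refseq_col="RefSeq mRNA ID", gene_col="Gene stable ID"):
    """Return two dicts: RefSeq accessions mapped to several genes,
    and genes mapped to several RefSeq accessions."""
    genes_by_refseq = defaultdict(set)
    refseqs_by_gene = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        refseq = (row.get(refseq_col) or "").strip()
        gene = (row.get(gene_col) or "").strip()
        if not refseq or not gene:
            continue  # skip rows with a missing identifier
        genes_by_refseq[refseq].add(gene)
        refseqs_by_gene[gene].add(refseq)
    multi_gene = {r: sorted(g) for r, g in genes_by_refseq.items() if len(g) > 1}
    multi_refseq = {g: sorted(r) for g, r in refseqs_by_gene.items() if len(r) > 1}
    return multi_gene, multi_refseq


multi_gene, multi_refseq = find_ambiguous(EXAMPLE)
```

In the placeholder data, `NM_A` is reported as mapping to two genes and `ENSG_3` as mapping to two accessions; the remaining rows are unambiguous and are left out of both dictionaries.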


      Retrieving the mouse orthologs of the NCBI reference sequences must be done as a separate step, as it is not possible to return an external identifier (i.e. the starting RefSeq accession number) and an ortholog in the same BioMart query. Starting with the same Filter and human RefSeq accession numbers as before, choose the Homologues section of the Attributes and select the human Ensembl gene identifier and gene name under Gene → Ensembl, as well as the mouse Ensembl gene identifier and gene name under Orthologues → Mouse Orthologues. The results are shown in Figure 4.22e. Note that not all of the human gene identifiers have been mapped to a corresponding mouse ortholog. The goal of this exercise was to identify the mouse orthologs of the human RefSeq accession numbers from the GWAS Catalog. Using the human Ensembl gene identifiers as a key, the human RefSeq accession numbers can be added to the list of mouse orthologs. This can be carried out by using the VLOOKUP function in Microsoft Excel, or by writing a script in your favorite programming language, and is left as an exercise for the reader.
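One possible approach to the scripting option is sketched here in Python, using only the standard library. It assumes both BioMart results were saved as tab-separated text that share a human Ensembl gene identifier column. All column headers (`Gene stable ID`, `RefSeq mRNA ID`, and the mouse columns in the usage example) are assumptions about the export, and the function names are illustrative.

```python
import csv
import io
from collections import defaultdict


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def add_refseq_to_orthologs(refseq_rows, ortholog_rows,
                            key="Gene stable ID",
                            refseq_col="RefSeq mRNA ID"):
    """Attach human RefSeq accessions to ortholog rows, joining on the
    human Ensembl gene identifier."""
    # Collect every RefSeq accession seen for each human gene, in input order.
    refseqs_by_gene = defaultdict(list)
    for row in refseq_rows:
        acc = (row.get(refseq_col) or "").strip()
        gene = (row.get(key) or "").strip()
        if acc and gene and acc not in refseqs_by_gene[gene]:
            refseqs_by_gene[gene].append(acc)

    merged = []
    for row in ortholog_rows:
        gene = (row.get(key) or "").strip()
        # Emit one row per matching accession; keep the ortholog row with an
        # empty accession if the gene has no RefSeq entry in the first table.
        for acc in refseqs_by_gene.get(gene, [""]):
            merged.append({**row, refseq_col: acc})
    return merged
```

A short usage example with placeholder identifiers (not real records):

```python
refseq = read_tsv("RefSeq mRNA ID\tGene stable ID\n"
                  "NM_A\tENSG_1\nNM_B\tENSG_1\nNM_C\tENSG_2\n")
orthologs = read_tsv("Gene stable ID\tGene name\tMouse gene stable ID\tMouse gene name\n"
                     "ENSG_1\tGENE1\tENSMUSG_1\tGene1\n"
                     "ENSG_2\tGENE2\t\t\n")
for row in add_refseq_to_orthologs(refseq, orthologs):
    print(row)
```

Unlike VLOOKUP, which returns only the first match for a key, this sketch keeps every RefSeq accession associated with a gene, so a gene with several accessions yields several output rows. Human genes without a mapped mouse ortholog are retained with empty mouse columns.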

      While the UCSC and Ensembl Genome Browsers provide user-friendly interfaces for viewing genomic data from well-characterized organisms, there are fewer applications for displaying genome assemblies and annotations for newly sequenced organisms or non-standard assemblies. The source code and executables for the UCSC Genome Browser are freely available for academic, non-profit, and personal use, and can be set up to display custom data, not just those provided by UCSC. Thus, one option is for researchers to host their own UCSC Genome Browser and use it to share custom genomes with the bioinformatics community. An alternative method for sharing novel genome assemblies is to set up an Assembly Hub. Researchers host the specially formatted genomic sequence and data tracks on their own web site, and anyone with the URL can view the assembly through the UCSC Genome Browser.
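To give a sense of what "specially formatted" means for an Assembly Hub, the sketch below generates the two small text files that describe a hub and its genome. It is a minimal illustration, not a complete hub: the setting names (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`, `genome`, `trackDb`, `twoBitPath`, `organism`, `defaultPos`) follow UCSC's hub conventions as understood here and should be checked against the current UCSC documentation. The function name and all values in the usage example are hypothetical.

```python
def hub_files(hub_name, short_label, long_label, email,
              genome, organism, two_bit_path, default_pos):
    """Return {filename: text} for a minimal hub.txt and genomes.txt."""
    hub_txt = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
        "genomesFile genomes.txt",   # points the browser at the genome list
        f"email {email}",
    ]) + "\n"
    genomes_txt = "\n".join([
        f"genome {genome}",
        f"trackDb {genome}/trackDb.txt",  # per-assembly track definitions
        f"twoBitPath {two_bit_path}",     # genome sequence in 2bit format
        f"organism {organism}",
        f"defaultPos {default_pos}",      # region shown when the hub opens
    ]) + "\n"
    return {"hub.txt": hub_txt, "genomes.txt": genomes_txt}


# Hypothetical values for illustration only.
files = hub_files(
    hub_name="myAssemblyHub",
    short_label="My assembly",
    long_label="Example hub for a newly sequenced genome",
    email="someone@example.org",
    genome="myAsm1",
    organism="Example organism",
    two_bit_path="myAsm1/myAsm1.2bit",
    default_pos="scaffold1:1-10000",
)
```

The resulting files, together with the sequence file and a `trackDb.txt` describing the annotation tracks, would be placed on a web server, and the URL of `hub.txt` is what gets shared with other users.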

      Another way to share novel genome assemblies is to use JBrowse (Buels et al. 2016), a web-based genome browser that is part of the Generic Model Organism Database (GMOD) project, a suite of tools for generating genomic databases. JBrowse can handle data in a variety of formats, and is relatively easy to install on a Linux- or Mac OS X-based web server (Skinner and Holmes 2010). JBrowse browsers support plant genomes (e.g. Phytozome), animal genomes (e.g. the Rat Genome Database), and disease-related databases of human data (e.g. the COSMIC Genome Browser).

      An example of using JBrowse to view a customized genome assembly and associated annotations is the Mnemiopsis Genome Project (MGP) Portal at the National Human Genome Research Institute (NHGRI) of the US National Institutes of Health (NIH). Mnemiopsis leidyi is a ctenophore, or comb jelly, a member of a phylum of gelatinous zooplankton found in all the world's seas. The members of this phylum are called comb jellies because of their highly ciliated comb rows, which provide their primary means of locomotion. These early-branching metazoans have proven to be important model organisms for understanding the diversity and complexity seen in the early evolution of animals. The Mnemiopsis data featured in this portal are the first set of whole genome sequencing data on any ctenophore species to be published and made available to the scientific community (Moreland et al. 2014). The portal provides not only genomic and protein model sequence data, but also a BLAST search interface, pathway and protein domain analysis, and a customized genome browser, implemented in JBrowse, to display the annotation data.
