Molecular Biotechnology. Bernard R. Glick

(TAA, shown; also TAG or TGA) codons in mRNA. The number of nucleotides between the start and stop codons must be a multiple of three (i.e., triplet codons) and must be a reasonable size to encode a protein. In prokaryotes, a conserved ribosome-binding site (RBS) is often present 4 to 8 nucleotides upstream of the start codon (A). Prokaryotic transcription regulatory sequences such as an RNA polymerase recognition (promoter) sequence and binding sites for regulatory proteins can often be predicted based on similarity to known consensus sequences. Transcription termination sequences are not as readily identifiable but are often GC-rich regions downstream of a predicted translation stop codon. In eukaryotes, protein coding genes typically have several intron sequences in primary RNA that are delineated by GU and AG and contain a pyrimidine-rich tract. Introns are spliced from the primary transcript to produce mRNA (B). Transcription regulatory elements such as the TATA and CAAT boxes that are present in the promoters of many eukaryotic protein coding genes can sometimes be predicted. Sequences that are important for regulation of transcription are often difficult to predict in eukaryotic genome sequences; for example, enhancer elements can be thousands of nucleotides upstream and/or downstream from the coding sequence that they regulate.
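
The open reading frame criteria described above (a start codon, an in-frame stop codon, a length that is a multiple of three, and a reasonable size) can be sketched in a few lines of Python. This is only a minimal illustration, not a gene prediction tool. The function name `find_orfs`, the `min_codons` threshold, and the example sequence are assumptions introduced here. Only the forward strand is scanned, ATG is taken as the only start codon, and the ribosome-binding site is not checked.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ORF found on the forward strand.

    start is the index of the A in ATG, end is the index just past the
    stop codon, and frame is 0, 1, or 2. min_codons (a hypothetical
    threshold) is the minimum number of codons between start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one triplet at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Codons strictly between the start and stop codons.
                n_codons = (i - start) // 3 - 1
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Made-up example: ATG AAA CCC GGG TAA embedded in flanking bases.
print(find_orfs("CCATGAAACCCGGGTAACC"))  # [(2, 17, 2)]
```

Because the scan advances in steps of three from the start codon, any stop codon it finds is automatically in frame, so the multiple-of-three requirement holds by construction.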

      Comparison of a genome sequence to other genome sequences can reveal interesting and important sequence features. Comparisons among closely related genomes may reveal polymorphisms and mutations based on sequence differences. Association of specific polymorphisms with diseases can be used to predict, diagnose, and treat human diseases. Traditionally, cancer genetic research has investigated specific genes that were hypothesized to play a role in tumorigenesis based on their known cellular functions, for example, genes encoding transcription factors that control expression of cell division genes. Although important, this gives an incomplete view of the genetic basis for cancers. Sequencing of tumor genomes and comparing the sequences to those of normal cells have revealed point mutations, copy number mutations, and structural rearrangements associated with specific cancers. For instance, comparison of the genome sequences from acute myeloid leukemia tumor cells and normal skin cells from the same patient revealed eight previously unidentified mutations in protein coding sequences that are associated with the disease. Comparison of the genomes of bacterial pathogens with those from closely related nonpathogens has led to the identification of virulence genes. Unique sequences can be used for pathogen detection, and genes encoding proteins that are unique to a pathogen are potential targets for antimicrobial drugs and vaccine development.
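
      As a toy illustration of how sequence differences between two closely related genomes can be listed, the following Python sketch compares two sequences that are assumed to be already aligned and of equal length. The function name `point_differences` and both example sequences are made up for illustration. Real tumor-versus-normal comparisons rely on read alignment and statistical variant calling, which this sketch does not attempt.

```python
def point_differences(reference, sample):
    """List (position, reference_base, sample_base) for every mismatch.

    Both sequences must already be aligned to the same length; positions
    are 0-based. Insertions, deletions, copy number changes, and
    rearrangements are not detected by this simple comparison.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Hypothetical sequences from normal and tumor cells of one patient.
normal = "ATGGCCTTAGAC"
tumor = "ATGGCCTGAGAC"
print(point_differences(normal, tumor))  # [(7, 'T', 'G')]
```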

      Genome comparisons among distantly related organisms enable scientists to make predictions about evolutionary relationships. For example, the Genome 10K Project aims to sequence and analyze the genomes of 10,000 vertebrate species, roughly 1 per genus. Comparison of these sequences will contribute to our understanding of the genetic changes that led to the diversity in morphology, physiology, and behavior in this group of animals.

      Another goal of genomic analysis is to understand the function of sequence features. Gene function can sometimes be inferred by the pattern of transcription. Transcriptomics is the study of gene transcription profiles either qualitatively, to determine which genes are expressed, or quantitatively, to measure changes in the levels of transcription of genes. Proteomics is the study of the entire protein populations of various cell types and tissues and the numerous interactions among proteins. Some proteins, particularly enzymes, are involved in biochemical pathways that produce metabolites for various cellular processes. Metabolomics aims to characterize metabolic pathways by studying the metabolite profiles of cells. All of these “-omic” subdisciplines of genomics use a genome-wide approach to study the function of biological molecules in cells, tissues, or organisms, at different developmental stages, or under different physiological or environmental conditions.

      Transcriptomics (gene expression profiling) aims to measure the levels of transcription of genes on a whole-genome basis under a given set of conditions. Transcription may be assessed as a function of medical conditions, as a consequence of mutations, in response to natural or toxic agents, in different cells or tissues, or at different times during biological processes such as cell division or development of an organism. Often, the goal of gene expression studies is to identify the genes that are up- or downregulated in response to a change in a particular condition. Two major experimental approaches for measuring RNA transcript levels on a whole-genome basis are DNA microarray analysis and high-throughput next-generation RNA sequencing.

      DNA Microarrays

      A DNA microarray (DNA chip or gene chip) experiment consists of hybridizing a nucleic acid sample (target) derived from the mRNAs of a cell or tissue to single-stranded DNA sequences (probes) that are arrayed on a solid platform. Depending on the purpose of the experiment, the probes on a microarray may represent an entire genome, a single chromosome, selected genomic regions, or selected coding regions from one or several different organisms. Some DNA microarrays contain sets of oligonucleotides as probes, usually representing thousands of different genes, that are synthesized directly on a solid surface. Thousands of copies of an oligonucleotide with the same specific nucleotide sequence are synthesized in a predefined position on the array surface (probe cell). The probes are typically 20 to 70 nucleotides, although longer probes can also be used, and several probes with different sequences for each gene are usually present on the microarray to minimize errors. Probes are designed to be specific for their target sequences, to avoid hybridization with nontarget sequences, and to have similar melting (annealing) temperatures so that all target sequences can bind to their complementary probe sequence under the same conditions. A complete whole-genome oligonucleotide array may contain more than 500,000 probes representing as many as 30,000 genes.
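
      The requirement that probes have similar melting temperatures can be illustrated with a short Python sketch that keeps only candidate probes whose estimated melting temperature lies near a chosen target. The estimate used here, Tm = 64.9 + 41(G + C - 16.4)/N, is a commonly used basic approximation for oligonucleotides longer than about 13 nucleotides. It is an assumption of this example rather than a method described in the text, and actual probe design generally uses more detailed thermodynamic models. The function names, the tolerance, and the candidate sequences are hypothetical.

```python
def basic_tm(probe):
    """Rough melting temperature (degrees C) from GC count and length.

    Assumed approximation: Tm = 64.9 + 41 * (G + C - 16.4) / N.
    Salt and probe concentration effects are ignored.
    """
    probe = probe.upper()
    gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(probe)


def select_probes(candidates, target_tm, tolerance=2.0):
    """Keep probes whose estimated Tm is within tolerance of target_tm."""
    return [p for p in candidates if abs(basic_tm(p) - target_tm) <= tolerance]


# Made-up 20-nucleotide candidates with 10, 11, and 2 G/C bases.
candidates = [
    "ATGCATGCATGCATGCATGC",
    "GTGCATGCATGCATGCATGC",
    "ATATATATATGCATATATAT",
]
for probe in select_probes(candidates, target_tm=52.0):
    print(probe, round(basic_tm(probe), 2))
```

With these inputs, the AT-rich third candidate is rejected because its estimated melting temperature is far below that of the other two, so it would not hybridize under the same conditions.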

      For most gene expression profiling experiments that utilize microarrays, mRNA is extracted from cells or tissues and used as a template to synthesize cDNA using reverse transcriptase. Usually, mRNA is extracted from two or more sources for which expression profiles are compared, for example, from diseased versus normal tissue, or from cells grown under different conditions (Fig. 2.45A). The cDNA from each source is labeled with a different fluorophore by incorporating fluorescently labeled nucleotides during cDNA synthesis. For example, a green-emitting fluorescent dye (Cy3) may be used for the normal (reference) sample and a red-emitting fluorescent dye (Cy5) for the test sample. After labeling, the cDNA samples are mixed and hybridized to the same microarray (Fig. 2.45A). Replicate samples are independently prepared under the same conditions and hybridized to different microarrays. A laser scanner determines the intensities of Cy5 and Cy3 for each probe cell on a microarray. The ratio of red (Cy5) to green (Cy3) fluorescence intensity of a probe cell indicates the relative expression levels of the represented gene in the two samples (Fig. 2.45B). To avoid variation due to inherent and sequence-specific differences in labeling efficiencies between Cy3 and Cy5, reference and test samples are often reverse labeled and hybridized to another microarray. Alternatively, for some microarray platforms, the target sequences from reference and test samples are labeled with the same fluorescent dye and are hybridized to different microarrays. Methods to calibrate the data among microarrays in an experiment include using, as a reference point, the fluorescence intensity of a gene that is not differentially expressed among conditions (i.e., a housekeeping gene); including spiked control sequences that are sufficiently different from the target sequences that they bind only to a corresponding control probe cell; and adjusting the total fluorescence intensities of all genes on each microarray to similar values, under the assumption that relatively few genes change in expression among samples.
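
      The arithmetic behind a two-color comparison can be sketched in Python as follows. The example scales the Cy5 channel so that its total intensity matches that of the Cy3 channel (the total-intensity adjustment, applied here to the two channels of a single array for simplicity), computes the log2 ratio of Cy5 to Cy3 for each gene, and averages a forward array with a reverse-labeled array. The function names, gene names, and intensity values are all made up for illustration, and expressing the ratio on a log2 scale is a common convention assumed here rather than one stated in the text.

```python
import math


def normalize_total(cy3, cy5):
    """Scale Cy5 intensities so their total equals the Cy3 total."""
    scale = sum(cy3.values()) / sum(cy5.values())
    return {gene: value * scale for gene, value in cy5.items()}


def log2_ratios(cy3, cy5):
    """log2(Cy5 / Cy3) per gene: positive means higher in the Cy5 sample."""
    return {gene: math.log2(cy5[gene] / cy3[gene]) for gene in cy3}


def dye_swap_average(forward, swapped):
    """Combine log2 ratios from a forward and a reverse-labeled array.

    On the swapped array the test sample carries Cy3, so its log2 ratio
    has the opposite sign and is subtracted before averaging.
    """
    return {gene: (forward[gene] - swapped[gene]) / 2 for gene in forward}


# Hypothetical intensities: reference labeled with Cy3, test with Cy5.
cy3 = {"geneA": 100.0, "geneB": 400.0, "geneC": 500.0}
cy5 = {"geneA": 400.0, "geneB": 800.0, "geneC": 800.0}

ratios = log2_ratios(cy3, normalize_total(cy3, cy5))
for gene, value in ratios.items():
    print(gene, round(value, 2))
```

      In this made-up data set the raw Cy5 signal is twice as bright overall, so geneB appears upregulated twofold before scaling but unchanged afterward, while geneA remains upregulated and geneC becomes slightly downregulated.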

      Figure 2.45 Gene expression profiling with a DNA microarray. (A) mRNA is extracted from two samples (sample 1 and sample 2), and during reverse transcription, the first cDNA strands are labeled with the fluorescent dyes Cy3 and Cy5, respectively. The cDNA samples are mixed and hybridized to an ordered array of either gene sequences or gene-specific oligonucleotides. After the
