Molecular Biotechnology. Bernard R. Glick


reaction, each probe cell is scanned for both fluorescent dyes and the separate emissions are recorded. Probe cells that produce only a green or red emission represent genes that are transcribed only in sample 1 or 2, respectively; yellow emissions indicate genes that are active in both samples; and the absence of emissions (black) represents genes that are not transcribed in either sample. (B) Fluorescence image of a DNA microarray hybridized with Cy3- and Cy5-labeled cDNA. Reproduced with permission from http://biotech.biology.arizona.edu/Resources/DNA_analysis.html. Courtesy of N. Anderson, University of Arizona.
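The color interpretation described in this caption can be sketched in a few lines of Python. This is only an illustration: the function name `classify_probe_cell` and the detection threshold are hypothetical, and real image-analysis software applies background correction and more careful calling rules.

```python
def classify_probe_cell(green_signal, red_signal, threshold=100.0):
    """Label one probe cell from its two fluorescence channels.

    threshold is a hypothetical cutoff above which a channel counts as
    "emitting"; it is an assumption of this sketch, not a value from the text.
    """
    green_on = green_signal >= threshold
    red_on = red_signal >= threshold
    if green_on and red_on:
        return "yellow"  # transcribed in both samples
    if green_on:
        return "green"   # transcribed only in sample 1
    if red_on:
        return "red"     # transcribed only in sample 2
    return "black"       # not transcribed in either sample


# Hypothetical (green, red) intensities for four probe cells.
cells = [(850.0, 20.0), (15.0, 930.0), (700.0, 640.0), (12.0, 9.0)]
labels = [classify_probe_cell(g, r) for g, r in cells]
print(labels)
```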

      Genes whose expression changes in response to a particular biological condition are identified by comparing the fluorescence intensities for each gene, averaged among replicates, under two different conditions. The raw fluorescence emission data for each gene are converted to a ratio, commonly expressed as a fold change and often log-transformed so that the values are symmetric around zero. Generally, positive values represent greater expression of the gene in the test sample than in the reference sample, and negative values indicate a lower level of expression in the test sample relative to the reference sample. The data are often organized into clusters of genes whose expression patterns are similar under different conditions or over a period of time (Fig. 2.46). Clustering facilitates predictions of gene products that may function together in a pathway.
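      As a minimal sketch of the ratio calculation, the following Python code averages replicate intensities for a gene and reports a log2 fold change, one common convention under which positive values mean higher expression in the test sample and negative values mean lower expression. The gene names, intensity values, and the pseudocount are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def log2_fold_change(test_replicates, reference_replicates, pseudocount=1.0):
    """Return log2(mean test intensity / mean reference intensity).

    The pseudocount is an assumption of this sketch; it avoids dividing by
    zero or taking the log of zero when a channel has no signal.
    """
    test = mean(test_replicates) + pseudocount
    reference = mean(reference_replicates) + pseudocount
    return math.log2(test / reference)


# Hypothetical background-corrected intensities: (test, reference) replicates.
intensities = {
    "geneA": ([1599.0, 1599.0], [399.0, 399.0]),  # higher in test sample
    "geneB": ([199.0, 199.0], [799.0, 799.0]),    # lower in test sample
    "geneC": ([499.0, 499.0], [499.0, 499.0]),    # unchanged
}

for gene, (test, reference) in intensities.items():
    print(gene, round(log2_fold_change(test, reference), 2))
```

      A log2 value of 2 corresponds to a fourfold increase and a value of -2 to a fourfold decrease, which is why the log scale treats induction and repression symmetrically.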

      Figure 2.46 Gene expression profile of cirrhotic liver tissue. Columns 1 to 7 and 8 to 15 are expression data from liver samples from patients with ethanol- and hepatitis C virus-induced cirrhosis of the liver, respectively. Each patient’s sample was compared to normal liver tissue. A total of 2,965 genes were differentially expressed. The asterisks denote patients with severe cirrhosis of the liver. Adapted from Figure 1 in Lederer, S. L., et al., Virol J. 3:98, 2006.

      The gene expression profile in Fig. 2.46, determined by microarray analysis, shows that different genes are transcribed in patients with cirrhosis of the liver than in normal individuals, and in patients with ethanol-induced cirrhosis than in those with cirrhosis induced by the hepatitis C virus. Moreover, the genes that are turned on during advanced ethanol-induced liver damage differ from those turned on in less severe ethanol-induced cirrhosis (Fig. 2.46). No such distinction is evident among individuals with different severities of virus-induced cirrhosis (Fig. 2.46). In addition, information about the transcription of genes that contribute to a particular pathway or cellular activity can be extracted from a gene expression profile. For example, genes that are transcribed during lymphocyte proliferation and activation are highly expressed in virus-induced liver cirrhosis and to a much lesser extent in ethanol-associated cirrhotic samples (Fig. 2.47).

      Figure 2.47 Gene expression profile of lymphocyte-specific genes from cirrhotic liver tissue. Columns 1 to 7 and 8 to 15 are expression data from liver samples from the patients described in Fig. 2.46 with ethanol- and hepatitis C virus-induced cirrhosis of the liver, respectively. Each patient’s sample was compared to normal liver tissue. The cluster consists of about 70 genes. The asterisks denote patients with severe cirrhosis of the liver. Adapted from Figure 2B in Lederer, S. L., et al., Virol J. 3:98, 2006.

      RNA Sequencing

      Similar to microarrays, RNA sequencing is used to detect and quantify the complete set of gene transcripts produced by cells under a given set of conditions. In addition, RNA sequencing can delineate the beginning and end of genes, reveal posttranscriptional modifications such as variations in intron splicing that lead to variant proteins, and identify differences in the nucleotide sequence of a gene among samples. In contrast to microarray analysis, this approach does not require prior knowledge of the genome sequence, avoids high background due to nonspecific hybridization, and can accurately quantify highly expressed genes (i.e., probe saturation is not a concern as it is for DNA microarrays). Traditionally, RNA sequencing approaches required generating cDNA libraries from isolated RNA and sequencing the cloned inserts, or the end(s) of the cloned inserts (expressed sequence tags), using the dideoxynucleotide method. New developments in sequencing technologies circumvent the requirement for preparation of a clone library and enable high-throughput sequencing of cDNA.

      For high-throughput RNA sequencing, total RNA is isolated and converted to cDNA using reverse transcriptase and a mixture of oligonucleotide primers composed of six random bases (random hexamers) that bind to multiple sites on all of the template RNA molecules (Fig. 2.48A). Because rRNA makes up a large fraction (>80%) of the total cellular RNA and rRNA levels are not expected to change significantly under different conditions, these molecules are often removed prior to cDNA synthesis by hybridization to complementary oligonucleotides that are covalently linked to magnetic beads. Long RNA molecules are fragmented to pieces of about 200 bp by physical (e.g., nebulization), chemical (e.g., metal ion hydrolysis), or enzymatic (e.g., controlled RNase digestion) methods either before cDNA synthesis (RNA fragmentation) or after cDNA synthesis (cDNA fragmentation).

      Figure 2.48 High-throughput RNA sequencing. (A) Total RNA is extracted from a sample and rRNA may be removed. The RNA is fragmented and then converted to cDNA using reverse transcriptase. Adaptors are added to the ends of the cDNA to provide binding sites for sequencing primers. High-throughput next-generation sequencing technologies are used to determine the sequences at the ends of the cDNA molecules (paired end reads). The sequence reads are aligned to a reference genome or assembled into contigs using the overlapping sequences. Shown is the alignment of paired end reads to a gene containing one intron. (B) RNA expression levels are determined by counting the reads that correspond to a gene. Adapted with permission from Wang et al., Nat Rev Genet. 10:57–63, 2009.
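      The read counting shown in panel B, combined with scaling to transcript length and sequencing depth (reads per kilobase pair per million reads), can be sketched in Python as follows. This is a simplified illustration: the gene names, lengths, and read assignments are hypothetical, and the sketch assumes each read has already been assigned to a single gene by an aligner.

```python
from collections import Counter


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase pair of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_of_reads


# Hypothetical gene assignments for a handful of aligned reads.
aligned_reads = ["geneA", "geneA", "geneB", "geneA", "geneB", "geneC"]
counts = Counter(aligned_reads)  # reads per gene

# Hypothetical transcript lengths in base pairs.
lengths_bp = {"geneA": 3000, "geneB": 1000, "geneC": 500}

total = sum(counts.values())
normalized = {
    gene: rpkm(counts[gene], lengths_bp[gene], total) for gene in counts
}
for gene, value in normalized.items():
    print(gene, counts[gene], round(value, 1))
```

      Dividing by transcript length keeps long transcripts from appearing more highly expressed simply because they yield more fragments, and dividing by the total read count makes samples sequenced to different depths comparable. For example, 500 reads on a 2,000-bp transcript in a sample of 10 million mapped reads gives 500 / 2 / 10 = 25.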

      The cDNA fragments are ligated at one or both ends to an adaptor that serves as a binding site for a sequencing primer (Fig. 2.48A). High-throughput next-generation sequencing technologies are employed to sequence the cDNA fragments. The sequence reads are assembled in a manner similar to that for genomic DNA, that is, by aligning the reads to a reference genome or, when a reference genome is not available, by aligning overlapping sequences to generate contigs for de novo assembly. The reads are expected to align uniformly across the transcript (Fig. 2.48A). Gene expression levels are determined by counting the reads that correspond to each nucleotide position in a gene and averaging these across the length of the transcript (Fig. 2.48B). Expression levels are typically normalized between samples by scaling to the total number of reads per sample (e.g., reads per kilobase pair per million reads). Appropriate coverage (i.e., the number of cDNA fragments sequenced) is more difficult to determine for RNA sequencing than for genome sequencing because the total complexity of the transcriptome is not known before the experiment. In general, larger genomes and genomes that have more RNA splicing variants have greater transcriptome complexity and therefore require greater coverage. Also, accurate measurement of transcripts from genes with low expression levels requires sequencing of a greater number of transcripts. Quantification may be confounded by the high GC content of some cDNA fragments, which have a higher melting temperature and therefore are inefficiently sequenced, by overrepresentation
