Molecular Biotechnology. Bernard R. Glick
Sequencing Using Reversible Chain Terminators
For pyrosequencing, each of the four nucleotides must be added sequentially in separate cycles. The sequence of a DNA fragment could be determined more rapidly if all the nucleotides were added together in each cycle. However, the reaction must be controlled to ensure that only a single nucleotide is incorporated during each cycle, and it must be possible to distinguish each of the four nucleotides. Synthetic nucleotides known as reversible chain terminators have been designed to meet these criteria and form the basis of some of the next-generation sequencing-by-synthesis technologies.
Reversible chain terminators are deoxynucleoside triphosphates with two important modifications. (i) A chemical blocking group is attached to the 3′ oxygen of the sugar moiety, so that no more than one nucleotide can be added during each round of sequencing. (ii) A different fluorescent dye is added to each of the four nucleotides so that the incorporated nucleotide can be identified (Fig. 2.37A). The fluorophore is attached at a position that interferes with neither base-pairing nor phosphodiester bond formation. As with other sequencing-by-synthesis methods, DNA polymerase catalyzes the addition of the modified nucleotides to an oligonucleotide primer as specified by the DNA template sequence (Fig. 2.37B). After the fluorescent emissions are recorded, the fluorescent dye and the 3′ blocking group are removed. The blocking group is removed in a way that restores the 3′ hydroxyl group of the sugar, which allows another nucleotide to be added in the next cycle. The cycle is then repeated: DNA polymerase adds a nucleotide to the growing DNA strand, fluorescence data are acquired, and the blocking and dye groups are chemically cleaved. Repeating this cycle generates short reads (i.e., 50 to 100 nucleotides per run).
Figure 2.37 Sequencing using reversible chain terminators. (A) Reversible chain terminators are modified nucleotides that have a removable blocking group on the oxygen of the 3′ position of the deoxyribose sugar to prevent addition of more than one nucleotide per sequencing cycle. To enable identification, a different fluorescent dye is attached to each of the four nucleotides via a cleavable linker. Shown is the fluorescent dye attached to adenine. (B) An adaptor sequence is added to the 3′ end of the DNA sequencing template that provides a binding site for a sequencing primer. All four modified nucleotides are added in a single cycle, and a modified DNA polymerase extends the growing DNA chain by one nucleotide per cycle. Fluorescence is detected, and then the dye and the 3′ blocking group are cleaved before the next cycle. Removal of the blocking group restores the 3′ hydroxyl group for addition of the next nucleotide.
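The incorporate, image and cleave cycle can be sketched in a few lines of Python. This is a toy model, not instrument software. The four dye colours are hypothetical placeholders. The template is assumed to be written 3′→5′, starting at the first base after the primer. Every cycle is assumed to incorporate, image and cleave perfectly.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical one-dye-per-base labelling; the colour names are placeholders.
DYE_OF_BASE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
BASE_OF_DYE = {dye: base for base, dye in DYE_OF_BASE.items()}


def run_cycles(template: str, n_cycles: int) -> list[str]:
    """Return the dye colour recorded in each sequencing cycle.

    `template` lists template bases 3'->5', beginning with the base
    immediately downstream of the primer's 3' end (an assumption of this toy).
    """
    colours = []
    three_prime_blocked = False
    for position in range(min(n_cycles, len(template))):
        # Incorporation: all four terminators are present, but only the one
        # complementary to the template base pairs, and its 3' block stops
        # any further extension within this cycle.
        if three_prime_blocked:
            raise RuntimeError("a blocked 3' end cannot be extended")
        incorporated = COMPLEMENT[template[position]]
        three_prime_blocked = True
        # Imaging: record the colour of the single incorporated nucleotide.
        colours.append(DYE_OF_BASE[incorporated])
        # Cleavage: remove the dye and the blocking group, restoring the 3'-OH.
        three_prime_blocked = False
    return colours


def call_bases(colours: list[str]) -> str:
    """Translate the colour recorded in each cycle back into a base."""
    return "".join(BASE_OF_DYE[c] for c in colours)


read = call_bases(run_cycles("TACGGA", n_cycles=6))
print(read)  # ATGCCT
```

Each cycle reports exactly one colour, so one cycle yields one base. The called read is complementary to the template.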
Sequencing by Single Molecule Synthesis
The sequencing methods described above require large amounts of template DNA to generate a fluorescent or light signal strong enough to detect nucleotide addition. A DNA amplification step is therefore often required. This step increases template preparation time and can introduce errors that are misinterpreted as genuine nucleotide variations. Recently, sequencing technologies have been developed that circumvent the amplification step. In one approach, a single molecule of DNA polymerase is immobilized on a solid support (the bottom of a nanoscale well) and captures a single DNA molecule that is bound to a primer (Fig. 2.38A). During sequence acquisition, DNA polymerase extends the primer in a template-dependent fashion, and a signal corresponding to nucleotide addition is measured in a narrow volume at the bottom of the well (Fig. 2.38B).
Figure 2.38 Real-time single-molecule sequencing. One molecule of DNA polymerase (orange shape) is attached to the bottom of a nanoscale well. A single-stranded DNA molecule (grey strand) bound to a primer (blue strand) is captured in the active site of the polymerase (A). Each of the four different nucleoside triphosphates is attached to a different fluorophore (colored stars) at the terminal phosphate. The terminal phosphate is released during template-dependent nucleotide incorporation into the growing DNA strand. Fluorescence emission from a zeptoliter (10⁻²¹ l) volume at the bottom of the well is excited by a laser and detected before the cleaved pyrophosphate with attached fluorophore diffuses away (B).
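A rough estimate shows why it helps to measure fluorescence from such a small volume. For illustration only, assume that each labelled nucleotide is present at 1 µM; the text gives no concentration. The average number of free labelled nucleotides inside a 10⁻²¹ l detection volume is then:

```latex
\begin{aligned}
\langle N \rangle &= c \, V \, N_A \\
&= \left(1\times10^{-6}\ \mathrm{mol\,L^{-1}}\right)\left(1\times10^{-21}\ \mathrm{L}\right)\left(6.022\times10^{23}\ \mathrm{mol^{-1}}\right) \\
&\approx 6\times10^{-4}
\end{aligned}
```

Under this assumption, a freely diffusing labelled nucleotide occupies the detection volume only a tiny fraction of the time. A nucleotide held in the polymerase active site, by contrast, stays there long enough to give a distinct pulse.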
The nucleotide added during the extension phase is detected in real time, as it is incorporated. For real-time sequencing, the nucleotides do not carry a blocking group on the 3′ hydroxyl group, so DNA synthesis is continuous. A different fluorescent tag is attached to the terminal phosphate of each nucleoside triphosphate in a way that does not interfere with the activity of the DNA polymerase. With each nucleotide addition to the growing DNA chain, pyrophosphate is cleaved off, carrying the fluorescent tag with it. Tag cleavage therefore corresponds to nucleotide addition. The laser used to measure fluorescence is narrowly focused on the immobilized DNA polymerase. It therefore records a pulse of fluorescence only during the brief time (tens of milliseconds) that the tagged nucleotide is held in the enzyme's active site (Fig. 2.38B). Once the phosphodiester bond has formed, the fluorescent tag cleaved from the nucleotide rapidly diffuses out of the range of the detector. Translocation of the DNA template then positions the DNA polymerase to accept the next nucleotide into its active site. This method can rapidly generate long sequence reads (greater than 10 kbp on average). However, its accuracy is generally lower than that of other methods, for three reasons: the short time interval between nucleotide additions, the dissociation of a nucleotide before a phosphodiester bond forms, and the simultaneous measurement of fluorescence from more than one nucleotide.
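The pulse-to-base logic of real-time sequencing can be illustrated with a toy base caller. Everything in it is hypothetical: the dye names, the `Pulse` record and the 5 ms duration cutoff are invented for illustration. Real base callers use much richer signal models. The sketch counts each sufficiently long pulse as one incorporation. It discards short pulses, on the assumption that they come from nucleotides that left the active site without forming a phosphodiester bond.

```python
from dataclasses import dataclass

# Hypothetical mapping from the dye on the terminal phosphate to the base.
BASE_OF_DYE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}


@dataclass(frozen=True)
class Pulse:
    start_ms: float     # when fluorescence from the detection volume began
    duration_ms: float  # how long the labelled nucleotide stayed in view
    dye: str            # which of the four fluorophores was seen


def call_bases(pulses: list[Pulse], min_duration_ms: float = 5.0) -> str:
    """Turn a pulse trace into a read, one base per accepted pulse.

    Pulses shorter than `min_duration_ms` are discarded on the (crude,
    hypothetical) assumption that they are non-incorporating visits.
    """
    read = []
    for pulse in sorted(pulses, key=lambda p: p.start_ms):
        if pulse.duration_ms < min_duration_ms:
            continue  # likely a nucleotide that dissociated: skip it
        read.append(BASE_OF_DYE[pulse.dye])
    return "".join(read)


trace = [
    Pulse(0.0, 30.0, "dye1"),    # A incorporated
    Pulse(50.0, 2.0, "dye3"),    # brief visit by G, no incorporation
    Pulse(60.0, 25.0, "dye4"),   # T incorporated
    Pulse(120.0, 40.0, "dye2"),  # C incorporated
]
print(call_bases(trace))                     # ATC
print(call_bases(trace, min_duration_ms=0))  # AGTC
```

The second call shows one of the error sources listed above: a nucleotide that dissociates before bond formation turns into an extra base in the read. A cutoff like this can also discard a genuine incorporation that happens to give a short pulse. In this sketch, it trades insertions for deletions.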
Sequencing Whole Genomes
Just as the sequence of a gene can provide information about the function of the encoded protein, the sequence of an entire genome can contribute to our understanding of the nature of an organism. Thousands of whole genomes have now been sequenced, from organisms of all domains of life. Initially, the sequenced genomes were relatively small, limited by the early sequencing technologies. The first DNA genome to be sequenced was from the E. coli bacteriophage ΦX174 (5,375 bp) in 1977, while the first sequenced genome from a cellular organism was that of the bacterium Haemophilus influenzae (1.8 Mbp) in 1995. Within 2 years, the sequence of the larger E. coli genome (4.6 Mbp) was reported, and the sequence of the human genome (3,000 Mbp), the first vertebrate genome, was completed in 2003.
Most of these first genome sequences were generated using a shotgun cloning approach. In this strategy, a clone library of randomly generated, overlapping genomic DNA fragments is constructed in a bacterial host. The recombinant plasmids are isolated, and the cloned inserts are sequenced using the dideoxynucleotide method. Using this approach, the first human genome was sequenced in 13 years at a cost of $2.7 billion. The aim of acquiring genome sequences faster and at much lower cost has driven the development of new genome sequencing strategies. Today, many large-scale sequencing projects have been completed and many more are under way, motivated by compelling biological questions. Some will contribute to our understanding of the microorganisms that cause infectious diseases and to the development of new techniques for their detection and treatment. Others aim to help us understand what it means to be human and how we evolved. Comparing nucleotide polymorphisms between individuals with and without a specific disease will help us to determine the genetic basis of that disease.
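The fragments in a shotgun library overlap, so their sequences can in principle be joined wherever they share sequence. The assembly step is not described here. As an illustration only, the sketch below uses a naive greedy overlap merge. This is a swapped-in technique, far simpler than real assemblers, and it cannot cope with repeats or sequencing errors. The fragment strings and the minimum overlap of 3 bases are hypothetical.

```python
def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the resulting contigs; more than one contig means some
    fragments did not overlap enough to be joined.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained in another fragment.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: stop with separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


fragments = ["ATGGCGTGCA", "GCGTGCAATG", "CAATGCCTTA"]
print(greedy_assemble(fragments))  # ['ATGGCGTGCAATGCCTTA']
```

Removing the middle fragment leaves the other two with no overlap. `greedy_assemble(["ATGGCGTGCA", "CAATGCCTTA"])` then returns them as two separate contigs. This is why the library must contain enough overlapping fragments to span the whole genome.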
Generally, DNA sequencing projects fall into two categories: de novo genome sequencing and resequencing. Sequencing the genome of an organism that has not previously been sequenced is de novo genome sequencing, whereas resequencing involves comparing a newly determined sequence with a known reference sequence. A large-scale sequencing project typically entails (i) preparing a library of template DNA fragments, (ii) amplifying the DNA fragments which will increase