Cell Biology. Stephen R. Bolsover


from tissue to tissue. Drosophila genes also show alternative splicing but those of yeast, which contain few introns, do not.

      Sometimes DNA that encodes RNA is repeated as a series of copies that follow one after the other along the chromosome. Such genes are said to be tandemly repeated. The genes that code for ribosomal RNAs (about 250 copies/cell), transfer RNAs (50 copies/cell), and histone proteins (20–50 copies/cell) are tandemly repeated. The products of these genes are required in large amounts.

      This still leaves about 75% of our nuclear genome with no clearly understood function. A large proportion of this extragenic DNA consists of sequences that are repeated many times in the genome. Some sequences are repeated more than a million times and are called satellite DNA. The repeating unit is usually several hundred base pairs long, and many copies are often lined up next to each other in tandem repeats. Most of the satellite DNA is found in a region called the centromere, which plays a role in the physical movement of the chromosomes that occurs at cell division (page 235), and one theory is that satellite DNA has a structural function.

      Our genome also contains minisatellite DNA, in which the tandem repeat is about 25 bp long. Stretches of minisatellite DNA can be up to 20 000 bp in length and are often found near the ends of chromosomes, a region called the telomere. Microsatellite DNA has an even smaller repeat unit of 4 bp or fewer. Again, the function of these repeated sequences is unknown, but microsatellites have proved very useful in DNA testing (page 130) because the number of repeats varies between individuals. Other extragenic sequences, known as LINEs (long interspersed nuclear elements) and SINEs (short interspersed nuclear elements), occur in our genome. There are about 50 000 copies of LINEs in a mammalian genome and they make up about 17% of the human genome.
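      The idea that repeat number distinguishes individuals can be illustrated with a short Python sketch. It is only an illustration: the function name, the repeat unit GATA, and the two sequences are invented for the example and are not taken from any real DNA test.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    k = len(unit)
    # Try every possible starting position for a run of repeats.
    for start in range(len(sequence) - k + 1):
        count = 0
        pos = start
        # Count consecutive copies of the unit beginning at `start`.
        while sequence[pos:pos + k] == unit:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical microsatellite locus in two individuals (made-up sequences).
person_a = "TT" + "GATA" * 4 + "CC"
person_b = "TT" + "GATA" * 7 + "CC"

print(longest_tandem_run(person_a, "GATA"))  # 4
print(longest_tandem_run(person_b, "GATA"))  # 7
```

      The two hypothetical individuals carry the same repeat unit at the same position but differ in the number of copies, which is the kind of difference that DNA testing exploits.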

GENE NOMENCLATURE

      One of the great difficulties that has arisen out of genome‐sequencing projects is how to name the genes and the proteins they encode. This has not been easy, and a number of committees have been set up to deal with the problem. In general, each gene is designated by an abbreviation, written in capitalized italics. For example, type 1 collagen (the most common form in the human body) is a trimer formed of two molecules of collagen 1 α1 and one molecule of collagen 1 α2. The abbreviated names of these proteins are COL1A1 and COL1A2 respectively, using normal capitals, while the names of the genes coding for these proteins use capitalized italics: COL1A1 and COL1A2. It is mutations in COL1A1 that give rise to osteogenesis imperfecta (Medical Relevance 3.2 on page 47).

      There are many instances where, for historical reasons, the correlation between the protein and gene names is not so simple. For example, the proteins connexin 43, 46, and 50 (page 28) are named for their relative molecular masses (43 kDa, etc.) and have the abbreviated names Cx43, Cx46, and Cx50. However, the genes that encode these proteins are called GJA1, GJA3, and GJA8 respectively, where GJ stands for gap junction.

      

      IN DEPTH 4.2 GENOME PROJECTS

      The publication in 1996 of the sequence of the genome of the single‐celled yeast S. cerevisiae was a milestone in biology. Not only did scientists have before them the complete genetic blueprint of a eukaryotic organism, but the technology for obtaining and curating huge amounts of genetic data was established. The genomes of other simple organisms such as the tiny nematode worm Caenorhabditis elegans, with just 959 body cells, and the fruit fly D. melanogaster, were published soon after, followed by those of more complex organisms such as the mouse and, of course, humans. Today, the genome sequences of nearly 60 000 organisms, including 15 000 eukaryotic species, have been determined. Genomes from every branch of the tree of life are now available for study, including that of the platypus, our most distant mammalian relative, and both the nuclear and mitochondrial genomes of the Neanderthal, the hominid most closely related to present‐day humans.

      Sophisticated databases have been created to store and analyze base sequence information from the various genome projects. Computer programs analyze the data for exon sequences and compare the sequence of one genome to that of another. In this way sequences encoding related proteins (proteins that share stretches of similar amino acids) can be identified. The genome data from patients can be used to identify mutations and inform clinical decision‐making. Some important programs that can be easily accessed through the internet are BLASTN for the comparison of a nucleotide sequence to other sequences stored in a nucleotide database and BLASTP, which compares an amino acid sequence to protein sequence databases. Programs such as Clustal, MAFFT, MUSCLE, and T‐Coffee can be used to compare multiple DNA or multiple protein sequences simultaneously. 3D‐Coffee is a version of T‐Coffee that can combine data from sequence and protein structure databases in the analysis.
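      To give a feel for how two sequences can be scored for a shared stretch of similarity, the sketch below computes a Smith–Waterman local alignment score in Python. This is not how BLASTN or BLASTP are implemented: those programs use faster heuristic searches and, for proteins, substitution matrices. The function name and the scoring values (match 2, mismatch −1, gap −2) are assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between sequences a and b."""
    # Only the previous row of the dynamic-programming matrix is kept.
    # Row 0 is all zeros: an alignment may start anywhere.
    prev = [0] * (len(b) + 1)
    best = 0
    for base_a in a:
        curr = [0]  # column 0 is also zero
        for j, base_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if base_a == base_b else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            # Scores never drop below zero, so a poor region
            # does not penalize a good stretch elsewhere.
            cell = max(0, diagonal, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two made-up sequences that share the stretch ACGT.
print(local_alignment_score("TTTACGTTTT", "GGGACGTGGG"))  # 8
# Two sequences with nothing in common.
print(local_alignment_score("AAAA", "TTTT"))  # 0
```

      The same function works on amino acid sequences written in one‐letter code, although a realistic protein comparison would score each pair of residues from a substitution matrix rather than with a single match and mismatch value.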

      The Human Genome Project, completed in 2003, was a 13‐year international effort that was described at the time as the biological equivalent of putting a man on the moon. As more and more genomes were sequenced, the technology became quicker and, more importantly, cheaper. Using Next Generation Sequencing (NGS) technologies, it is currently possible to have our genomes sequenced at a cost that ranges between a few hundred and a thousand dollars per person. As an increasing number of us have our genomes sequenced, this inexpensive but informative resource is bringing personalized medicine closer to our everyday lives. Soon, clinicians will routinely tailor treatment for a wide range of diseases to our own unique genetic makeup.

      In 2012 the 100 000 Genomes Project was set in motion through a collaboration between scientists and the government of the United Kingdom. Under the direction of Genomics England, the remit was to sequence 100 000 complete genomes from NHS (National Health Service) patients. The aim of this large‐scale project was to analyze DNA from patients with cancer or a rare disorder in order to understand the causes of each condition and to inform the best treatments. For each cancer patient, the genome sequences of tumor and normal tissue were compared. For patients with a rare disease, the genomes of two relatives were also sequenced. In December 2018 the project met its target of sequencing 100 000 genomes, a remarkable demonstration of progress in DNA sequencing technology and analysis. To date, the project has generated over 21 petabytes of genome data and is already delivering valuable insights into how DNA sequences inform an individual's medical condition. In response to the COVID‐19 pandemic, genome sequencing of both patients and SARS‐CoV‐2 samples has provided information on how an individual's genome influences their susceptibility to COVID‐19 infection and on the spread of new variants through the population.

      

      Answer to thought question: Guanine cannot base pair with uracil, so there is certainly a mismatch in the DNA. However, mismatch repair cannot correct the error, because the mutation has occurred in a mature chromosome in which both DNA strands are methylated.

      When the bacterium replicates its DNA, the strands will separate and each will act as a template for the synthesis of a new strand. The unmodified strand, 5′ TGAA 3′, will have the matching strand 3′ ACTT 5′ synthesized on it, and the resulting newly synthesized strand will be unmutated, as will its own daughter strands.
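      Template‐directed synthesis of this kind can be sketched in a few lines of Python. The function and table names are invented for the illustration. The pairing table includes uracil, which pairs with adenine just as thymine does.

```python
# Base-pairing partners used when a new DNA strand is built on a template.
# Uracil in the template directs insertion of adenine, as thymine would.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def synthesize_partner(template: str) -> str:
    """Return the strand synthesized on `template`.

    The result is written in the same left-to-right order as the template,
    so it has the opposite polarity: a template read 5'->3' gives a new
    strand read 3'->5', lined up base by base beneath it.
    """
    return "".join(PARTNER[base] for base in template)


# Unmodified strand 5' TGAA 3' gives the new strand 3' ACTT 5'.
print(synthesize_partner("TGAA"))  # ACTT
```

      Because the table maps U to A, the same function can be applied to a template strand that contains uracil.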

      However, the modified strand 3′ AUTT 5′ will have the matching strand 5′ TAAA 3′ synthesized on it, since adenine base pairs with uracil. The daughter cell that inherits this chromosome, assuming that it is still infected with PBS2 and therefore allows the uracil to remain, will now have a chromosome with the structure

      5′ TAAA 3′
      3′ AUTT 5′

      with
