Algorithms in Bioinformatics. Paul A. Gagniuc

      Biological literature is probably the most sophisticated among all sciences and can be particularly overwhelming. An introduction was given to some important concepts that provide an overview of living organisms, such as the emergence of life, classification, number of species, the origins of eukaryotic cells, the endosymbiosis theory, organelles, reductive evolution, the importance of HGT, and the main hypotheses regarding the origin of eukaryotic multicellularity. Among the biological concepts described here, some have wider implications. Examples of genome-less organelles, such as hydrogenosomes, or processes such as HGT, question life as we understand it. Endosymbionts best explain the significance of the environment and also explain the distribution of life in a blurry, nonunitary context. In other words, endosymbiosis widens the threshold of life and shows how difficult it is to draw a border between how much life resides inside or outside the cell. Moreover, HGT appears to connect all the species on Earth to a greater or lesser extent. Much evidence shows that some of these ancient processes (e.g. catalytic RNAs) are likely adding or subtracting innovative mechanisms for continuous adaptation among different species (if not all).

      2.1 Introduction

      An insight into the context of biological information is of utmost importance for different approaches in bioinformatics. The first part of the chapter discusses the units of measurement and explains the meaning of some notations used here. A few interesting unit conversions, with accompanying algorithms, are also shown. Next, the eukaryotic and prokaryotic organisms with the largest and smallest genomes are presented in detail. Moreover, different computations performed for this chapter show the average genome size across the major kingdoms of life, including the average genome size of different organelles, plasmids, and viruses. Toward the end of the chapter, a comparative analysis is made between the average number of genes and the average number of proteins across the main kingdoms of life. This informative analysis highlights the frequency of a process called alternative splicing, which allows certain eukaryotic genes to encode several types of proteins.

      There is no direct correlation between the genome size of a species and the complexity of its phenotype. In any case, the intellectual curiosity regarding the size of genomes still remains. Determination of genome size based on DNA sequencing data is one of the most accurate methods to date. To observe the lack of correlation between genome size and phenotype, the upper extremes can be considered here. As one might intuitively expect, eukaryotes show the largest genomes. In animals, the amphibian Ambystoma mexicanum (the Mexican axolotl) shows the largest (sequenced) genome observed in nature to date. A. mexicanum shows a genome size of 32 396 Mbp (∼32 Gbp), and the animal itself can reach a body length of up to 30 cm [166]. In plants, the record is held by Pinus lambertiana (27 603 Mbp) and Sequoia sempervirens (26 537 Mbp). P. lambertiana is the tallest and most massive pine tree [167, 168]. The species S. sempervirens includes the tallest living trees on Earth (115.5 m in height or 379 ft) [169]. Among the prokaryotes, Minicystis rosea and Sorangium cellulosum So0157-2 show the largest genomes. The bacterial genome of M. rosea contains 16 Mbp of DNA (GC%: 69.1) and shows the maximum genome size found in prokaryotes [170]. Second to this species is the bacterial genome of S. cellulosum So0157-2, with 14.78 Mbp of DNA (GC%: 72.1) [171]. As discussed in the previous chapter, endosymbiosis challenges the notion of the smallest genome necessary for life. The smallest prokaryotic genomes were found in different obligate symbionts. One such case is Nasuia deltocephalinicola with a genome of 112 kbp (0.11 Mbp) [172, 173]. The eukaryotes with the smallest nuclear genome necessary for life are found in the kingdom of fungi. The spore-forming unicellular parasite Encephalitozoon intestinalis shows a genome size of ∼2.3 Mbp and a total of 1.8k protein-coding genes [174]. Nonetheless, the smallest free-living eukaryote is Ostreococcus tauri, a marine green alga with a diameter of about 0.8 μm and a genome size of 12.6 Mbp (8.2k protein-coding genes) [175].

      2.3.1 Alternative Methods

      2.3.2 The Weaving of Scales

      To get a sense of genome size closer to our reference system, some transformations can express the mega base pairs as physical lengths. The linear length of a double-stranded DNA (dsDNA) molecule can be calculated by multiplying the average distance between bases (∼3.4 angstrom = 0.34 nm [179, 180]; 1 angstrom = 0.1 nm) by the total number of base pairs in a genome. Here, genomes are expressed in mega base pairs. Since 1 Mbp is equal to one million base pairs, the size of a genome can be multiplied by one million and then multiplied further by the average distance between bases (0.34 nm). One meter is equal to 1 000 000 000 nanometers (1 × 10⁹). Thus, the result expressed in nanometers is divided by 1 × 10⁹ for conversion to meters.

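$$\text{Length (m)} = \frac{\text{Genome size (Mbp)} \times 10^{6} \times 0.34\ \text{nm}}{10^{9}\ \text{nm/m}}$$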

      Depending on the organism, cells of different tissues can be characterized based on the number of sets of chromosomes present: monoploid (one set of chromosomes), diploid (two sets), triploid (three sets), tetraploid (four sets), pentaploid (five sets), and so on. For instance, the human genome contains 3.1 Gbp (3100 Mbp). Thus, in a human haploid (or monoploid) cell (e.g. a single set of chromosomes found in a gamete), the unfolded length of a single set of chromosomes, arranged linearly one after the other, would show an approximate length of:

$$\frac{3100 \times 10^{6} \times 0.34\ \text{nm}}{10^{9}\ \text{nm/m}} \approx 1.05\ \text{m}$$
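
      The conversion described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the listing that accompanies the book: the function name `mbp_to_meters`, the constant names, and the optional `chromosome_sets` parameter (a simple multiplier for cells that carry more than one set of chromosomes) are hypothetical. The genome sizes are the ones quoted in the text.

```python
# Constants taken from the text; the names themselves are illustrative.
NM_PER_BASE_PAIR = 0.34          # average distance between bases in dsDNA (nm)
BP_PER_MBP = 1_000_000           # 1 Mbp = one million base pairs
NM_PER_METER = 1_000_000_000     # 1 m = 1 x 10^9 nm


def mbp_to_meters(genome_mbp, chromosome_sets=1):
    """Return the unfolded linear length (in meters) of a genome given in Mbp.

    chromosome_sets is an assumed convenience parameter: 1 for a haploid
    (monoploid) cell, 2 for a diploid cell, and so on.
    """
    base_pairs = genome_mbp * BP_PER_MBP * chromosome_sets
    length_nm = base_pairs * NM_PER_BASE_PAIR
    return length_nm / NM_PER_METER


if __name__ == "__main__":
    # Genome sizes (Mbp) quoted in this chapter.
    genomes_mbp = {
        "Homo sapiens (haploid)": 3100,
        "Ambystoma mexicanum": 32396,
        "Pinus lambertiana": 27603,
        "Minicystis rosea": 16,
        "Nasuia deltocephalinicola": 0.112,
    }
    for name, size in genomes_mbp.items():
        print(f"{name}: {mbp_to_meters(size):.6f} m")
```

      For the human haploid genome (3100 Mbp), the function returns roughly 1.05 m, in agreement with the calculation above. Passing `chromosome_sets=2` simply doubles that value, which is a rough estimate for a diploid cell under the same assumptions.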
