Read the book online - Algorithms in Bioinformatics. Paul A. Gagniuc. Mathematics. LiveLib

Algorithms in Bioinformatics - Paul A. Gagniuc


of chromosomes/cell). For a diploid cell (2n = 46 Chr), the linear length of all 46 dsDNA molecules is calculated as above and the result is multiplied by two:

Therefore, the two sets (2n = 46 Chr) of human chromosomes found inside a somatic cell can theoretically unfold up to 2.1 m. The linear length of the dsDNA molecules from all chromosomes of a somatic cell, together with the estimated average number of somatic cells in the human body, can be used for various thought experiments (e.g. comparisons between DNA lengths and cosmic distances). These calculations can be extended to ssDNA molecules placed linearly one after the other. For instance, the 2.1 m of dsDNA from a somatic cell doubles if the ssDNA approach is considered (2.1 m × 2 DNA strands = 4.2 m of ssDNA). The implementation found in Additional algorithm 2.1 uses the above formula to convert the number of bases of a genome to a physical length expressed in meters. Important: For convenience, from this point on, the notations “b”, “kb”, “Mb”, and “Gb” refer to dsDNA (double-stranded DNA).
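The conversion described above can be illustrated with a minimal Python sketch. It is not the book's listing, which is not reproduced in this excerpt. Two values are assumptions introduced here: a rise of about 0.34 nm per base pair (the usual figure for B-form dsDNA) and a haploid human genome of roughly 3,100 Mb, chosen because it is consistent with the 2.1 m quoted in the text. The name `genome_length_m` is hypothetical and stands in for the function the text calls f.

```python
# Assumption: each base pair of B-form dsDNA adds about 0.34 nm of length.
RISE_PER_BP_M = 0.34e-9

def genome_length_m(size_mb, copies=1):
    """Unfolded dsDNA length in meters for a genome of size_mb mega base pairs."""
    return size_mb * 1e6 * RISE_PER_BP_M * copies

# Assumption: illustrative haploid human genome size, in Mb.
HUMAN_HAPLOID_MB = 3100

haploid_m = genome_length_m(HUMAN_HAPLOID_MB)            # one chromosome set
diploid_m = genome_length_m(HUMAN_HAPLOID_MB, copies=2)  # somatic cell, 2n
ssdna_m = diploid_m * 2                                  # both strands laid end to end

print(f"haploid dsDNA: {haploid_m:.2f} m")
print(f"diploid dsDNA: {diploid_m:.2f} m")
print(f"diploid ssDNA: {ssdna_m:.2f} m")
```

Under these assumptions the diploid value rounds to 2.1 m and the ssDNA value to 4.2 m, matching the figures in the paragraph above.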

Additional algorithm 2.1 Note that the source code is in context and works with copy/paste.

Above, the example is given on Homo sapiens and the result shows the calculated total length of unfolded chromosomes for both haploid cells and diploid (somatic) cells. This computation can be applied to all genomes mentioned so far by calling function f repeatedly. Thus, Additional algorithm 2.1 is extended to perform this calculation for an arbitrary number of species (Additional algorithm 2.2).

Additional algorithm 2.2 Note that the source code is in context and works with copy/paste.

To call function f repeatedly, a parsing-based method is used. Above, variable a contains a series of records. The structure of these records is based on two delimiters: “|” and “Mb”. Delimiter “|” separates the species name (r[0]) from the size of the genome (r[1]), while the “Mb” delimiter separates the records from each other (t[u]). Note that 0.001 m equals 1 mm. For instance, the output of Additional algorithm 2.2 shows that Escherichia coli contains a genome of ∼1.6 mm in length (0.0016 m), and that E. intestinalis contains a genome of 0.78 mm in length (0.00078 m).
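The two-delimiter parsing scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's Additional algorithm 2.2: the 0.34 nm rise per base pair, the function names, and the two input genome sizes (about 4.6 Mb for E. coli and about 2.3 Mb for E. intestinalis) are supplied here for the example. The variable names a, t, u, and r follow the ones mentioned in the text.

```python
# Assumption: each base pair of B-form dsDNA adds about 0.34 nm of length.
RISE_PER_BP_M = 0.34e-9

def genome_length_m(size_mb):
    """Unfolded dsDNA length in meters for a genome of size_mb mega base pairs."""
    return size_mb * 1e6 * RISE_PER_BP_M

def parse_records(a):
    """Split a on "Mb" into records, then each record on "|" into name and size."""
    results = []
    t = a.split("Mb")                 # "Mb" separates records from each other
    for u in range(len(t)):
        if not t[u].strip():          # skip the empty piece after the last "Mb"
            continue
        r = t[u].split("|")           # r[0] = species name, r[1] = size in Mb
        results.append((r[0].strip(), genome_length_m(float(r[1]))))
    return results

# Assumption: illustrative genome sizes, not values taken from the book's listing.
a = "Escherichia coli|4.6MbE. intestinalis|2.3Mb"

lengths = parse_records(a)
for name, meters in lengths:
    print(f"{name}: {meters * 1000:.2f} mm")
```

One limitation of this format is that a species name containing the substring "Mb" would be split incorrectly, so a real data set might call for a less ambiguous record separator.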

2.3.3 Computations on the Average Genome Size

A series of computations shows the average genome size observed for each division in the tree of life, as well as the average size of viral genomes and the average DNA length of plasmids (Figure 2.1 and Table 2.1). These values were calculated from raw data extracted from the file transfer protocol (FTP) site of the National Center for Biotechnology Information (NCBI). The NCBI section for Genome Information by Organism contains general data on each branch of the tree of life: eukaryotes (13k); prokaryotes (265k); viruses (41k); plasmids (23k); organelles (17k). These categories amount to ∼359k DNA/RNA sequences at different assembly levels, of which the 341k sequence samples of assembly level “complete” were used to calculate the averages presented here. Filters were applied to obtain a clean data set: only the levels “complete chromosomes” or “complete genomes” were considered for these calculations.
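The filter-then-average step can be sketched in Python as shown below. Everything specific in it is an assumption made for illustration: the record layout (group, size in Mb, assembly level), the level labels in `KEPT_LEVELS`, and the toy records, which are made-up placeholders rather than NCBI data or values from Table 2.1.

```python
from collections import defaultdict

# Assumption: labels standing in for the "complete" assembly levels named in the text.
KEPT_LEVELS = {"Complete Genome", "Chromosome"}

def average_size_by_group(records):
    """Average genome size (Mb) per group, using only records at a kept assembly level."""
    totals = defaultdict(lambda: [0.0, 0])   # group -> [sum of sizes, count]
    for group, size_mb, level in records:
        if level not in KEPT_LEVELS:         # drop incomplete assemblies
            continue
        totals[group][0] += size_mb
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}

# Hypothetical placeholder records, invented for this example only.
toy_records = [
    ("prokaryotes", 4.0, "Complete Genome"),
    ("prokaryotes", 6.0, "Complete Genome"),
    ("prokaryotes", 9.0, "Contig"),
    ("viruses", 0.03, "Complete Genome"),
    ("eukaryotes", 12.0, "Chromosome"),
    ("eukaryotes", 100.0, "Scaffold"),
]

averages = average_size_by_group(toy_records)
for group, mean_mb in averages.items():
    print(f"{group}: {mean_mb} Mb")
```

Filtering before averaging matters because incomplete assemblies can be much shorter or more fragmented than the finished genome, which would skew the per-group mean.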

Moreover, the maximum values presented in the main text were extracted from these data and checked against the literature. The files containing the raw data can be found in the additional materials online. Important note: The number of samples shown on the last row of Table 1.4 can be misleading. Table 1.4 shows 252k prokaryote samples, whereas the cataloged prokaryotes in Table 1.1 show a total of 12k species. In the NCBI database, prokaryotes have more than one reference or representative genome per species. According to NCBI filters, around 3.2k of the prokaryote genomes are representative.


Figure 2.1 The average genome size. (a) The proportion of known species in each kingdom of life. (b) The tree of life with data on the main kingdoms of life. Each kingdom is labeled with the average genome size and the average GC content (%). (c) The average organellar genome for a number of organelles investigated to date. Here, the organelles are sorted by GC%. (d) A comparison between mitochondria and chloroplasts. (e) A comparison between plasmids from bacteria, archaea, and eukaryotes. For each chart (c–e), the left axis indicates the GC content (%) and the right axis indicates the average size of the genome expressed in mega base pairs (written here as Mb instead of Mbp, for ease).

Table 2.1 The average genome size in the tree of life.

Genome size average (Mb)
	Eukaryotes (Mb)
