Algorithms in Bioinformatics. Paul A. Gagniuc
2.5 Plasmids
Plasmids are circular or linear dsDNA molecules found in all kingdoms of life [209]. Each plasmid carries only a few genes and is capable of replicating autonomously in the host environment. They are important because they are among the main vectors of horizontal gene transfer [210]. Plasmids occur naturally in prokaryotes, where they were first described [211]. In eukaryotes, plasmids are most common among fungi and higher plants [212]. The average length of plasmid DNA is 0.11 Mb (±0.23), that is, 110 kb (Table 2.1). Note that bacterial plasmids account for by far the largest number of samples and therefore weigh the most on the overall average shown in Table 2.1. Nevertheless, plasmids vary in size, and some of the largest (the megaplasmids) can reach the size of a bacterial chromosome. For instance, Ralstonia solanacearum is a plant pathogen that contains one of the largest megaplasmids (2.1 Mb) [213]. Another example is Streptomyces clavuligerus, a bacterium that contains a linear megaplasmid of 1.8 Mb [214]. Of course, these observations can quickly lead to hypotheses regarding speciation and the origin of chromosomes (not discussed here). Eukaryotic plasmids show an average DNA size of about 0.01 Mb (10 kb) and an average GC% close to that of organelles (37%) (Table 2.4).
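The GC% values quoted in this section are the percentage of guanine and cytosine bases in a sequence. The following is a minimal Python sketch of that calculation, not the procedure used to build the tables. The function name `gc_percent` and the toy sequence are hypothetical, and ignoring ambiguous symbols such as N is an assumption made here.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among the A, C, G, T bases of a DNA string."""
    seq = sequence.upper()
    # Assumption: only unambiguous bases are counted, so N or gap symbols
    # do not dilute the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical 20-base fragment, used only to exercise the function.
toy_plasmid = "ATGCGCTTAAGGCCATATTA"
print(f"GC% = {gc_percent(toy_plasmid):.1f}")
```

An average GC% for a group of plasmids would then be the mean of this value over all sequenced members of the group.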
Table 2.4 The average DNA length of different plasmids.
Plasmids | Archaea: Size (Mb) | Archaea: GC% | Bacteria: Size (Mb) | Bacteria: GC% | Eukaryota: Size (Mb) | Eukaryota: GC% |
---|---|---|---|---|---|---|
AV | 0.15 | 53.07 | 0.11 | 45.87 | 0.01 | 37.14 |
SD | ±0.17 | ±9.77 | ±0.23 | ±11.31 | ±0.04 | ±5.71 |
Samples | 256 | 256 | 21426 | 21426 | 118 | 118 |
Note that the unit of length for DNA is shown in mega bases (Mb). DNA fragments equal to 1 million nucleotides (1 000 000 b) are 1 mega base in length (1 Mb) or 1000 kilo bases (1000 kb) in length. For instance, 0.15 Mb is 150 kb. The last row (samples) indicates how many sequenced plasmids have been used for these computations.
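The unit conversions in the note above can be written as a short Python sketch. The helper names `mb_to_kb` and `mb_to_bases` are hypothetical; the sizes are the averages reported in Table 2.4.

```python
BASES_PER_KB = 1_000
BASES_PER_MB = 1_000_000


def mb_to_kb(size_mb: float) -> float:
    """Convert a length in megabases to kilobases (1 Mb = 1000 kb)."""
    return size_mb * BASES_PER_MB / BASES_PER_KB


def mb_to_bases(size_mb: float) -> int:
    """Convert a length in megabases to a whole number of bases."""
    return round(size_mb * BASES_PER_MB)


# Average plasmid sizes in Mb, taken from Table 2.4.
for label, size_mb in [("archaeal", 0.15), ("bacterial", 0.11), ("eukaryotic", 0.01)]:
    print(f"{label}: {size_mb} Mb = {mb_to_kb(size_mb):.0f} kb = {mb_to_bases(size_mb)} b")
```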
It was previously mentioned that archaeal genomes show an average size and a GC% much lower than those observed in bacterial genomes (Table 2.2). However, the situation is reversed in the case of plasmids: bacterial plasmids show a lower average size and GC% (0.11 Mb and 45.87%) than archaeal plasmids (0.15 Mb and 53.07%) (Table 2.4).
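The earlier remark that bacterial plasmids weigh the most on the overall average can be checked with a sample-weighted mean of the per-group averages in Table 2.4. The sketch below is illustrative only: the function names are hypothetical, the third group is the one the text describes as eukaryotic plasmids, and the inputs are the rounded table values, so the result is approximate.

```python
# (group, mean size in Mb, number of sequenced plasmids), copied from Table 2.4.
groups = [
    ("archaeal", 0.15, 256),
    ("bacterial", 0.11, 21426),
    ("eukaryotic", 0.01, 118),
]


def pooled_mean(rows):
    """Mean over all samples: each group mean is weighted by its sample count."""
    total = sum(n for _, _, n in rows)
    return sum(mean * n for _, mean, n in rows) / total


def unweighted_mean(rows):
    """Plain mean of the group means, ignoring how many samples each group has."""
    return sum(mean for _, mean, _ in rows) / len(rows)


total_samples = sum(n for _, _, n in groups)
bacterial_share = 100.0 * groups[1][2] / total_samples

print(f"pooled mean:     {pooled_mean(groups):.3f} Mb")
print(f"unweighted mean: {unweighted_mean(groups):.3f} Mb")
print(f"bacterial share of samples: {bacterial_share:.1f}%")
```

Because almost all samples are bacterial, the pooled mean stays close to the bacterial value of 0.11 Mb, whereas the unweighted mean of the three group averages is lower.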
2.6 Virus Genomes
Some viruses have an RNA-based genome and others a DNA-based genome. Among the DNA-based viral genomes, some species contain dsDNA and others ssDNA. The same is true for RNA-based viruses: some species contain double-stranded RNA (dsRNA) and others single-stranded RNA (ssRNA) [215]. Prokaryotic and eukaryotic viruses, taken together, show an average genome size of ∼0.04 Mb (40 kb) (Tables 2.1 and 2.5). Viruses that infect eukaryotes include both the smallest and the largest viral particles, as well as the smallest and the largest viral genomes. Viruses with RNA genomes dominate the eukaryotic world [215]. RNA viruses without DNA replication intermediates are called riboviruses. Well-known examples include the viruses that cause influenza, SARS, COVID-19, hepatitis C, hepatitis E, Ebola, rabies, and polio. RNA viruses whose replication includes a DNA intermediate are called retroviruses. The best-known retroviruses are the human immunodeficiency viruses (HIV-1 and HIV-2), which cause acquired immunodeficiency syndrome (AIDS). But what are DNA replication intermediates? Retroviruses use their own reverse transcriptase enzyme to produce a DNA copy of their RNA genome. This DNA copy is then incorporated into the genome of the host cell by an integrase enzyme. After integration, the cell transcribes and translates the viral genes alongside its own, which yields the components needed to assemble new copies of the virus. It is worth mentioning here that mutation rates in RNA viruses are up to a million times higher than those of their hosts [216].
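At the level of sequence strings, the DNA copy made by reverse transcriptase is the complement of the RNA template. The following Python sketch illustrates only that base-pairing rule; it does not model the enzyme, its priming, or the later synthesis of the second DNA strand. The function name `reverse_transcribe` and the toy RNA are hypothetical.

```python
# Watson-Crick pairing between an RNA template base and the DNA base copied from it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, written 5'->3', for an RNA given 5'->3'."""
    template = rna.upper()
    try:
        # The new strand is antiparallel to the template, so the template
        # is read backwards while each base is complemented.
        return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(template))
    except KeyError as err:
        raise ValueError(f"unexpected RNA base: {err.args[0]!r}") from None


toy_rna = "AUGGCUUAA"  # hypothetical 9-base RNA fragment
print(reverse_transcribe(toy_rna))  # prints TTAAGCCAT
```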
Table 2.5 The average genome size of different eukaryotic and prokaryotic viruses.
Viruses | Average genome size (Mb) | GC% |
---|---|---|
AV | 0.0339 | 45.3970 |
SD | ±0.0652 | ±9.2474 |
Samples | 37962 | 37962 |
Note that smaller standard deviation (SD) values indicate that more of the data are clustered about the mean while a larger SD value indicates the data are more spread out (larger variation in the data). The unit of length for DNA is shown in mega bases (Mb). DNA fragments equal to 1 million nucleotides (1 000 000 b) are 1 mega base in length (1 Mb) or 1000 kilo bases (1000 kb) in length. For instance, an average genome size of 0.0339 Mb is 33.9 kb. The last row (samples) indicates how many sequenced genomes were used for this calculation.
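The meaning of the SD row can be made concrete with a small Python sketch. The genome sizes in `toy_sizes_mb` are hypothetical and serve only to exercise the calculation. The population form of the standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which form was used.

```python
import statistics


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return SD divided by the mean, a unit-free measure of spread."""
    return sd / mean


# Hypothetical genome sizes in Mb: four small genomes and one much larger one.
toy_sizes_mb = [0.005, 0.010, 0.012, 0.030, 0.170]
toy_mean = statistics.mean(toy_sizes_mb)
toy_sd = statistics.pstdev(toy_sizes_mb)  # assumption: population SD
print(f"toy data: AV = {toy_mean:.4f} Mb, SD = {toy_sd:.4f} Mb")

# Values reported in Table 2.5 for the average viral genome size.
table_cv = coefficient_of_variation(mean=0.0339, sd=0.0652)
print(f"Table 2.5: SD / AV = {table_cv:.2f}")
```

A ratio above 1 means the SD exceeds the mean, as it does for the genome sizes in Table 2.5. Since a genome size cannot be negative, such a ratio suggests a spread-out, right-skewed distribution with a few very large genomes.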
The physical size of organisms shows no proportionality or correlation with the size of their genomes. Viruses are a partial exception to this rule. Interestingly, the largest viruses also contain the largest genomes and the smallest viruses contain the smallest genomes [91]. However, both extremes are occupied by virus species with a DNA-based genome. For instance, Pandoravirus salinus is among the largest virus species (1 μm long) and contains 2.5 Mb of dsDNA packed in bacterium-like particles [217].