Algorithms in Bioinformatics. Paul A. Gagniuc
2.7 Viroids and Their Implications
Discussions about viruses and their simplicity or complexity build bridges that were once hard to imagine. Large viruses partially overlap with cellular mechanisms, and their upper limit appears to be life itself. But what is the lower limit for viruses? The smallest viruses discussed here are not representative of the lower limit of infectious mechanisms. That lower limit is represented by various RNA fragments or by proteins such as prions. Prions are misfolded proteins with the ability to transmit their misfolded shape onto correctly folded proteins of the same type (e.g., in mad cow disease). Prion mechanisms are, perhaps, less relevant to the emergence of life on Earth and will not be discussed here. However, the mechanisms related to self-replicating proteins represent one of the competing hypotheses for the pre-origins of life. For instance, amyloid fibers arise spontaneously from amino acids under prebiotic conditions. Thus, amyloid catalysts may have played an important role in prebiotic molecular evolution [225]. Under the RNA world hypothesis, the current best candidates for the origin of life on Earth are catalytic RNAs. Examples of short RNA fragments with different properties are found in many varied and distant cases throughout the scientific literature. For instance, RNA fragments of a few hundred nucleotides, called “viroids,” are the smallest infectious pathogens [226]. Viroids were first observed in the roots of Solanum tuberosum (potato) by Theodor Otto Diener in 1971 [226]. The circular ssRNA structure of viroids and viroid-like satellite RNAs lacks any protein-coding genes and stands somewhere between “nothing” and RNA viruses [216]. Apparently, RNAs are the only biological macromolecules that can function both as genotype and as phenotype [227]. Some viroids and viroid-like RNAs exhibit catalytic properties that allow self-cleavage and ligation [228].
This catalytic property links these opportunistic RNAs to self-splicing introns (group I introns). Group I introns are found in protein-coding genes of bacteria and their phages, in nuclear ribosomal RNA (rRNA) genes, in mitochondrial mRNA and rRNA genes, in chloroplast transfer RNA (tRNA) genes, and elsewhere [229–231]. In 1981, Theodor Otto Diener asked: Are viroids escaped introns? [232]. A small fraction of nuclear group I introns have the potential to act as mobile elements [233]. Today, one can also ask the complementary question: Are introns distant viroid-like RNAs that were introduced into the genomes of different organisms through DNA intermediates? It is likely that noncoding RNAs were the indirect source of all introns [227]. These speculations place early opportunistic catalytic RNAs at the origin of eukaryotic proteome diversity. In conclusion, viroid-like molecules could have been directly implicated in the emergence of life on Earth. It is reasonable to believe that an intersection between self-replicating proteins and catalytic RNAs led to some truly rudimentary precellular forms of life. Thus, one may speculate that in the prebiotic period there could have been two rudimentary life forms, which gradually merged to form the Last Universal Common Ancestor (LUCA) population. Please note that “viroids” are short ssRNAs, whereas “virions” are virus particles.
2.8 Genes vs. Proteins in the Tree of Life
Across different organisms, the number of proteins may be smaller than, equal to (hardly ever), or larger than the number of genes. In eukaryotic species in particular, one gene may encode more than one protein through a process known as alternative splicing. Note that RNA-splicing mechanisms are discussed in detail in Chapter 8. A comparison between the average number of genes and the average number of proteins is shown in Table 2.6. Based on the values in this table, rough estimates can be made of the frequency of alternative splicing in different kingdoms of life. A general equation can be formulated by assuming a “one gene–one protein” correspondence. Given that equality between the number of proteins and the number of genes means 100%, anything above this threshold is a surplus that can be attributed to alternative splicing and protein splicing. Thus, unity (a value of 1; it can also be 100 for simplicity) is divided by the average number of genes (G), and the result is multiplied by the average number of proteins (P). To find the average protein surplus (S), unity is subtracted from this result, but only if the proteome is larger than the genome (P > G), as follows:

$$S = \frac{1}{G} \times P - 1, \qquad P > G$$
Table 2.6 Genes vs. proteins in the tree of life.
Category | Group | Size (Mb) | Genes | Proteins | GC% |
---|---|---|---|---|---|
Eukaryotes | Animals | 1493.6 | 27 075.8 | 39 140.1 | 41.0 |
Eukaryotes | Fungi | 18.6 | 7707.5 | 6951.2 | 42.5 |
Eukaryotes | Plants | 940.8 | 39 140.1 | 45 405.0 | 38.7 |
Eukaryotes | Protists | 22.7 | 5915.1 | 5628.1 | 35.6 |
Eukaryotes | Other | 45.6 | 7546.3 | 7354.8 | 43.4 |
Prokaryotes | Bacteria & Archaea | 4.0 | 3829.0 | 3598.4 | 49.5 |
The table compares the average genome size, the average GC% content, the average number of genes, and the average number of resulting proteins. Note that genome size is given in megabases (Mb). A DNA fragment of 1 million nucleotides (1 000 000 b) is 1000 kilobases (1000 kb), 1 megabase (1 Mb), or 0.001 gigabases (0.001 Gb) in length. For instance, an average genome size of 1493.6 Mb is 1.4936 Gb (∼1.5 Gb).
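The surplus calculation can be sketched in Python as a minimal illustration, assuming the averages in Table 2.6 are used exactly as printed. The function names (`mb_to_gb`, `proteome_ratio`, `protein_surplus`) are hypothetical and chosen only for this example. The sketch applies S = (1/G) × P − 1 to each row and, following the rule stated in the text, reports a surplus only when P > G; it also converts each genome size from Mb to Gb.

```python
from typing import Optional

# Average values transcribed from Table 2.6:
# (group, genome size in Mb, average genes G, average proteins P)
TABLE_2_6 = [
    ("Animals", 1493.6, 27075.8, 39140.1),
    ("Fungi", 18.6, 7707.5, 6951.2),
    ("Plants", 940.8, 39140.1, 45405.0),
    ("Protists", 22.7, 5915.1, 5628.1),
    ("Other", 45.6, 7546.3, 7354.8),
    ("Bacteria & Archaea", 4.0, 3829.0, 3598.4),
]


def mb_to_gb(size_mb: float) -> float:
    """Convert megabases to gigabases (1 Gb = 1000 Mb)."""
    return size_mb / 1000.0


def proteome_ratio(genes: float, proteins: float) -> float:
    """Unity divided by G, then multiplied by P: (1 / G) * P."""
    return (1.0 / genes) * proteins


def protein_surplus(genes: float, proteins: float) -> Optional[float]:
    """Average protein surplus S = (1 / G) * P - 1.

    Unity is subtracted only when the proteome is larger than the
    genome (P > G); otherwise None is returned (no surplus).
    """
    if proteins > genes:
        return proteome_ratio(genes, proteins) - 1.0
    return None


def main() -> None:
    # Print genome size in Gb, the P/G ratio, and S (as a percentage) per group.
    for group, size_mb, genes, proteins in TABLE_2_6:
        ratio = proteome_ratio(genes, proteins)
        surplus = protein_surplus(genes, proteins)
        if surplus is not None:
            note = f"S = {surplus:.0%}"
        else:
            note = "no surplus (proteome smaller than genome)"
        print(f"{group:<20} {mb_to_gb(size_mb):.4f} Gb  P/G = {ratio:.2f}  {note}")


if __name__ == "__main__":
    main()
```

Run as a script, it reports S ≈ 45% for animals and S ≈ 16% for plants, while fungi, protists, the “Other” group, and prokaryotes report no surplus. Returning None instead of a negative value mirrors the rule that unity is subtracted only when the proteome is larger than the genome; the raw P/G ratio is still printed for every group.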
The average animal proteome is about 45% larger than the average number of animal genes, since (1/27 075.8) × 39 140.1 − 1 ≈ 0.45. Using the same formula, the average plant proteome is about 16% larger than the average number of known plant genes, since (1/39 140.1) × 45 405.0 − 1 ≈ 0.16. The average proteomes of fungi, protists, and prokaryotes are moderately undersized relative to their gene counts. As before, unity (a value of 1) is divided by the average number of genes, and the result is multiplied by the average number of