Algorithms in Bioinformatics. Paul A. Gagniuc
1.5.5 mRNA to Proteins
In both eukaryotes and prokaryotes, mRNA molecules, which carry the information for protein synthesis, are stochastically encountered by the two ribosomal subunits that initiate translation. Once bound to an mRNA transcript, the two subunits form the ribosome. The ribosome is a ribonucleoprotein organelle (made of RNA and proteins) that facilitates the formation of chemical bonds between amino acids in the order specified by the information encoded in the mRNA molecule. Life evolved a molecular scheme for translation, known as the “genetic code” [47]. In this scheme, groups of three nucleotides are associated with the different amino acids used for polypeptide synthesis. Each of the consecutive, nonoverlapping nucleotide triplets on the mRNA transcript is known as a codon. Polypeptide synthesis begins at a start codon, which sets the reading frame. Usually, the start codon is the “AUG” triplet (the most frequent start codon across all life). However, other triplets (non-AUG start codons) can also act as start codons, although less frequently [48]. After initiation, the mRNA transcript slides between the two ribosomal subunits one codon at a time, following the reading frame set by the start codon [49, 50]. Different tRNA species, present in various concentrations in the cytoplasm, are each linked to an amino acid. The type of amino acid carried by a tRNA is associated with its anticodon, a nucleotide triplet of the tRNA that binds temporarily to a codon of the mRNA transcript. Thus, tRNAs are the temporary links between the mRNA transcript and the nascent amino acid chain. An assembled ribosome contains three “openings” (the A, P, and E sites) for tRNA–mRNA interactions (Figure 1.3.b).
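The way a start codon fixes the reading frame can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain 5'→3' RNA string and considering only the canonical AUG start codon (non-AUG starts are not modeled); the function name `codons_from_start` is hypothetical.

```python
def codons_from_start(mrna: str, start_codon: str = "AUG") -> list[str]:
    """Split an mRNA (5'->3') into consecutive, nonoverlapping codons,
    starting at the first occurrence of the start codon."""
    mrna = mrna.upper()
    start = mrna.find(start_codon)
    if start == -1:
        return []  # no start codon found, so no reading frame is set
    # Step three nucleotides at a time; an incomplete final triplet is dropped.
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


print(codons_from_start("GGCAUGGCUUACUGAAG"))
# ['AUG', 'GCU', 'UAC', 'UGA']
```

The sketch only sets the frame: the two nucleotides before AUG are skipped, and every following triplet is read in register. Stopping at a termination codon is a separate step.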
The tRNA molecules with the appropriate anticodons come into contact with the mRNA transcript through complementary base pairing. The smaller subunit of the ribosome allows complementary pairing between three nucleotides of the mRNA transcript (the codon) and three nucleotides of a tRNA molecule (the anticodon) (Figure 1.3.b). Once the mRNA–tRNA binding has been facilitated by the smaller subunit, the transfer of the amino acid from a tRNA to the nascent amino acid chain is facilitated by the larger subunit of the ribosome [51].
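Because codon and anticodon pair antiparallel, the anticodon written 5'→3' is the reverse complement of the codon: AUG pairs with 5'-CAU-3' (which reads UAC when written 3'→5'). The sketch below is a simplified model that uses only Watson–Crick pairs (A–U, G–C) and ignores wobble pairing at the third codon position; `anticodon_for` and `pairs` are hypothetical helper names.

```python
# Watson-Crick pairs in RNA; G-U wobble pairing is deliberately not modeled.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with a codon (5'->3').

    Codon and anticodon pair antiparallel, so the anticodon is the
    reverse complement of the codon."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


def pairs(codon: str, anticodon: str) -> bool:
    """True if the anticodon (5'->3') is fully complementary to the codon."""
    return anticodon_for(codon) == anticodon.upper()


print(anticodon_for("AUG"))  # CAU
print(pairs("AUG", "CAU"))   # True
print(pairs("AUG", "UAC"))   # False: UAC is the anticodon read 3'->5'
```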
The amino acid chain is passed from the previous tRNA to the amino acid of the next incoming tRNA, extending the growing peptide by one amino acid at each step. Thus, the amino acid chain remains attached to the most recently bound tRNA and is not released until a termination codon (UAA, UAG, or UGA) is reached on the mRNA transcript [56]. Since the genetic code is an evolved (and still evolving) scheme, small variations of it exist across different kingdoms of life, and these variations are central to the ultimate goal of bioinformatics (i.e. understanding how life works).
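The whole translation step, from the start codon to a termination codon, can be sketched as a table lookup. This is a minimal illustration, not a model of the ribosome: the genetic code is stored as a dictionary, the chain grows by one amino acid per codon, and translation stops at the first `*` (stop) entry. To show a variation of the code, the sketch swaps in the vertebrate mitochondrial code (NCBI translation table 2), where UGA reads as tryptophan, AUA as methionine, and AGA/AGG act as stops; the names `translate`, `STANDARD_CODE`, and `VERT_MITO_CODE` are hypothetical, and only AUG is used as a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one-letter amino acids for codons in the order
# UUU, UUC, UUA, UUG, UCU, ..., GGG; "*" marks a termination codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
STANDARD_CODE = {
    "".join(bases): aa for bases, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Example variant: vertebrate mitochondrial code (NCBI translation table 2).
VERT_MITO_CODE = {**STANDARD_CODE, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate from the first AUG until the first termination codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: the chain is released
            break
        peptide.append(amino_acid)  # the chain grows by one amino acid
    return "".join(peptide)


mrna = "GGAUGUUUUGGUGAAAAUAA"
print(translate(mrna))                  # MFW   (UGA is a stop codon)
print(translate(mrna, VERT_MITO_CODE))  # MFWWK (UGA reads as Trp)
```

The same transcript yields two different polypeptides depending only on which code table is supplied, which is why tools that translate sequences usually ask for the genetic code explicitly.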
1.5.6 Transfer RNA
On the other side of translation, an ancient group of enzymes sets the rules of the genetic code [57]. These enzymes, the aminoacyl–tRNA synthetases (tRNA-ligases), attach the appropriate amino acid to a corresponding tRNA (Figure 1.3.c). Many of these enzymes recognize their tRNA molecules by the anticodon [58]. Typically, there is one tRNA-ligase for each amino acid and its corresponding tRNAs. For instance, in humans there are twenty different types of aminoacyl–tRNA synthetases, one for each amino acid of the genetic code [59]. Some organisms lack the genes needed for all twenty aminoacyl–tRNA synthetases, yet they still use all twenty amino acids for protein synthesis. In such cases, a tradeoff is made in the complexity of a tRNA-ligase, such that one enzyme handles more than one tRNA–amino acid pairing [60, 61]. The matching of a tRNA with its amino acid is therefore also based on additional properties of the tRNA, such as the geometry (shape) of the molecule and specific nucleotide positions along the tRNA chain [62].
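The “one synthetase per amino acid” arrangement can be made concrete by grouping tRNAs by the amino acid they carry. The sketch below rests on a loud simplifying assumption: one hypothetical tRNA per sense codon, identified only by its anticodon. Real cells carry fewer tRNA species because of wobble pairing, and, as noted above, recognition also relies on features other than the anticodon. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code in UUU, UUC, ..., GGG order; "*" marks stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Anticodon (5'->3') that pairs with the codon: its reverse complement."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(codon))


def synthetase_substrates() -> dict[str, list[str]]:
    """Map each amino acid (i.e. each hypothetical synthetase) to the
    anticodons of the tRNAs it would charge, assuming one tRNA per sense codon."""
    groups: dict[str, list[str]] = defaultdict(list)
    for bases, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS):
        if amino_acid == "*":
            continue  # termination codons have no matching tRNA in this model
        groups[amino_acid].append(anticodon("".join(bases)))
    return dict(groups)


substrates = synthetase_substrates()
print(len(substrates))       # 20 synthetases, one per amino acid
print(substrates["M"])       # ['CAU']
print(len(substrates["L"]))  # 6 leucine tRNAs under this simplification
```

Under this model, 61 sense codons collapse onto 20 groups, so each synthetase must recognize several different anticodons, which is one reason recognition cannot rest on a single fixed triplet.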
1.5.7 Small RNA
RNAs have multiple and versatile roles across all biological systems, one of which is mRNA silencing and post-transcriptional regulation of gene expression. Small RNAs are short (∼18–30 nucleotides) noncoding RNA molecules that can regulate gene expression in both the cytoplasm and the nucleus. Several classes of small RNAs have been defined, such as microRNAs (miRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs) [63]. For instance, miRNAs are small noncoding RNA molecules (∼21–25 nucleotides in length) that play an important regulatory role in animals and plants by targeting specific mRNAs for degradation or translational repression [64, 65]. It appears that imperfect complementarity between a miRNA and different mRNA targets gives it the potential to regulate several genes simultaneously. Moreover, the action of miRNAs is not confined to a single cell: some miRNAs are secreted into exosomes or microvesicles and may be able to move through the circulation to distant cells or tissues [66–68]. Without question, the fine-grained regulation that underlies the complexity of eukaryotes is found in these short RNA molecules.
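The idea that partial complementarity lets one miRNA reach many targets can be illustrated with a seed-matching sketch. A common simplification in miRNA target prediction is to require pairing only for the “seed,” roughly nucleotides 2–8 from the miRNA 5' end, and to tolerate mismatches elsewhere. The code below assumes exactly that rule, searches whole transcripts rather than only 3' UTRs, and uses hypothetical function names and made-up sequences.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str, first: int = 2, last: int = 8) -> str:
    """mRNA site (5'->3') complementary to miRNA positions first..last
    (1-based, inclusive, counted from the miRNA 5' end)."""
    seed = mirna.upper()[first - 1:last]
    return "".join(RNA_COMPLEMENT[b] for b in reversed(seed))


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    positions, i = [], seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def find_targets(mirna: str, transcripts: dict[str, str]) -> dict[str, list[int]]:
    """Transcripts carrying at least one seed-complementary site.
    Only the seed must pair; the rest of the miRNA may mismatch."""
    site = seed_site(mirna)
    hits = {name: site_positions(seq.upper(), site) for name, seq in transcripts.items()}
    return {name: pos for name, pos in hits.items() if pos}


# Hypothetical sequences, for illustration only.
mirna = "UGCAAGUCGAUCGGAUCCAUGA"
transcripts = {
    "gene_a": "AAGACUUGCAAAAGACUUGCA",
    "gene_b": "CCCCGACUUGCCC",
    "gene_c": "GGGGGGGGGGGG",
}
print(seed_site(mirna))                  # GACUUGC
print(find_targets(mirna, transcripts))  # {'gene_a': [2, 13], 'gene_b': [4]}
```

Because only seven nucleotides must match, the same miRNA hits two of the three transcripts, a toy version of one small RNA regulating several genes at once.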
1.5.8 The Transcriptome
The set of all RNA molecules produced by a given organism is known as the “transcriptome.” This includes the mRNA transcripts, but also the other RNA molecules mentioned above (e.g. tRNAs, rRNAs, siRNAs, miRNAs, piRNAs, and so on), as well as other uncharacterized noncoding RNA molecules. When expressed, genes produce mRNAs in different quantities, which can then be detected [69]. Currently, two main techniques are used to capture gene expression: RNA-Seq and microarrays [70, 71]. RNA-Seq (RNA sequencing) allows the sequencing of the RNA molecules present in a sample, whereas microarrays target known transcripts of different genes through hybridization (complementary base pairing) [72]. Thus, by aligning the sequenced RNAs to the reference genome (the DNA of the organism), RNA-Seq experiments can estimate the subset of genes expressed in a cell type, or in a tissue (several cell types), at a given time [73].
However, the transcriptome can be seen as an ideal set, because the complete set of possible RNAs cannot be fully detected. Each state of a cell shows a specific subset of RNAs from the transcriptome, and of the total number of states that a cell can exhibit, only a few can be induced and captured by RNA-Seq. Thus, a small subset of RNAs from the transcriptome may remain undetectable. At the tissue level, there are a number of cell types, each with a specific set of active genes. Often, the analysis of the pattern of gene expression is performed at the tissue level, i.e. on several cell types at the same time. From a global perspective, the resulting expression profile is the union of the sets of genes expressed in each of the cell types that make up the tissue. Furthermore, genes that are expressed in several cell types (such as housekeeping genes) may show the highest amounts of mRNA, since their transcripts accumulate from all of these cell types, while genes that are expressed only in certain cell types can show lower amounts of mRNA.
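The tissue-level reasoning above (union of per-cell-type gene sets, with shared genes accumulating the largest amounts) can be sketched with Python sets and a `Counter`. The gene names and mRNA amounts are hypothetical, and the pooled profile assumes each cell type contributes equally to the sample, which real tissues rarely do.

```python
from collections import Counter

# Hypothetical mRNA amounts (arbitrary units) measured in three cell types
# of one tissue; "hk_gene" plays the role of a housekeeping gene.
cell_types = {
    "cell_type_1": {"hk_gene": 40, "marker_1": 25},
    "cell_type_2": {"hk_gene": 35, "marker_2": 30},
    "cell_type_3": {"hk_gene": 38, "marker_3": 20},
}

# Tissue level: the union of the genes expressed in each cell type.
expressed_in_tissue = set().union(*(profile.keys() for profile in cell_types.values()))

# Genes active in every cell type (housekeeping-like).
shared = set.intersection(*(set(profile) for profile in cell_types.values()))

# Pooled amounts, assuming each cell type contributes equally to the sample.
tissue_profile = Counter()
for profile in cell_types.values():
    tissue_profile.update(profile)  # adds amounts gene by gene

print(sorted(expressed_in_tissue))    # ['hk_gene', 'marker_1', 'marker_2', 'marker_3']
print(shared)                         # {'hk_gene'}
print(tissue_profile.most_common(2))  # [('hk_gene', 113), ('marker_2', 30)]
```

The gene shared by all cell types tops the pooled profile even though no single cell type expresses it far more than its markers, mirroring the housekeeping-gene observation in the text.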
1.5.9 Gene Networks and Information Processing
The mRNA and/or protein products encoded by one gene often regulate the expression of other genes. In multicellular eukaryotes, the set of genes expressed in a specific cell type forms an “open” gene network. Each gene network is a self-orchestrated feedback loop that constantly adapts to different inputs from the environment. In practice, the dynamics of a gene network may be deduced from gene expression levels. The RNA-Seq technique shows the set of expressed genes and their expression levels (amount of mRNA) at the time of cell/tissue sampling. Repeated sampling at different time intervals can help piece together the functional relationships between the genes of the set. Direct or indirect activation of a gene promoter by the products of other genes (mRNA or proteins) occurs with a relative delay that largely depends on the frequency with which the gene product is synthesized. The frequency of synthesis affects how long the gene product (mRNA or proteins) takes to accumulate in the cell, as well as its stochastic diffusion toward other promoters and macromolecules with which it can interact. Note that the environment can be represented by a number of factors: the current set of molecules inside