Algorithms in Bioinformatics. Paul A. Gagniuc
Figure 1.1 The tree of life – basic diagram. The prebiotic period shown on the bottom-left represents the formation of primordial chemical molecules necessary for the ignition of life. Next, the diagram indicates the appearance of LUCA (last universal common ancestor), the first “rudimentary” form of life. The first prokaryotes appear later based on the evolution of LUCA, namely bacteria and archaea. Eukaryotes appear next in the evolutionary chain. Eukaryotes divide the tree of life into four other main subdivisions (eukaryotic kingdoms), namely: protists, fungi, animals, and plants. Note that the approximate number of known species is presented for each subdivision.
Source: Refs. [29, 74, 252, 253].
All contemporary forms of life store information in DNA molecules. DNA molecules are polymers built from four types of nucleotides, each carrying one of four bases, namely: adenine (A), thymine (T), cytosine (C), and guanine (G); successive nucleotides are linked together through their sugar and phosphate groups. Indirectly, all cellular processes are orchestrated by the information contained in the DNA molecule. Cellular processes store and use energy in the form of discrete packets (adenosine triphosphate molecules or ATP). In prokaryotes, DNA is usually double-stranded and circular, and it is located in the internal environment of the cell (cytoplasm). Cells of eukaryotic organisms contain double-stranded, usually linear DNA folded inside a membrane-bound organelle named the nucleus (from Latin – nucleus, “kernel” or “seed”; pl. nuclei). The nuclear membrane is a controlled barrier that separates the DNA molecules from the cytoplasm (Figure 1.2). Naturally, images based on electron microscopy can best show the classic structure and the inner “frozen” dynamics of eukaryotic cells (Figure 1.2). In double-stranded DNA, a cytosine from one strand and a guanine from the other strand form three hydrogen bonds, while adenine and thymine form two hydrogen bonds. The succession of these two-bond and three-bond pairs along the double-stranded DNA molecule dictates the energy required to separate the two strands and establishes the local stability of the duplex. In both eukaryotes and prokaryotes, the order of the four types of nucleotides defines the information structure throughout a DNA molecule. These structures include the well-known “genes” (Greek – geneá, “generation”). Genes are regions of different lengths found along the DNA molecule. Broadly, gene regions are in turn accompanied by regulatory structures, such as gene promoters and enhancers. Genes are involved in transcription, namely in the synthesis of RNA transcripts.
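The hydrogen-bond rule stated above (three bonds for a G–C pair, two for an A–T pair) can be turned into a rough count along a sequence. The following Python sketch is only an illustration: the function names and the example sequence are hypothetical, and a plain bond count is a crude stand-in for duplex stability, not a thermodynamic model.

```python
# Hydrogen bonds per Watson-Crick base pair, as stated in the text:
# G-C pairs form three bonds, A-T pairs form two.
BONDS_PER_BASE = {"A": 2, "T": 2, "C": 3, "G": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between `strand` and its complementary strand."""
    return sum(BONDS_PER_BASE[base] for base in strand.upper())


def bonds_per_window(strand: str, window: int) -> list[int]:
    """Hydrogen-bond totals for every sliding window along the strand."""
    if window <= 0:
        raise ValueError("window must be positive")
    last_start = len(strand) - window
    return [hydrogen_bonds(strand[i:i + window]) for i in range(last_start + 1)]


if __name__ == "__main__":
    seq = "ATGCGCAT"  # hypothetical example sequence
    print(hydrogen_bonds(seq))       # total bonds holding the duplex together
    print(bonds_per_window(seq, 4))  # local bond counts, higher where G/C-rich
```

Windows with higher totals correspond to G/C-rich stretches, which, by the reasoning in the text, would need more energy to separate.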
Note that RNA molecules are also polymers built from four types of nucleotides, with the bases adenine (A), uracil (U), cytosine (C), and guanine (G). The RNA transcript is a single-stranded nucleotide sequence that is complementary to the DNA strand harboring the gene. In turn, the information on the RNA transcript dictates whether the transcript becomes a functional molecule within the cell or whether it becomes a template for protein synthesis.
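The complementarity between a DNA strand and its RNA transcript can be sketched as a base-by-base mapping. This is a minimal illustration, not the book's own implementation: the function name `transcribe` is hypothetical, and the sketch assumes the DNA template strand is written 3'→5' so that the resulting transcript reads 5'→3'.

```python
# Complementary RNA base for each DNA base on the template strand.
# Uracil (U) takes the place of thymine (T) in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template: str) -> str:
    """Return the RNA sequence complementary to a DNA template strand.

    Assumption: `template` is written 3'->5', so the transcript is
    returned 5'->3' without reversing.
    """
    try:
        return "".join(DNA_TO_RNA[base] for base in template.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(transcribe("TACGGT"))  # hypothetical template strand
```

A dictionary lookup keeps the pairing rules explicit and rejects any character outside the four DNA bases.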
1.4 Chromatin Structure
In multicellular organisms, every cell type usually contains the same DNA information; however, each exhibits a different phenotype. What determines this behavior? Double-stranded DNA molecules of eukaryotic organisms are folded and distributed into chromosomes, which take the form of chromatin (DNA, histone proteins, and non-histone proteins). The basic organization of chromatin consists of filaments made of repetitive units called nucleosomes. Each nucleosome consists of eight histone proteins (two each of types H2A, H2B, H3, and H4) that wrap ∼146 base pairs (bp) of double-stranded DNA (Figure 1.3a). Nucleosomes are connected to each other by 10–80 bp of DNA associated with linker histone H1, which wraps another 20 bp [30]. This basic form of chromatin self-assembles into higher orders of organization. Inside the cell nucleus, these higher orders of chromatin organization include the chromatin fibers and the fractal globules, and reach a final level of organization, namely the chromosomal territories [31, 32]. Chromatin is a highly dynamic three-dimensional structure, which self-arranges differently from one cell type to another, thus establishing and maintaining the cell identity [31, 33, 34]. But what determines these chromatin self-arrangements? The predispositions for self-arrangements are determined in advance by chromatin remodelers [35]. Among the chromatin remodelers, two main families of enzymes are heavily involved in the global chromatin organization during the cell cycle, namely acetyltransferases and deacetylases. These enzymes make changes to the histone tails of the nucleosomes (histone tails are amino acid chains that extend from the nucleosomal core – Figure 1.3a). Histone tail acetylation on the nucleosomes leads to a relaxed form of chromatin (euchromatin), which subsequently allows transcription factors (TF) to gain access to their target genes.
Histone deacetylation leads to a higher-order folding of nucleosomal arrays, which in turn form a dense chromatin structure that is inaccessible to the transcription machinery (heterochromatin) [36]. Thus, patterns of histone acetylation and deacetylation along the chromatin filaments dictate the initial chromatin folding and unfolding inside the cell nucleus and consequently the activity of a specific subset of genes [37]. These acetylation–deacetylation mechanisms are combined (among others) with DNA methylation mechanisms, which lead to a gradual and stable inactivation of certain genes over several cell generations (a topic discussed in other chapters). Nevertheless, the global distribution of chromatin is established immediately after cell division and exposes a subset of genes specific to the cell type. The DNA regions that contain the genes belonging to this cell-type subset are positioned more toward the center of the cell nucleus, in the relaxed volume of the chromatin (called euchromatin) [31]. In contrast, genes outside the cell-type subset are distributed in the condensed volumes of chromatin (called heterochromatin) that are usually positioned toward the inner side of the nuclear membrane (Figure 1.2c–f).
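The nucleosome dimensions given earlier in this section (∼146 bp wrapped around the histone core, plus 10–80 bp of linker DNA) allow a back-of-the-envelope estimate of how many nucleosomes a stretch of DNA can hold. The sketch below is a simplification under stated assumptions: nucleosomes are taken to be evenly spaced with a single fixed linker length, only whole repeats are counted, and the function names and the 1,000,000 bp example length are hypothetical.

```python
CORE_BP = 146       # DNA wrapped around one histone octamer (from the text)
MIN_LINKER_BP = 10  # shortest linker DNA mentioned in the text
MAX_LINKER_BP = 80  # longest linker DNA mentioned in the text


def nucleosome_count(dna_length_bp: int, linker_bp: int) -> int:
    """Whole nucleosome repeats (core + linker) that fit in the given length.

    Assumption: uniform spacing, one fixed linker length per estimate.
    """
    if dna_length_bp < 0:
        raise ValueError("DNA length must be non-negative")
    if not MIN_LINKER_BP <= linker_bp <= MAX_LINKER_BP:
        raise ValueError("linker length outside the 10-80 bp range")
    return dna_length_bp // (CORE_BP + linker_bp)


def nucleosome_range(dna_length_bp: int) -> tuple[int, int]:
    """Lower and upper estimates, using the longest and shortest linker."""
    return (
        nucleosome_count(dna_length_bp, MAX_LINKER_BP),
        nucleosome_count(dna_length_bp, MIN_LINKER_BP),
    )


if __name__ == "__main__":
    low, high = nucleosome_range(1_000_000)  # hypothetical 1 Mbp region
    print(f"{low} to {high} nucleosomes")
```

Real chromatin has variable linker lengths and nucleosome-free regions, so the two bounds are only an order-of-magnitude guide.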
Figure 1.2 Ultrastructural images of adipocyte cells from Bos taurus (cattle). Adipocytes especially show how tolerant and adaptable cellular organelles are to various constant mechanical stresses. (a, b) Show mitochondria in adipocytes. The homogeneous light gray content on the right side represents the lipid droplet. (c) Shows a few mitochondria in proximity to the cell nucleus. (d, e) Show the shape of the cell nucleus under different mechanical constraints induced by the size of the lipid droplets. Again, the homogeneous light gray content represents the lipid droplets from the surrounding cells. (f) Shows two adipocytes with adjacent nuclei. Within each nucleus (c–f), the genetic material can be observed in different states of activity. Inside each nucleus, the dark gray (to almost black) areas represent heterochromatin and the medium gray areas represent euchromatin. In short, euchromatin contains a specific and dynamic set of active genes that is expressed only in adipocytes, while areas of heterochromatin contain the remaining unexpressed genes. At the edge of the nuclear membrane, nuclear pores can be observed. Interruptions with a light gray hue can be seen along the perimeter of the nuclear membrane. Those are the nuclear