Genotyping by Sequencing for Crop Improvement. Group of authors


The green arrow represents plastid genome skimming, which is used to recover plastid genome sequences.

      The figure is reproduced from Loera‐Sánchez et al. (2019), which is available under a Creative Commons Attribution 4.0 International (CC BY 4.0) License that permits reproduction.

      Recent advances in NGS include next‐generation RNA sequencing (RNA‐seq) (Sharma et al. 2011; Sonah et al. 2016). This technique sequences the mRNA of only those genes that are expressed, that is, the transcriptome (Chaudhary et al. 2019b). Through de novo assembly, it helps identify novel genes without mapping to a reference genome. NGS also aids the development of molecular markers. It has made thousands of markers available across the entire genome, which eases genome‐wide association studies. These markers also aid association mapping (Sonah et al. 2015; Zargar et al. 2015). These technologies have improved our understanding of the processes underlying gene expression and have supported the development of resources for marker‐assisted selection (MAS) and diversity analysis (Unamba et al. 2015).

      Whole‐genome sequencing (WGS) can be divided into two groups: de novo WGS and whole‐genome resequencing (WGR) (Bhat et al. 2020). De novo WGS assembles the complete genome sequence of a species for the first time (Sevanthi et al. 2018), whereas WGR compares genomic variability within individuals or populations (Patil et al. 2019) and requires an existing reference genome for mapping and variant detection. For library preparation, high‐quality genomic DNA is first fragmented, and adaptors are then added to the DNA fragments. Short reads from standard libraries (350–550 bp insert size) are used to detect small structural variants such as INDELs or CNVs (copy number variations), while long‐read data or mate‐pair libraries with an insert size of around 2 to 20 kb are required to detect large structural variants. Illumina platforms are often used for high‐throughput sequencing. The reads are aligned by sequence similarity and assembled into local contigs. During assembly, repetitive regions are difficult to resolve with short reads. In that case, mate‐pair reads help by linking and orienting contigs into larger sequences, which are referred to as scaffolds or supercontigs. Gaps of unknown sequence are denoted as Ns. The final result of a genome assembly is a series of scaffolds in which contiguous sequences are separated by gaps.
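      The relationship between scaffolds, contigs, and N‐filled gaps described above can be illustrated with a minimal Python sketch. The function name `split_scaffold`, the `min_gap` parameter, and the toy scaffold string are assumptions made for illustration only; they do not come from any particular assembler.

```python
import re


def split_scaffold(scaffold, min_gap=1):
    """Split a scaffold into contigs at runs of N (unknown bases).

    Returns (contigs, gap_lengths). Runs of N shorter than `min_gap`
    are left inside the contig. Hypothetical helper for illustration.
    """
    gap_pattern = re.compile(r"[Nn]{%d,}" % min_gap)
    # Contigs are the stretches of known sequence between the gaps.
    contigs = [piece for piece in gap_pattern.split(scaffold) if piece]
    # Each match is one gap; its length is the number of N characters.
    gaps = [len(match.group()) for match in gap_pattern.finditer(scaffold)]
    return contigs, gaps


# Toy scaffold (made-up sequence): three contigs joined by two gaps.
scaffold = "ACGTACGT" + "N" * 5 + "GGCC" + "N" * 3 + "TTAACCGGTT"
contigs, gaps = split_scaffold(scaffold)
print("contigs:", contigs)
print("gap lengths:", gaps)
```

      In this sketch the scaffold is treated as a plain string; real assemblies are usually stored in FASTA files, with the gap positions recorded separately.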

      3.3.1 1K Arabidopsis Genomes Resequencing Project

      Arabidopsis thaliana belongs to the family Brassicaceae. It has a diploid genome of 125–150 Mb with around 30,000 protein‐coding genes distributed over five chromosomes. Weigel and Mott (2009) initiated a project to report whole‐genome sequence variation in 1001 accessions. To sequence the species‐wide genome of Arabidopsis, they proposed an approach with two aspects. In the first aspect, technologies such as Roche’s 454 platform were used to generate a small number of genome sequences approaching the quality of the original A. thaliana Col‐0 (Columbia) reference. For the larger number of remaining sequences, a relatively less expensive technology, for example Applied Biosystems’ SOLiD or Illumina’s Genome Analyzer, was used, and local haplotype similarity was exploited, using information from the reference genome, to reconstruct a complete genome sequence. In the second aspect, ten individuals were sampled from each of ten populations in each of ten geographical regions across Eurasia, plus at least one accession from North Africa (10 × 10 × 10 + 1 = 1001). The aim of the 1K Arabidopsis genome project was to produce a generalized genome that encompasses every Arabidopsis accession and against which every A. thaliana genome can be completely aligned (Weigel and Mott 2009).

      3.3.2 3K Rice Genomes Resequencing Project

      3.3.3 Soybean Whole‐Genome Resequencing

      The WGR approach has been used in soybean to identify the quantitative trait loci (QTL) that determine colonization by arbuscular mycorrhizal fungi (AMF). AMF can associate with 80% of terrestrial plants, helping host plants take up more nutrients and tolerate stresses. The degree of colonization and the extent of the benefits provided by AMF depend on the host genotype, and QTL are responsible for mycorrhizal responsiveness in different plants. Pawlowski and coworkers investigated the genetic components involved in the AMF association. The aim of the study was to use genome‐wide association analysis to identify differences in AMF colonization among soybean genotypes and to identify the genomic regions responsible for AMF colonization. They used a genetically diverse set of 350 soybean genotypes inoculated with AMF (i.e. Rhizophagus intraradices). Using a SNP dataset derived from whole‐genome resequencing, they identified six QTL involved in AMF colonization. The candidate genes identified in these QTL regions include homologs of the nodulin protein family and other genes responsible for symbiosis (Pawlowski et al. 2020).
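      As a rough illustration of how a genome‐wide association scan links SNP genotypes to a trait such as mycorrhizal colonization, the following Python sketch tests each marker separately and ranks markers by the share of phenotypic variance they explain. It is not the method used by Pawlowski et al. (2020), which the text does not describe; published analyses typically use mixed models that correct for population structure. The function names, SNP identifiers, genotype calls, and colonization values are all hypothetical toy data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def single_marker_scan(genotypes, phenotype):
    """Rank SNPs by r^2 between allele count (0/1/2) and the phenotype.

    genotypes: dict mapping a SNP id to one allele count per plant line.
    phenotype: one trait value per plant line, in the same order.
    """
    scores = {snp: pearson_r(calls, phenotype) ** 2
              for snp, calls in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six lines, percent root colonization per line.
colonization = [10, 12, 20, 22, 30, 32]
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2],  # allele count tracks the trait
    "snp_b": [0, 2, 0, 2, 0, 2],  # largely unrelated to the trait
    "snp_c": [0, 0, 0, 0, 0, 0],  # monomorphic, carries no signal
}
ranking = single_marker_scan(genotypes, colonization)
for snp, r_squared in ranking:
    print(snp, round(r_squared, 3))
```

      Markers at the top of such a ranking would only be candidates; a real study would also need significance thresholds that account for the number of markers tested.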

      3.3.4 Chickpea

      Chickpea (Cicer arietinum) is a rich source of protein, β‐carotene, and minerals such as iron, calcium, phosphorus, manganese, and zinc. Major abiotic stresses such as heat and drought can cause up to 70% loss in yield. Varshney and coworkers used NGS technology to explore the germplasm wealth present in gene banks and to provide information on the genetic variation, domestication, and population structure of 429 chickpea lines. They analyzed 2.7 Tbp (terabase pairs) of raw data comprising 28.36 billion reads, with around 6 Gbp (gigabase pairs) of raw data per sample. Using the mapped resequencing data, they reported a map of 4.97 million SNPs, 596 100 indels, 4931 CNVs, and 60 742 PAVs (presence–absence variations) in the set of 429 reference genotypes. Of the 4.97 million SNPs, most (i.e. 85%) were found in intergenic regions and around 4% were present in coding sequences. They identified 107 375 heterozygous and 20 544
