Generate Genome Assemblies Using Long Sequencing Reads
This article describes the workflow for generating genome assemblies with long-read sequencing technologies and its application to human, microbial, animal and plant genome assembly.
<h3 id="_Overview">Overview</h3><p><em>De novo</em>&nbsp;<a href="https://www.cd-genomics.com/longseq/genome-assembly.html">genome assembly</a>&nbsp;is the process of joining overlapping DNA sequencing reads into longer contiguous sequences (contigs) that represent the chromosomes of an organism. Accurate, complete and contiguous genome assemblies are essential for identifying important structural and functional elements of the genome and for detecting genetic variation. However, the short reads produced by conventional sequencing technologies yield highly fragmented and incomplete assemblies. Short reads cannot span important genomic regions such as long repetitive sequences and&nbsp;<a href="https://www.cd-genomics.com/longseq/variant-calling.html">structural variants</a>, so these regions are often assembled incorrectly or left as gaps. With the development of&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">long-read sequencing</a>&nbsp;technologies, platforms such as&nbsp;<a href="https://www.cd-genomics.com/longseq/pacbio-smrt-sequencing-technology.html">Pacific Biosciences Single Molecule Real-Time (SMRT) Sequencing</a>&nbsp;and&nbsp;<a href="https://www.cd-genomics.com/longseq/oxford-nanopore-sequencing-technology.html">Oxford Nanopore Technologies</a>&nbsp;now provide long and ultra-long&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">sequencing reads</a>&nbsp;that can span many of the most repetitive regions of the&nbsp;<a href="https://www.cd-genomics.com/longseq/human-whole-genome-sequencing.html">human genome</a>, enabling highly contiguous genome assemblies. 
However, the two platforms differ in chemistry and signal detection, and these differences affect their read lengths, per-base accuracy and throughput.</p><p class="show-center"><img src="https://www.cd-genomics.com/longseq/wp-content/themes/long-read-sequencing/images/generate-genome-assemblies-using-long-sequencing-reads-1.jpg" alt="Generate Genome Assemblies Using Long Sequencing Reads" width="400" height="741" loading="lazy"></p><p class="show-center">Long-read data improves <a href="https://www.cd-genomics.com/longseq/genome-assembly.html">genome assembly</a>. (Logsdon GA&nbsp;<em>et al</em>., 2020)</p><h3 id="_Workflow_of_Long-read_Sequencing_for_Generating_Genome_Assemblies">Workflow of Long-read Sequencing for Generating Genome Assemblies</h3><p><em>(1) Sample Preparation and Library Construction</em></p><p>Starting with a pure, high-molecular-weight DNA sample, the first step is to fragment the DNA to the desired size. The advantage of long-read sequencing is that it can handle very long DNA fragments, often tens to hundreds of kilobases in length. Next, the sequencing library is prepared by attaching platform-specific adapters to these fragments. Vendors offer specialized kits for this step; for example, Oxford Nanopore's Ultra-Long DNA Sequencing Kit is designed for sequencing very long fragments.</p><p><em>(2) Sequencing Run</em></p><p>Once the library is ready, it is loaded onto the sequencing instrument. On nanopore platforms, sequencing relies on detecting changes in electrical current as a DNA strand passes through a nanopore, whereas SMRT sequencing detects fluorescent signals as labeled nucleotides are incorporated by a polymerase. In both cases, the signal is recorded in real time and translated into a nucleotide sequence. 
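</p><p>The output of a sequencing run is usually summarized by its read-length distribution, most often the read N50: the length L such that reads of length L or longer together contain at least half of all sequenced bases. The Python sketch below is a minimal illustration, not a vendor tool; it assumes uncompressed FASTQ with standard unwrapped four-line records, and the function names and the file name <code>reads.fastq</code> are hypothetical.</p>

```python
from typing import Iterable, List


def fastq_read_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each read in a FASTQ stream.

    Assumes unwrapped four-line records: header, sequence, '+', qualities.
    """
    lengths = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of every record holds the bases
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
    return 0  # no reads at all


if __name__ == "__main__":
    # Hypothetical three-read FASTQ; with a real file you could instead pass
    # open("reads.fastq") to fastq_read_lengths.
    example = [
        "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
        "@read2", "ACGTA", "+", "IIIII",
        "@read3", "ACGTACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIIIIIII",
    ]
    lengths = fastq_read_lengths(example)
    print(lengths)                     # [10, 5, 20]
    print(sum(lengths), n50(lengths))  # 35 20
```

<p>Because N50 depends only on a list of lengths, the same <code>n50</code> function can be applied to assembled contigs, which is how assembly contiguity is commonly reported.</p><p>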
Notably, Oxford Nanopore's MinION and PromethION devices can generate reads tens of kilobases long, and the longest reported reads exceed 4 Mb.</p><p><em>(3) Data Analysis and&nbsp;<a href="https://www.cd-genomics.com/longseq/genome-assembly.html">Genome Assembly</a></em></p><p>The raw signal data (for nanopore runs, stored in FAST5 files) undergo basecalling, which converts electrical signals into nucleotide sequences that are typically written as FASTQ files. A range of&nbsp;<a href="https://www.cd-genomics.com/longseq/long-read-sequencing-data-analysis-services.html">bioinformatics</a>&nbsp;tools then handle quality control, read alignment and&nbsp;<a href="https://www.cd-genomics.com/longseq/genome-assembly.html">genome assembly</a>. Compared with short-read assembly, long reads greatly reduce assembly complexity, especially in repetitive regions, because a single read can span a repeat that would otherwise produce ambiguous overlaps. Many software suites are specifically optimized for long-read data to produce contiguous, high-quality genome assemblies.</p><p class="show-center"><img src="https://www.cd-genomics.com/longseq/wp-content/themes/long-read-sequencing/images/generate-genome-assemblies-using-long-sequencing-reads-2.jpg" alt="Generate Genome Assemblies Using Long Sequencing Reads" width="300" height="361" loading="lazy"></p><p class="show-center">The pipeline of <a href="https://www.cd-genomics.com/longseq/genome-assembly.html">genome assembly</a>&nbsp;and annotation by long reads. 
(Li C&nbsp;<em>et al</em>., 2017)</p><h3 id="_Applications_of_Long-read_Sequencing_for_Generating_Genome_Assemblies">Applications of Long-read Sequencing for Generating Genome Assemblies</h3><p><em>Resolving complex genomic regions</em></p><p>One persistent challenge in genomics is accurately assembling regions rich in repetitive sequences,&nbsp;<a href="https://www.cd-genomics.com/longseq/variant-calling.html">structural variants</a>&nbsp;and high GC content.&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">Long sequencing reads</a>&nbsp;can span these challenging regions, providing resolution that short reads cannot achieve. For example, sequencing the banana genome with Oxford Nanopore technology produced fewer contigs and more complete chromosome reconstructions than short-read methods.</p><p><em>Direct detection of modified bases</em></p><p>Because long-read platforms can sequence native DNA directly, they can also detect&nbsp;<a href="https://www.cd-genomics.com/longseq/epigenetics-and-methylation-analysis.html">base modifications</a>&nbsp;such as methylation. A single run can therefore yield both the nucleotide sequence and epigenetic information, without a separate assay.</p><p><em>Sequencing smaller microbial genomes in a single read</em></p><p>Another notable application of&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">long-read sequencing</a>&nbsp;is that a single ultra-long read can cover an entire small microbial genome, in which case the assembly step can be largely bypassed. For microbial researchers, this means faster insights and a deeper understanding of microbial diversity.</p><p><em>Crop improvement and breeding programs</em></p><p>In agriculture, access to high-quality&nbsp;<a href="https://www.cd-genomics.com/longseq/whole-genome-resequencing.html">reference genomes</a>&nbsp;can significantly accelerate breeding programs. 
For example, scientists at KeyGene in the Netherlands used long-read&nbsp;sequencing to generate the most contiguous lettuce genome assembly reported at the time. Such detailed genomic information can help breeders select for important traits and thus bring improved crop varieties to market faster.</p><p><em>Exploring evolutionary and symbiotic relationships</em></p><p>The depth and breadth of&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">long-read sequencing</a>&nbsp;also allow researchers to study genome evolution in unusual organisms. A prime example is the sequencing of lichen-forming fungi, which are an integral part of many terrestrial ecosystems. Through&nbsp;<a href="https://www.cd-genomics.com/longseq/platforms.html">long-read sequencing</a>, a more contiguous&nbsp;<a href="https://www.cd-genomics.com/longseq/genome-assembly.html">genome assembly</a>&nbsp;was generated, leading to a better understanding of the symbiotic relationships of these fungi and their role in the environment.</p><div class="reference"><p><strong>References</strong></p><ol><li>Logsdon, Glennis A., Mitchell R. Vollger, and Evan E. Eichler. "Long-read human genome sequencing and its applications."&nbsp;<em>Nature Reviews Genetics</em>&nbsp;21.10 (2020): 597-614.</li><li>Li, Changsheng,&nbsp;<em>et al</em>. "Genome sequencing and assembly by long reads in plants."&nbsp;<em>Genes</em>&nbsp;9.1 (2017): 6.</li></ol></div>
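<p>The point made throughout this article, that long reads resolve repeats that short reads cannot, can be made concrete with a toy coverage model. The Python sketch below is a deliberate simplification, not how production assemblers work: reads are treated as error-free intervals at known genome positions, and a repeat copy counts as resolvable only if at least one read covers it end to end plus at least one unique base on each side. The genome size, repeat coordinates, read lengths and function names are all hypothetical.</p>

```python
from typing import List, Tuple

# A read modeled as (start, end) genome coordinates, half-open: [start, end).
Read = Tuple[int, int]


def repeat_is_bridged(reads: List[Read], repeat_start: int, repeat_end: int) -> bool:
    """True if some read covers the repeat [repeat_start, repeat_end) plus one
    flanking base on each side (positions repeat_start - 1 and repeat_end).

    Such a read ties this repeat copy to its unique neighbors, which is what
    lets an assembler place it unambiguously.
    """
    return any(start < repeat_start and end > repeat_end for start, end in reads)


def tile_reads(genome_length: int, read_length: int, step: int) -> List[Read]:
    """Hypothetical evenly spaced reads, used only to simulate coverage."""
    return [(s, s + read_length) for s in range(0, genome_length - read_length + 1, step)]


if __name__ == "__main__":
    genome_length = 100_000
    repeat = (40_000, 46_000)  # a hypothetical 6 kb repeat copy
    short_reads = tile_reads(genome_length, read_length=150, step=50)
    long_reads = tile_reads(genome_length, read_length=20_000, step=2_000)
    print(repeat_is_bridged(short_reads, *repeat))  # False
    print(repeat_is_bridged(long_reads, *repeat))   # True
```

<p>Real assemblers infer read placement from overlaps rather than known coordinates, but the same principle holds: reads longer than a repeat can anchor it in unique flanking sequence, whereas reads shorter than the repeat cannot.</p>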