Document type :
Report and critical review of a work
Title :
Scalable long read self-correction and assembly polishing with multiple sequence alignment
Author(s) :
Morisse, Pierre [Author]
Scalable, Optimized and Parallel Algorithms for Genomics [GenScale]
Marchet, Camille [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Limasset, Antoine [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Lecroq, Thierry [Author]
Equipe Traitement de l'information en Biologie Santé [TIBS - LITIS]
Lefebvre, Arnaud [Author]
Equipe Traitement de l'information en Biologie Santé [TIBS - LITIS]
Journal title :
Scientific Reports
Pages :
1-13
Publisher :
Nature Publishing Group
Publication date :
2021-12
ISSN :
2045-2322
HAL domain(s) :
Computer Science [cs]/Bioinformatics [q-bio.QM]
English abstract : [en]
Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, multiple sequence alignment computation benefits from a new and efficient segmentation strategy, allowing a massive speedup. CONSENT compares well to the state of the art, and performs better on real Oxford Nanopore data. Specifically, CONSENT is the only method that efficiently scales to ultra-long reads, and it can process a full human dataset, containing reads reaching up to 1.5 Mbp, in 10 days. Moreover, our experiments show that error correction with CONSENT improves the quality of Flye assemblies. Additionally, CONSENT implements a polishing feature that corrects raw assemblies. Our experiments show that CONSENT is 2-38x faster than other polishing tools, while providing comparable results. Furthermore, we show that, on a human dataset, assembling the raw data and polishing the assembly is less resource-consuming than correcting and then assembling the reads, while providing better results. CONSENT is available at https://github.com/morispi/CONSENT.
Language :
English
Popular science :
No
Files
- https://hal-cnrs.archives-ouvertes.fr/hal-03210290/document
- Open access
- Access the document
- https://www.nature.com/articles/s41598-020-80757-5.pdf
- Open access
- Access the document