Document type:
Report and critical book review
DOI:
PMID:
Title:
Comparative assessment of long-read error correction software applied to Nanopore RNA-sequencing data
Author(s):
Lima, Leandro [Corresponding author]
Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale [ERABLE]
Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 [LBBE]
Università degli Studi di Roma Tor Vergata [Roma, Italia] = University of Rome Tor Vergata [Rome, Italy]
Marchet, Camille [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Caboche, Ségolène [Author]
Centre d’Infection et d’Immunité de Lille - INSERM U 1019 - UMR 9017 - UMR 8204 [CIIL]
da Silva, Corinne [Author]
Laboratoire de Bioinformatique pour la Génomique et la Biodiversité [LBGB]
Istace, Benjamin [Author]
Laboratoire de Bioinformatique pour la Génomique et la Biodiversité [LBGB]
Aury, Jean-Marc [Author]
Laboratoire de Bioinformatique pour la Génomique et la Biodiversité [LBGB]
Touzet, Helene [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Chikhi, Rayan [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Bioinformatics and Sequence Analysis [BONSAI]
Journal title:
Briefings in Bioinformatics
Pagination:
1-18
Publisher:
Oxford University Press (OUP)
Publication date:
2019-06-24
ISSN:
1467-5463
English keyword(s):
Long reads
RNA-sequencing
Nanopore
Error correction
Benchmark
HAL discipline(s):
Computer Science [cs]/Bioinformatics [q-bio.QM]
English abstract: [en]
Motivation: Nanopore long-read sequencing technology offers promising alternatives to high-throughput short-read sequencing, especially in the context of RNA-sequencing. However, this technology is currently hindered by high error rates in the output data that affect analyses such as the identification of isoforms, exon boundaries, open reading frames, and the creation of gene catalogues. Due to the novelty of such data, computational methods are still actively being developed and options for the error-correction of Nanopore RNA-sequencing long reads remain limited.
Results: In this article, we evaluate the extent to which existing long-read DNA error correction methods are capable of correcting cDNA Nanopore reads. We provide an automatic and extensive benchmark tool that reports not only classical error-correction metrics but also the effect of correction on gene families, isoform diversity, bias towards the major isoform, and splice site detection. We find that long-read error-correction tools that were originally developed for DNA are also suitable for the correction of Nanopore RNA-sequencing data, especially in terms of increasing base-pair accuracy. Yet investigators should be warned that the correction process perturbs gene family sizes and isoform diversity. This work provides guidelines on which (or whether) error-correction tools should be used, depending on the application type.
Benchmarking software: https://gitlab.com/leoisl/LR_EC_analyser
Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
Language:
English
Popular science:
No
ANR project:
PaRis Artificial Intelligence Research InstitutE
Institut Convergences pour l'étude de l'Emergence des Pathologies au Travers des Individus et des populatiONs
Signatures transcriptionnelles pour une analyse RNA-seq globale
Algorithmes et outils logiciels pour le séquençage d'ARN de troisième génération
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-02394395/document
- Open access
- LR_EC_analyser_paper.pdf
- Open access