Alignment-free detection and seed-based ...
Document type :
Pré-publication ou Document de travail
Title :
Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo
Author(s) :
Borée, Cyprien [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Giraud, Mathieu [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Salson, Mikael [Auteur]
Bioinformatics and Sequence Analysis [BONSAI]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Giraud, Mathieu [Auteur]

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Salson, Mikael [Auteur]

Bioinformatics and Sequence Analysis [BONSAI]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
English keyword(s) :
Spaced seeds
Aho-Corasick Automaton
Alignment-free algorithm
Immune repertoire
VDJ recombinations
Adaptive immune receptor repertoire
Aho-Corasick Automaton
Alignment-free algorithm
Immune repertoire
VDJ recombinations
Adaptive immune receptor repertoire
HAL domain(s) :
Informatique [cs]/Bio-informatique [q-bio.QM]
English abstract : [en]
The diversity of the immune repertoire is grounded on V(D)J recombinations in several loci. Many algorithms and software detect and designate these recombinations in high-throughput sequencing data. To improve their ...
Show more >The diversity of the immune repertoire is grounded on V(D)J recombinations in several loci. Many algorithms and software detect and designate these recombinations in high-throughput sequencing data. To improve their efficiency, we propose a multi-loci seed identification through an Aho-Corasick like automaton as well as a seed-based gene filtration. These algorithms were implemented into Vidjil-algo, used routinely by several labs for the analysis of hematologic malignancies.We benchmark the results of Vidjil-algo and of MiXCR on five datasets, evaluating the specificity and sensitivity of the detection, as well as the adequation of the designation to manually curated sequences. Compared to the previous algorithms, the new algorithms implemented in Vidjil-algo bring speedups between 3× and 30×, with a smaller memory footprint and without quality loss in results. They enable to precisely annotate in a few minutes millions of sequences coming from V(D)J recombinations, including incomplete V(D)J-like recombinations, improving our knowledge on immune repertoires.Show less >
Show more >The diversity of the immune repertoire is grounded on V(D)J recombinations in several loci. Many algorithms and software detect and designate these recombinations in high-throughput sequencing data. To improve their efficiency, we propose a multi-loci seed identification through an Aho-Corasick like automaton as well as a seed-based gene filtration. These algorithms were implemented into Vidjil-algo, used routinely by several labs for the analysis of hematologic malignancies.We benchmark the results of Vidjil-algo and of MiXCR on five datasets, evaluating the specificity and sensitivity of the detection, as well as the adequation of the designation to manually curated sequences. Compared to the previous algorithms, the new algorithms implemented in Vidjil-algo bring speedups between 3× and 30×, with a smaller memory footprint and without quality loss in results. They enable to precisely annotate in a few minutes millions of sequences coming from V(D)J recombinations, including incomplete V(D)J-like recombinations, improving our knowledge on immune repertoires.Show less >
Language :
Anglais
Collections :
Source :
Files
- document
- Open access
- Access the document
- vidjil-algo.pdf
- Open access
- Access the document
- vidjil-algo-supp.pdf
- Open access
- Access the document