Indexing labeled sequences
Document type:
Report and critical review of a work
DOI:
Title:
Indexing labeled sequences
Author(s):
Rocher, Tatiana [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Bioinformatics and Sequence Analysis [BONSAI]
Giraud, Mathieu [Corresponding author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Bioinformatics and Sequence Analysis [BONSAI]
Salson, Mikael [Corresponding author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Bioinformatics and Sequence Analysis [BONSAI]
Journal title:
PeerJ Computer Science
Pages:
1-14
Publisher:
PeerJ
Publication date:
2018
ISSN:
2376-5992
Keywords (English):
Data structures
Text indexing
Burrows–Wheeler transform
Wavelet Tree
V(D)J recombination
HAL discipline(s):
Computer Science [cs]/Bioinformatics [q-bio.QM]
Computer Science [cs]/Computational Complexity [cs.CC]
Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Abstract (English):
Background: Labels are a way to add information to a text, such as functional annotations like genes on DNA sequences. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight into the lymphocyte repertoire for onco-hematology or immunology studies.
Methods: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows–Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken either in the order of the text (TL-index) or in the order of the BWT (TL-BW-index). Both indexes need space related to the entropy of the labeled text.
Results: These indexes allow efficient text–label queries to count and find labeled patterns. The TL-BW-index has an overhead on simple label queries but is very efficient on combined pattern–label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts.
Discussion: New indexes such as the ones we propose improve the way we index and query labeled texts, for instance lymphocyte repertoires for hematological and immunological studies.
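As a rough illustration of the kind of query the abstract describes (counting occurrences of a pattern that carry a given label), the C++ sketch below combines a BWT backward search with a per-position label array. It is not the authors' TL-index or TL-BW-index, and all names (`LabeledTextIndex`, `interval`, `count_labeled`) are hypothetical. Several simplifying assumptions apply: the suffix array is built naively and kept in full, `rank` is a linear scan where the paper uses a Wavelet Tree, and labels are stored uncompressed as one integer per text position, in text order (-1 meaning unlabeled), which relies on labels being non-overlapping.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, uncompressed sketch of a labeled-text index.
// A real index would sample the suffix array, use a Wavelet Tree for rank,
// and compress the label sequence; none of that is done here.
struct LabeledTextIndex {
    std::string bwt;          // BWT of text + '$'
    std::vector<int> sa;      // full suffix array of text + '$'
    std::vector<int> C;       // C[c] = number of symbols smaller than c
    std::vector<int> labels;  // label of each text position, text order (-1 = none)

    LabeledTextIndex(const std::string& text, const std::vector<int>& pos_labels)
        : labels(pos_labels) {
        std::string t = text + '$';  // '$' sorts before the DNA letters
        int n = static_cast<int>(t.size());
        sa.resize(n);
        std::iota(sa.begin(), sa.end(), 0);
        // Naive suffix sorting: acceptable for a sketch, far too slow for real data.
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return t.compare(a, std::string::npos, t, b, std::string::npos) < 0;
        });
        bwt.resize(n);
        for (int i = 0; i < n; ++i) bwt[i] = t[(sa[i] + n - 1) % n];
        C.assign(257, 0);
        for (unsigned char c : t) ++C[c + 1];
        for (int c = 1; c < 257; ++c) C[c] += C[c - 1];
    }

    // Occurrences of c in bwt[0, i): a linear scan standing in for a Wavelet Tree rank.
    int rank(char c, int i) const {
        return static_cast<int>(std::count(bwt.begin(), bwt.begin() + i, c));
    }

    // Backward search: half-open suffix-array interval [lo, hi) of suffixes starting with p.
    std::pair<int, int> interval(const std::string& p) const {
        int lo = 0, hi = static_cast<int>(bwt.size());
        for (auto it = p.rbegin(); it != p.rend() && lo < hi; ++it) {
            unsigned char c = static_cast<unsigned char>(*it);
            lo = C[c] + rank(static_cast<char>(c), lo);
            hi = C[c] + rank(static_cast<char>(c), hi);
        }
        return {lo, hi};
    }

    // Count occurrences of p whose every position carries the given label.
    int count_labeled(const std::string& p, int label) const {
        auto [lo, hi] = interval(p);
        int count = 0;
        for (int i = lo; i < hi; ++i) {
            int start = sa[i];  // locate the occurrence in the text
            bool inside = true;
            for (int k = 0; k < static_cast<int>(p.size()); ++k) {
                if (labels[start + k] != label) { inside = false; break; }
            }
            if (inside) ++count;
        }
        return count;
    }
};
```

For instance, with the text `ACGTACGTAC` where positions 0 to 4 carry label 0 and positions 5 to 9 carry label 1, `interval("AC")` covers three suffixes, yet `count_labeled("AC", 0)` and `count_labeled("AC", 1)` each return 1, because the occurrence starting at position 4 straddles the label boundary. Keeping labels in text order makes this post-filtering step simple but forces one lookup per located occurrence; storing them in BWT order instead, as the abstract says the TL-BW-index does, is the kind of choice that can make combined pattern–label queries cheaper.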
Language:
English
Popular science:
No
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-01743104/document (open access)
- https://peerj.com/articles/cs-148.pdf (open access)
- 2018-Rocher-indexing-labeled-sequences.pdf (open access)