Document type :
Report and critical book review
Title :
A Scalable Indexing Solution to Mine Huge Genomic Sequence Collections
Author(s) :
Rivals, Eric [Corresponding author]
Méthodes et Algorithmes pour la Bioinformatique [MAB]
Philippe, Nicolas [Author]
Méthodes et Algorithmes pour la Bioinformatique [MAB]
Salson, Mikael [Author]
Bioinformatics and Sequence Analysis [BONSAI]
Léonard, Martine [Author]
Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes [LITIS]
Commes, Thérèse [Author]
Centre de recherche en Biologie Cellulaire [CRBM]
Lecroq, Thierry [Author]
Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes [LITIS]
Journal title :
ERCIM News
Pages :
20-21
Publisher :
ERCIM
Publication date :
2012-04-02
ISSN :
0926-4981
HAL domain(s) :
Computer Science [cs]/Bioinformatics [q-bio.QM]
Life Sciences [q-bio]/Bioinformatics, Systems Biology [q-bio.QM]
English abstract : [en]
With High Throughput Sequencing (HTS) technologies, biology is experiencing a sequence data deluge. A single sequencing experiment currently yields 100 million short sequences, or reads, the analysis of which demands efficient and scalable sequence analysis algorithms. Diverse kinds of applications repeatedly need to query the sequence collection for the occurrence positions of a subword. Time can be saved by building an index of all subwords present in the sequences before performing huge numbers of queries. However, both the scalability and the memory requirement of the chosen data structure must suit the data volume. Here, we introduce a novel indexing data structure, called Gk arrays, and related algorithms that improve on classical indexes and state-of-the-art hash tables.
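To make the querying problem in the abstract concrete, here is a minimal sketch of the kind of hash-table baseline that the abstract says Gk arrays improve on. It is not the Gk arrays structure itself, whose layout this record does not describe. The function names, the toy reads, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map every length-k subword to the (read_id, offset) pairs where it occurs.

    Hypothetical baseline for illustration only: a plain hash table,
    not the Gk arrays structure named in the abstract.
    """
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        # Slide a window of length k over the read.
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def occurrences(index, kmer):
    """Return all occurrence positions of kmer, or an empty list if absent."""
    # .get avoids inserting an empty entry for subwords that never occur.
    return index.get(kmer, [])


# Toy collection of three reads (made-up data).
reads = ["ACGTAC", "GTACGT", "TTACGA"]
index = build_kmer_index(reads, k=3)
print(occurrences(index, "ACG"))  # [(0, 0), (1, 2), (2, 2)]
```

The index is built once and then answers each query with a single lookup, which is the time saving the abstract refers to. The cost is memory: this table stores every subword as a separate string key, which is one reason a baseline like this may not suit collections of 100 million reads.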
Language :
English
Popular science :
No
Files
- https://hal-lirmm.ccsd.cnrs.fr/lirmm-00712653/document
- Rivals-etal-ERCIM-News-89-5p.pdf
- Open access