Document type :
Conference paper with proceedings
Title :
Iterative Spaced Seed Hashing: Closing the Gap Between Spaced Seed Hashing and k-mer Hashing
Author(s) :
Petrucci, Enrico [Author]
Department of Information Engineering [Padova] [DEI]
Noé, Laurent [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille
Université de Lille, Sciences et Technologies
Pizzi, Cinzia [Author]
Department of Information Engineering [Padova] [DEI]
Comin, Matteo [Author]
Department of Information Engineering [Padova] [DEI]
Scientific editor(s) :
Lecture Notes in Computer Science
Conference title :
15th International Symposium on Bioinformatics Research and Applications (ISBRA)
City :
Barcelona
Country :
Spain
Start date of the conference :
2019-06-03
Book title :
Lecture Notes in Computer Science
Publication date :
2019-05-09
English keyword(s) :
Efficient hashing
Gapped q-gram
spaced seeds
k-mers
HAL domain(s) :
Computer Science [cs]/Bioinformatics [q-bio.QM]
English abstract : [en]
Alignment-free classification of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Much work has been done to speed up the indexing of k-mers through hash tables and other data structures. These efforts have led to very fast indexes, but because they are k-mer based, they often lack sensitivity due to sequencing errors or polymorphisms. Spaced seeds are a special type of pattern that accounts for errors or mutations. They improve sensitivity and are now routinely used instead of k-mers in many applications. The major drawback of spaced seeds is that they cannot be hashed efficiently, so their use substantially increases computational time. In this paper we address the problem of efficient spaced seed hashing. We propose an iterative algorithm that combines multiple spaced seed hashes, exploiting the similarity of adjacent hash values to compute the next hash efficiently. We report a series of experiments on hashing HTS reads with several spaced seeds. Our algorithm computes the hash values of spaced seeds with a speedup of 6.2x, outperforming previous methods. Software and datasets are available at ISSH.
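The Python sketch below illustrates the general idea of reusing previously computed hashes when hashing a spaced seed over a read. It is not the authors' ISSH algorithm; the 2-bit nucleotide encoding (uppercase A, C, G, T only), the bit layout, the greedy choice of the smallest usable shift, the function names (naive_hashes, plan_reuse, reuse_hashes) and the max_shift parameter are all hypothetical choices made for illustration. Each symbol at a care position ('1') of the seed is either copied from the hash computed j positions earlier, when the seed shifted by j has a care position on the same base, or read directly from the sequence. Copies that share the same shift and bit offset are merged into a single shift-and-mask operation.

```python
# Hypothetical 2-bit encoding; assumes reads contain only uppercase A, C, G, T.
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}


def care_positions(seed):
    """Indices of the '1' (care) positions of a spaced seed such as '1101011'."""
    return [k for k, c in enumerate(seed) if c == "1"]


def naive_hashes(seq, seed):
    """Baseline: hash every window by packing the symbols at the care positions,
    the m-th care symbol going to bits 2m and 2m+1."""
    pos = care_positions(seed)
    out = []
    for i in range(len(seq) - len(seed) + 1):
        h = 0
        for m, p in enumerate(pos):
            h |= ENC[seq[i + p]] << (2 * m)
        out.append(h)
    return out


def plan_reuse(seed, max_shift):
    """For each care index m, find the smallest shift j <= max_shift such that the
    symbol it needs is already stored (at care index m2) in the hash computed j
    windows earlier. Reusable indices with the same (j, d = m2 - m) share one
    destination bit mask; the others must be read from the sequence."""
    pos = care_positions(seed)
    index = {p: m for m, p in enumerate(pos)}
    groups = {}  # (j, d) -> destination bit mask
    direct = []  # care indices encoded directly from the sequence
    for m, p in enumerate(pos):
        for j in range(1, max_shift + 1):
            if p + j in index:
                d = index[p + j] - m  # always > 0, so a right shift suffices
                groups[(j, d)] = groups.get((j, d), 0) | (3 << (2 * m))
                break
        else:
            direct.append(m)
    return pos, groups, direct


def reuse_hashes(seq, seed, max_shift=4):
    """Same output as naive_hashes, but most symbols are copied from earlier
    hashes with one shift-and-mask per group instead of one lookup per symbol."""
    pos, groups, direct = plan_reuse(seed, max_shift)
    n = len(seq) - len(seed) + 1
    # The first max_shift windows have no complete history, so hash them naively.
    out = naive_hashes(seq[: max_shift + len(seed) - 1], seed)
    for i in range(len(out), n):
        h = 0
        for (j, d), mask in groups.items():
            h |= (out[i - j] >> (2 * d)) & mask
        for m in direct:
            h |= ENC[seq[i + pos[m]]] << (2 * m)
        out.append(h)
    return out
```

For example, naive_hashes("ACGT", "1101") packs A, C and T into [52]. For a contiguous seed such as "1111", plan_reuse yields a single group (j = 1, d = 1) plus one directly encoded symbol, which is the familiar rolling k-mer hash; for a spaced seed such as "1101011" the copies split across several shifts, which is where combining multiple previous hashes pays off.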
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.archives-ouvertes.fr/hal-02146404/document
- Open access
- ISSH_Camera.pdf