REINDEER
Type de document :
Autre communication scientifique (congrès sans actes - poster - séminaire...)
Titre :
REINDEER
Auteur(s) :
Marchet, Camille [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Mot(s)-clé(s) en anglais :
transcriptomics
Data structure
minimal perfect
K-mer
Data structure
minimal perfect
K-mer
Discipline(s) HAL :
Informatique [cs]/Algorithme et structure de données [cs.DS]
Sciences du Vivant [q-bio]/Biochimie, Biologie Moléculaire/Génomique, Transcriptomique et Protéomique [q-bio.GN]
Sciences du Vivant [q-bio]/Biochimie, Biologie Moléculaire/Génomique, Transcriptomique et Protéomique [q-bio.GN]
Résumé en anglais : [en]
REINDEER builds a data-structure that indexes k-mers and their abundances in a collection of datasets (raw RNA-seq or metagenomic reads for instance). Then, a sequence (FASTA) can be queried for its presence and abundance ...
Lire la suite >REINDEER builds a data-structure that indexes k-mers and their abundances in a collection of datasets (raw RNA-seq or metagenomic reads for instance). Then, a sequence (FASTA) can be queried for its presence and abundance in each indexed dataset. While other tools (e.g. SBT, BIGSI) were also designed for large-scale k-mer presence/absence queries, retrieving abundances was so far unsupported (except for single datasets, e.g. using some k-mer counters like KMC, Jellyfish). REINDEER combines fast queries, small index size, and low memory footprint during indexing and queries. We showed it allows to index 2585 RNA-seq datasets (~4 billions k-mers) using less than 60GB of RAM and a final index size lower than 60GB on the disk. Then, a REINDEER index can either be queried on disk (experimental feature, low RAM usage) or be loaded in RAM for faster queries.Lire moins >
Lire la suite >REINDEER builds a data-structure that indexes k-mers and their abundances in a collection of datasets (raw RNA-seq or metagenomic reads for instance). Then, a sequence (FASTA) can be queried for its presence and abundance in each indexed dataset. While other tools (e.g. SBT, BIGSI) were also designed for large-scale k-mer presence/absence queries, retrieving abundances was so far unsupported (except for single datasets, e.g. using some k-mer counters like KMC, Jellyfish). REINDEER combines fast queries, small index size, and low memory footprint during indexing and queries. We showed it allows to index 2585 RNA-seq datasets (~4 billions k-mers) using less than 60GB of RAM and a final index size lower than 60GB on the disk. Then, a REINDEER index can either be queried on disk (experimental feature, low RAM usage) or be loaded in RAM for faster queries.Lire moins >
Langue :
Anglais
Collections :
Source :
Fichiers
- document
- Accès libre
- Accéder au document
- REINDEER-master%281%29.zip
- Accès libre
- Accéder au document