REINDEER
Document type :
Other scientific communication (conference without proceedings - poster - seminar...)
Title :
REINDEER
Author(s) :
Marchet, Camille [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
English keyword(s) :
transcriptomics
Data structure
minimal perfect
K-mer
HAL domain(s) :
Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics, Transcriptomics and Proteomics [q-bio.GN]
English abstract : [en]
REINDEER builds a data structure that indexes k-mers and their abundances in a collection of datasets (for instance, raw RNA-seq or metagenomic reads). A sequence (FASTA) can then be queried for its presence and abundance in each indexed dataset. Other tools (e.g. SBT, BIGSI) were also designed for large-scale k-mer presence/absence queries, but retrieving abundances was so far unsupported except for single datasets (e.g. with k-mer counters such as KMC or Jellyfish). REINDEER combines fast queries, a small index size, and a low memory footprint during indexing and querying. We showed that it can index 2585 RNA-seq datasets (~4 billion k-mers) using less than 60 GB of RAM, with a final index smaller than 60 GB on disk. A REINDEER index can either be queried on disk (experimental feature, low RAM usage) or loaded into RAM for faster queries.
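The kind of query described in the abstract (presence and abundance of a sequence in each indexed dataset) can be illustrated with a toy sketch. This is not REINDEER's implementation or interface: a plain Python dictionary stands in for the compact index, reverse complements are ignored, and the names kmers, build_index and query, as well as the choice to report the median count, are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import median


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(datasets, k):
    """Map each k-mer to a list of counts, one entry per dataset.

    datasets is a list of read collections (each a list of strings).
    A dict is used here purely for clarity; it is far less compact
    than the index the abstract describes.
    """
    index = defaultdict(lambda: [0] * len(datasets))
    for d, reads in enumerate(datasets):
        for read in reads:
            for km in kmers(read, k):
                index[km][d] += 1
    return dict(index)


def query(index, n_datasets, seq, k):
    """Return, per dataset, (fraction of query k-mers present,
    median count of the k-mers that are present)."""
    query_kmers = list(kmers(seq, k))
    results = []
    for d in range(n_datasets):
        counts = [index[km][d] for km in query_kmers
                  if km in index and index[km][d] > 0]
        fraction = len(counts) / len(query_kmers) if query_kmers else 0.0
        results.append((fraction, median(counts) if counts else 0))
    return results


# Two hypothetical datasets of short reads, indexed with k = 4.
datasets = [["ACGTACGT", "CGTACG"], ["TTTTACGT"]]
index = build_index(datasets, 4)
result = query(index, len(datasets), "ACGTAC", 4)
```

Here result holds one (presence fraction, abundance) pair per dataset, which mirrors the per-dataset answer the abstract describes at a much smaller scale.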
Language :
English
Files
- document
- Open access
- REINDEER-master%281%29.zip
- Open access