Document type :
Other scientific communication (conference without proceedings - poster - seminar...)
Title :
SuperSampler: Fractional hitting set implementation for lightweight genomic data sketching
Author(s) :
Limasset, Antoine [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Centre National de la Recherche Scientifique [CNRS]
Rouzé, Timothé [Author]
Martayan, Igor [Author]
Marchet, Camille [Author]
English keyword(s) :
minimizers
indexing
kmers
HAL domain(s) :
Computer Science [cs]/Bioinformatics [q-bio.QM]
English abstract : [en]
Bird's-eye view
SuperSampler (SPSP) implements a novel k-mer selection scheme that we call Fractional Hitting Sets (FHS), a generalisation of Universal Hitting Sets (UHS). It quickly creates sketches of genomes and metagenomes and compares such sketches to estimate the containment or Jaccard index between input datasets.
SuperSampler stores super-k-mers instead of individual k-mers. Thanks to a sketch organisation enabled by the super-k-mer structure, this yields lighter sketches, lower RAM usage and faster comparisons than traditional subsampling methods.
Sketch creation applies FracMinHash to the selection of minimizers (the minimizer of a k-mer is the m-mer of that k-mer whose hash value is minimal). When a minimizer is selected, every surrounding k-mer that shares this minimizer is selected as well; these consecutive k-mers together form a super-k-mer.
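To make the selection scheme concrete, here is a minimal Python sketch of FracMinHash applied to minimizers, of super-k-mer grouping, and of containment/Jaccard estimation. It rests on simplifying assumptions that are ours, not SuperSampler's: m-mers are hashed with BLAKE2b truncated to 64 bits, only the forward strand is considered (no canonical k-mers), minimizers are recomputed naively for every k-mer, and sketches are compared by expanding super-k-mers back into k-mer sets rather than through SPSP's own sketch organisation. The function names (`super_kmers`, `sketch`, `containment`, `jaccard`) and the parameters `k`, `m` and `fraction` are hypothetical. Because a k-mer's minimizer depends only on the k-mer itself, a k-mer shared by two datasets is sampled either in both or in neither, which is what keeps the estimates consistent.

```python
import hashlib
import random


def mmer_hash(mmer: str) -> int:
    """Deterministic 64-bit hash of an m-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(mmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minimizer_hash(kmer: str, m: int) -> int:
    """Hash of the k-mer's minimizer: the smallest hash among its m-mers.

    Simplification: forward strand only, no canonical (reverse-complement) k-mers."""
    return min(mmer_hash(kmer[j:j + m]) for j in range(len(kmer) - m + 1))


def super_kmers(seq: str, k: int, m: int):
    """Yield (start, end, minimizer_hash) for each maximal run of consecutive
    k-mers sharing the same minimizer; seq[start:end] is the super-k-mer."""
    if not 0 < m <= k:
        raise ValueError("expected 0 < m <= k")
    run_start, run_hash = 0, None
    for i in range(len(seq) - k + 1):
        h = minimizer_hash(seq[i:i + k], m)
        if h != run_hash:
            if run_hash is not None:
                # The previous run's last k-mer starts at i - 1.
                yield run_start, i - 1 + k, run_hash
            run_start, run_hash = i, h
    if run_hash is not None:
        yield run_start, len(seq), run_hash


def sketch(seq: str, k: int, m: int, fraction: float) -> list[str]:
    """FracMinHash on minimizers: keep a whole super-k-mer when its minimizer
    hash falls below fraction * 2**64 (fraction plays the role of 1/s)."""
    threshold = int(fraction * 2**64)
    return [seq[s:e] for s, e, h in super_kmers(seq, k, m) if h < threshold]


def kmer_set(sk: list[str], k: int) -> set[str]:
    """Expand sampled super-k-mers back into the set of sampled k-mers."""
    return {s[i:i + k] for s in sk for i in range(len(s) - k + 1)}


def containment(sa: list[str], sb: list[str], k: int) -> float:
    """Estimated fraction of A's k-mers that also occur in B."""
    a, b = kmer_set(sa, k), kmer_set(sb, k)
    return len(a & b) / len(a) if a else 0.0


def jaccard(sa: list[str], sb: list[str], k: int) -> float:
    """Estimated Jaccard index: size of the intersection over size of the union."""
    a, b = kmer_set(sa, k), kmer_set(sb, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    a = "".join(rng.choice("ACGT") for _ in range(5000))
    # Toy second dataset: shares its first half with a, random second half.
    b = a[:2500] + "".join(rng.choice("ACGT") for _ in range(2500))
    k, m, frac = 31, 11, 0.1
    sa, sb = sketch(a, k, m, frac), sketch(b, k, m, frac)
    print(len(sa), "super-k-mers kept for a")
    print(f"containment(a in b) ~ {containment(sa, sb, k):.2f}")
    print(f"Jaccard(a, b) ~ {jaccard(sa, sb, k):.2f}")
```

Keeping a whole super-k-mer per selected minimizer means one threshold test per super-k-mer instead of one per k-mer, and the `fraction` parameter acts like FracMinHash's scaling factor 1/s.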
Language :
English
Files
- document (Open access)
- supersampler-main.zip (Open access)