Document type:
Book section
Title:
A binned technique for scalable model-based clustering on huge datasets
Author(s):
Antonazzo, Filippo [Author]
MOdel for Data Analysis and Learning [MODAL]
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Keribin, Christine [Author]
Statistique mathématique et apprentissage [CELESTE]
MOdel for Data Analysis and Learning [MODAL]
Book title:
Book of Short Papers of the 5th international workshop on Models and Learning for Clustering and Classification MBC2 2020, Catania, Italy
Publication date:
2021-10-26
Keyword(s):
Clustering
binned data
big data
green computing
HAL discipline(s):
Statistics [stat]
Abstract [en]:
Clustering is affected by the steady increase in sample sizes, which offers the opportunity to reveal information previously out of reach. However, the volume of data raises issues related to the need for large computational resources and to high energy consumption. Resorting to binned data defined on an adaptive grid is expected to provide a suitable answer to such green computing issues without harming the quality of the resulting estimation. After a brief review of existing methods, a first application to univariate model-based clustering is presented, with a numerical illustration of its advantages. Finally, an initial formalization of the multivariate extension is given, highlighting both the issues and possible strategies.
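The abstract's central idea, fitting a mixture model to bin counts on an adaptive grid instead of to every raw observation, can be illustrated with a minimal sketch. It is not the authors' algorithm. The grid here is assumed to use empirical quantiles (so each bin holds roughly the same number of points), and the EM step is a simple approximation that represents each bin by its midpoint weighted by its count, whereas an exact binned EM would integrate each component density over the bin. The helper names `quantile_bins` and `binned_em` are hypothetical.

```python
import math
import random


def quantile_bins(data, n_bins):
    """Hypothetical adaptive grid: edges at empirical quantiles, ~equal counts per bin."""
    xs = sorted(data)
    n = len(xs)
    edges = [xs[0]] + [xs[(i * n) // n_bins] for i in range(1, n_bins)] + [xs[-1]]
    counts = [0] * n_bins
    j = 0
    for x in xs:  # xs is sorted, so we only ever move forward through the bins
        while j < n_bins - 1 and x >= edges[j + 1]:
            j += 1
        counts[j] += 1
    return edges, counts


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def binned_em(edges, counts, k, n_iter=200):
    """Approximate EM for a univariate Gaussian mixture fitted to binned data.

    Assumption: each bin is summarized by its midpoint, weighted by its count.
    """
    mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
    n = sum(counts)
    lo, hi = edges[0], edges[-1]
    # Crude initialization: means spread over the data range, common variance.
    mus = [lo + (hi - lo) * (j + 1) / (k + 1) for j in range(k)]
    variances = [((hi - lo) / (2 * k)) ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each bin representative.
        resp = []
        for m in mids:
            dens = [w * normal_pdf(m, mu, v) for w, mu, v in zip(weights, mus, variances)]
            tot = sum(dens) or 1e-300
            resp.append([d / tot for d in dens])
        # M-step: count-weighted updates of weights, means and variances.
        for j in range(k):
            nj = sum(c * r[j] for c, r in zip(counts, resp))
            if nj <= 1e-12:
                continue  # empty component: keep its previous parameters
            mu = sum(c * r[j] * m for c, r, m in zip(counts, resp, mids)) / nj
            var = sum(c * r[j] * (m - mu) ** 2 for c, r, m in zip(counts, resp, mids)) / nj
            weights[j] = nj / n
            mus[j] = mu
            variances[j] = max(var, 1e-6)  # floor guards against degenerate collapse
    return weights, mus, variances


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(-3.0, 1.0) for _ in range(5000)] + [rng.gauss(3.0, 1.0) for _ in range(5000)]
    edges, counts = quantile_bins(data, 50)
    print(binned_em(edges, counts, k=2))
```

Once the grid is built, each EM iteration in this sketch touches only the `n_bins` bin representatives rather than all `n` observations, which is where the computational (and energy) savings of binning would come from; the price is the approximation error introduced by the midpoints, which grows with bin width.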
Language:
English
Audience:
International
Popular science:
No
Files
- Short_paper_6.pdf (open access)