Document type :
Book section
Title :
A binned technique for scalable model-based clustering on huge datasets
Author(s) :
Antonazzo, Filippo [Author]
MOdel for Data Analysis and Learning [MODAL]
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Keribin, Christine [Author]
Statistique mathématique et apprentissage [CELESTE]
MOdel for Data Analysis and Learning [MODAL]
Book title :
Book of Short Papers of the 5th international workshop on Models and Learning for Clustering and Classification MBC2 2020, Catania, Italy
Publication date :
2021-10-26
English keyword(s) :
Clustering
binned data
big data
green computing
HAL domain(s) :
Statistics [stat]
English abstract : [en]
Clustering is impacted by the regular increase of sample sizes, which provides an opportunity to reveal information previously out of scope. However, the volume of data leads to some issues related to the need for many computational resources and also to high energy consumption. Resorting to binned data depending on an adaptive grid is expected to give a proper answer to such green computing issues while not harming the quality of the related estimation. After a brief review of existing methods, a first application in the context of univariate model-based clustering is provided, with a numerical illustration of its advantages. Finally, an initial formalization of the multivariate extension is given, highlighting both issues and possible strategies.
Language :
English
Audience :
International
Popular science :
No
Files
- document
- Open access
- Short_paper_6.pdf
- Open access