Unifying Data Units and Models in (Co-)Clustering
Document type :
Report and critical book review
Title :
Unifying Data Units and Models in (Co-)Clustering
Author(s) :
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Lourme, Alexandre [Author]
MOdel for Data Analysis and Learning [MODAL]
Journal title :
Advances in Data Analysis and Classification
Pages :
7-31
Publisher :
Springer Verlag
Publication date :
2018-05-25
ISSN :
1862-5347
English keyword(s) :
Mixed data
Mixture models
Model selection
Non-identifiability
Measurement units
HAL domain(s) :
Statistics [stat]/Methodology [stat.ME]
English abstract : [en]
Statisticians are already aware that any modelling process issue (exploration, prediction) is wholly data unit dependent, to the extent that it should be impossible to provide a statistical outcome without specifying the couple (unit, model). In this work, this general principle is formalized with a particular focus on model-based clustering and co-clustering in the case of possibly mixed data types (continuous and/or categorical and/or count features), which is also an opportunity to revisit what the related data units are. Such a formalization raises three important points: (i) the couple (unit, model) is not identifiable, so that different unit/model interpretations of the same whole modelling process are always possible; (ii) combining different "classical" units with different "classical" models should be an interesting opportunity for a cheap, wide and meaningful enlargement of the whole modelling process family designed by the couple (unit, model); (iii) if necessary, this couple, up to the non-identifiability property, could be selected by any traditional model selection criterion. Some experiments on real data sets illustrate in detail the practical benefits of these three points.
Language :
English
Popular science :
No
Collections :
Source :
Files
- document
- Open access
- Access the document
- paper_units_biernacki_lourme.pdf