Unifying Data Units and Models in (Co-)Clustering
Document type :
Article in a scientific journal
Permalink :
Title :
Unifying Data Units and Models in (Co-)Clustering
Author(s) :
Journal title :
Advances in Data Analysis and Classification
Volume number :
12
Publisher :
Springer Verlag
Publication date :
2018-05-25
ISSN :
1862-5347
Keyword(s) :
Measurement units
Mixed data
Mixture models
Model selection
Non-identifiability
HAL domain(s) :
Statistics [stat]/Methodology [stat.ME]
English abstract : [en]
Statisticians are already aware that any modelling process issue (exploration, prediction) is wholly data unit dependent, to the extent that it should be impossible to provide a statistical outcome without specifying the couple (unit, model). In this work, this general principle is formalized with a particular focus on model-based clustering and co-clustering in the case of possibly mixed data types (continuous and/or categorical and/or count features), which is also an opportunity to revisit what the related data units are. This formalization raises three important points: (i) the couple (unit, model) is not identifiable, so that different unit/model interpretations of the same whole modelling process are always possible; (ii) combining different "classical" units with different "classical" models should be an interesting opportunity for a cheap, wide and meaningful enlargement of the whole family of modelling processes defined by the couple (unit, model); (iii) if necessary, this couple, up to the non-identifiability property, could be selected by any traditional model selection criterion. Some experiments on real data sets illustrate in detail the practical benefits of these three points.
Language :
English
Audience :
International
Popular science :
No
Submission date :
2020-06-08T14:11:13Z
2020-06-09T09:23:24Z
Files
- documen
- Open access