Document type :
Journal article
Permalink :
Title :
Model-based clustering of Gaussian copulas for mixed data
Author(s) :
Marbac, Matthieu [Author]
Biernacki, Christophe [Author]
Vandewalle, Vincent [Author]
METRICS : Evaluation des technologies de santé et des pratiques médicales - ULR 2694
Journal title :
Communications in Statistics - Theory and Methods
Volume number :
46
Pages :
11635-11656
Publisher :
Taylor & Francis
Publication date :
2017
ISSN :
0361-0926
Keyword(s) :
Mixture models
Mixed data
Clustering
Gaussian copula
Metropolis-within-Gibbs algorithm
Visualization
HAL domain(s) :
Statistics [stat]/Methodology [stat.ME]
English abstract : [en]
Clustering task of mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate variables. Indeed, considering a mixing of continuous, integer and ordinal variables (thus all having a cumulative distribution function), this copula mixture model defines intra-component dependencies similar to a Gaussian mixture, so with classical correlation meaning. Simultaneously, it preserves standard margins associated to continuous, integer and ordered features, namely the Gaussian, the Poisson and the ordered multinomial distributions. As an interesting by-product, the proposed mixture model generalizes many well-known ones and also provides tools of visualization based on the parameters. At a practical level, the Bayesian inference is retained and it is achieved with a Metropolis-within-Gibbs sampler. Experiments on simulated and real data sets finally illustrate the expected advantages of the proposed model for mixed data: flexible and meaningful parametrization combined with visualization features.
Language :
English
Audience :
International
Popular science :
No
Administrative institution(s) :
CHU Lille
Université de Lille
Submission date :
2020-06-08T14:11:37Z
2020-06-09T09:32:20Z
Files
- Open access