Multi-Partitions Subspace Clustering
Document type :
Journal article: Original article
DOI :
Permalink :
Title :
Multi-Partitions Subspace Clustering
Author(s) :
Vandewalle, Vincent [Author]
METRICS: Evaluation of Health Technologies and Medical Practices - ULR 2694
MOdel for Data Analysis and Learning [MODAL]
Journal title :
Mathematics
Abbreviated title :
Mathematics
Volume number :
8
Publication date :
2020-06-06
ISSN :
2227-7390
English keyword(s) :
clustering
mixture model
factorial discriminant analysis
EM algorithm
HAL domain(s) :
Life Sciences [q-bio]
English abstract : [en]
In model-based clustering, it is often assumed that a single latent clustering variable explains the heterogeneity of the whole dataset. However, in many cases several latent variables could explain the heterogeneity of the data at hand. Finding such class variables could result in a richer interpretation of the data. In the continuous data setting, a multi-partition model-based clustering is proposed. It assumes the existence of several latent clustering variables, each one explaining the heterogeneity of the data with respect to some clustering subspace. It allows the multi-partitions and the related subspaces to be found simultaneously. Parameters of the model are estimated through an EM algorithm relying on a probabilistic reinterpretation of factorial discriminant analysis. A model choice strategy relying on the BIC criterion is proposed to select the number of subspaces and the number of clusters per subspace. The results obtained are thus several projections of the data, each one conveying its own clustering of the data. The model's behavior is illustrated on simulated and real data.
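The abstract names two generic ingredients: EM estimation and BIC-based model choice. The sketch below illustrates only those two ingredients in the simplest possible case, a univariate Gaussian mixture whose number of clusters K is chosen by BIC. It does not implement the paper's multi-partition model, its clustering subspaces, or its probabilistic factorial discriminant analysis step. All function names (`em_gmm_1d`, `bic`, `select_k`, etc.) are hypothetical, and the BIC sign convention used here (log-likelihood minus (ν/2) log n, larger is better) is an assumption of this sketch.

```python
import math
import random


def log_normal(x, mu, var):
    """Log-density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(data, weights, means, variances):
    """Observed-data log-likelihood of a univariate Gaussian mixture."""
    return sum(
        log_sum_exp([math.log(w) + log_normal(x, m, v)
                     for w, m, v in zip(weights, means, variances)])
        for x in data
    )


def em_gmm_1d(data, k, n_iter=300, tol=1e-6):
    """Hypothetical helper: fit a K-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances, loglik)."""
    n = len(data)
    xs = sorted(data)
    # Deterministic start: means at evenly spaced empirical quantiles,
    # every variance set to the overall variance, equal mixing weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    cur = mixture_loglik(data, weights, means, variances)
    for _ in range(n_iter):
        prev = cur
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            lse = log_sum_exp(logs)
            resp.append([math.exp(lg - lse) for lg in logs])
        # M-step: responsibility-weighted proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps components non-degenerate
        cur = mixture_loglik(data, weights, means, variances)
        if cur - prev < tol:  # EM increases the likelihood; stop when it stalls
            break
    return weights, means, variances, cur


def bic(loglik, k, n):
    """BIC in the 'larger is better' convention: loglik - (nu / 2) * log(n)."""
    nu = 3 * k - 1  # (k - 1) free weights + k means + k variances
    return loglik - 0.5 * nu * math.log(n)


def select_k(data, k_max=4):
    """Fit K = 1..k_max and keep the K with the highest BIC."""
    scores = {k: bic(em_gmm_1d(data, k)[3], k, len(data))
              for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Simulated data: two well-separated Gaussian groups.
    rng = random.Random(1)
    data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
            + [rng.gauss(10.0, 1.0) for _ in range(200)])
    best_k, scores = select_k(data)
    print("selected K:", best_k)
```

In the multi-partition setting described in the abstract, the same kind of BIC comparison would range jointly over the number of subspaces and the number of clusters per subspace, rather than over a single K as in this one-dimensional sketch.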
Language :
English
Audience :
International
Popular science :
No
Administrative institution(s) :
Université de Lille
CHU Lille
Submission date :
2023-11-15T09:42:31Z
2023-12-13T13:55:00Z
Files
- mathematics-08-00597-v2.pdf
- Not specified
- Open access