Model-based co-clustering for mixed type data
Document type :
Journal article: Original article
Title :
Model-based co-clustering for mixed type data
Author(s) :
Selosse, Margot [Author]
Université Lumière - Lyon 2 [UL2]
Entrepôts, Représentation et Ingénierie des Connaissances [ERIC]
Jacques, Julien [Author]
Université Lumière - Lyon 2 [UL2]
Entrepôts, Représentation et Ingénierie des Connaissances [ERIC]
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Université Lumière - Lyon 2 [UL2]
Entrepôts, Représentation et Ingénierie des Connaissances [ERIC]
Journal title :
Computational Statistics and Data Analysis
Pages :
106866
Publisher :
Elsevier
Publication date :
2020
ISSN :
0167-9473
English keyword(s) :
co-clustering
mixed-type data
latent block model
HAL domain(s) :
Mathematics [math]/Statistics [math.ST]
English abstract : [en]
The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features has led to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and groups of features. By grouping the data set into blocks (the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, data sets whose features are of different types (here called mixed type data sets) have recently become more common, and the LBM is not directly applicable to them. Here a natural extension of the usual LBM, the "Multiple Latent Block Model" (MLBM), is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM algorithm that embeds a Gibbs sampler and allows for missing data. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.
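The abstract describes inference with a Stochastic EM algorithm that embeds a Gibbs sampler. The Python sketch below is one way such an SEM-Gibbs loop could look for a hypothetical two-type data set, and it rests on several assumptions not stated in the abstract: continuous features are modelled as Gaussian and binary features as Bernoulli; the row partition is shared by all data types while each type gets its own column partition (our reading of how the LBM extends to mixed types); missing data are not handled; and the last iterate is returned instead of parameter estimates averaged after a burn-in. The function `sem_gibbs_mlbm` and its parameters (`g`, `mc`, `mb`, `n_iter`, `seed`) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def _sample_labels(logp, rng):
    """Draw one label per row of `logp` (unnormalised log-probabilities)."""
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    u = rng.random((p.shape[0], 1))
    # Inverse-CDF sampling; the clip guards against round-off in cumsum.
    return np.minimum((np.cumsum(p, axis=1) < u).sum(axis=1), p.shape[1] - 1)


def _gauss_ll(x, mu, var):
    """Elementwise Gaussian log-density."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)


def _bern_ll(x, a):
    """Elementwise Bernoulli log-probability."""
    return x * np.log(a) + (1 - x) * np.log(1 - a)


def _props(counts):
    """Mixing proportions, floored so that log() stays finite."""
    return np.maximum(counts / counts.sum(), 1e-8)


def _m_step(Xc, Xb, z, wc, wb, g, mc, mb, old):
    """Complete-data ML estimates; empty blocks keep their previous values."""
    Z, Wc, Wb = np.eye(g)[z], np.eye(mc)[wc], np.eye(mb)[wb]
    nk = Z.sum(axis=0)
    cnt_c = np.outer(nk, Wc.sum(axis=0))  # number of cells per Gaussian block
    cnt_b = np.outer(nk, Wb.sum(axis=0))  # number of cells per Bernoulli block
    mu = (Z.T @ Xc @ Wc) / np.maximum(cnt_c, 1)
    var = (Z.T @ (Xc ** 2) @ Wc) / np.maximum(cnt_c, 1) - mu ** 2
    alpha = (Z.T @ Xb @ Wb) / np.maximum(cnt_b, 1)
    if old is not None:
        mu = np.where(cnt_c > 0, mu, old["mu"])
        var = np.where(cnt_c > 0, var, old["var"])
        alpha = np.where(cnt_b > 0, alpha, old["alpha"])
    return {"pi": _props(nk), "rho_c": _props(Wc.sum(axis=0)),
            "rho_b": _props(Wb.sum(axis=0)), "mu": mu,
            "var": np.maximum(var, 1e-6),
            "alpha": np.clip(alpha, 1e-6, 1 - 1e-6)}


def sem_gibbs_mlbm(Xc, Xb, g, mc, mb, n_iter=30, seed=0):
    """Hypothetical SEM-Gibbs sketch for a Gaussian + Bernoulli block model."""
    rng = np.random.default_rng(seed)
    z = rng.integers(g, size=Xc.shape[0])
    wc = rng.integers(mc, size=Xc.shape[1])
    wb = rng.integers(mb, size=Xb.shape[1])
    th = _m_step(Xc, Xb, z, wc, wb, g, mc, mb, None)
    for _ in range(n_iter):
        # SE-step for rows (assumption: one row partition shared by both types).
        lz = (_gauss_ll(Xc[:, None, :], th["mu"][:, wc], th["var"][:, wc]).sum(2)
              + _bern_ll(Xb[:, None, :], th["alpha"][:, wb]).sum(2))
        z = _sample_labels(np.log(th["pi"]) + lz, rng)
        # SE-step for columns (assumption: a separate partition per data type).
        lc = _gauss_ll(Xc.T[:, None, :], th["mu"][z].T, th["var"][z].T).sum(2)
        wc = _sample_labels(np.log(th["rho_c"]) + lc, rng)
        lb = _bern_ll(Xb.T[:, None, :], th["alpha"][z].T).sum(2)
        wb = _sample_labels(np.log(th["rho_b"]) + lb, rng)
        # M-step on the completed data.
        th = _m_step(Xc, Xb, z, wc, wb, g, mc, mb, th)
    return z, wc, wb, th
```

Each iteration alternates a stochastic E-step (sampling the row labels, then the column labels of each data type, from their conditional distributions given the current parameters) with a complete-data M-step. Letting empty blocks keep their previous parameters is a simplifying choice made here for numerical safety, not something the abstract specifies.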
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- manuscript.pdf
- Open access