Model-based co-clustering for mixed type data

Selosse, Margot; Jacques, Julien; Biernacki, Christophe

Document type:

Article in a scientific journal

DOI:

10.1016/j.csda.2019.106866

Permanent URL:

http://hdl.handle.net/20.500.12210/29382

Title:

Model-based co-clustering for mixed type data

Author(s):

Selosse, Margot [Author]
Jacques, Julien [Author]
Biernacki, Christophe [Author]

Journal title:

Computational Statistics and Data Analysis

Number:

144

Pagination:

106866

Publisher:

Elsevier

Publication date:

2020

ISSN:

0167-9473

Keyword(s):

Latent block model
Co-clustering
Mixed-type data

HAL discipline(s):

Mathematics [math]/Statistics [math.ST]

Abstract in English: [en]

The importance of clustering for creating groups of observations is well known. The emergence of high-dimensional data sets with a huge number of features leads to co-clustering techniques, and several methods have been developed for simultaneously producing groups of observations and features. By grouping the data set into blocks (the crossing of a row-cluster and a column-cluster), these techniques can sometimes better summarize the data set and its inherent structure. The Latent Block Model (LBM) is a well-known method for performing co-clustering. However, recently, contexts with features of different types (here called mixed type data sets) are becoming more common. The LBM is not directly applicable to this kind of data set. Here a natural extension of the usual LBM to the "Multiple Latent Block Model" (MLBM) is proposed in order to handle mixed type data sets. Inference is performed using a Stochastic EM-algorithm that embeds a Gibbs sampler, and allows for missing data situations. A model selection criterion is defined to choose the number of row and column clusters. The method is then applied to both simulated and real data sets.

Language:

English

Audience:

International

Popular science:

No

Collections:

METRICS : Evaluation des technologies de santé et des pratiques médicales - ULR 2694

Deposit date:

2020-06-08T14:11:26Z
2020-06-09T09:22:43Z

Files

documen
Open access

Version number	Link	Modification date
2	20.500.12210/29382*	2020-06-09T09:17:55Z
1	20.500.12210/29382.1	2020-06-08T14:11:26Z
