Dealing with missing data in model-based clustering through a MNAR model

Biernacki, Christophe; Celeux, Gilles; Josse, Julie; Laporte, Fabien

Document type:

Other scientific communication (conference without proceedings, poster, seminar, ...): Communication at a conference without proceedings

Permanent URL:

http://hdl.handle.net/20.500.12210/29365

Title:

Dealing with missing data in model-based clustering through a MNAR model

Author(s):

Biernacki, Christophe [Author]
Celeux, Gilles [Author]
Josse, Julie [Author]
Laporte, Fabien [Author]

Title of the scientific event:

CRoNos & MDA 2019 - Meeting and Workshop on Multivariate Data Analysis and Software

City:

Limassol

Country:

Cyprus

Start date of the scientific event:

2019-04-14

HAL discipline(s):

Statistics [stat]/Methodology [stat.ME]

Abstract in English: [en]

Since the 1990s, model-based clustering has been widely used to classify data. Nowadays, as the volume of available data grows, missing values are increasingly frequent. Traditional ways of dealing with them consist in obtaining a complete data set, either by discarding missing values or by imputing them. In the first case, some information is lost; in the second case, the imputation step does not take the final clustering purpose into account. Both solutions thus risk blurring the clustering estimation result. Alternatively, we argue for embedding the missingness mechanism directly within the clustering modeling step. There are three types of missing data: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). In all situations, logistic regression is proposed as a natural and flexible candidate model. In particular, its flexibility makes it possible to design meaningful parsimonious variants, such as dependency on the missing values or dependency on the cluster label. In this unified context, standard model selection criteria can be used to select among these missing data mechanisms, simultaneously with the number of clusters. The practical interest of our proposal is illustrated on data derived from medical studies that suffer from many missing values.
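
The distinction between the three mechanisms can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the authors' model: the coefficient names (a0, a_x, a_y, a_z), the toy data, and the choice of a single always-observed covariate x next to a variable y that may be missing are all hypothetical. It shows how one logistic form can cover MCAR (constant probability), MAR (dependence on observed values only) and two MNAR variants (dependence on the possibly missing value itself, or on the cluster label z).

```python
import math
import random


def sigmoid(t):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def missing_probability(x, y, z, a0, a_x=0.0, a_y=0.0, a_z=0.0):
    """Probability that y is missing under a logistic missingness model.

    x: always-observed covariate, y: value that may be missing,
    z: cluster label. All coefficient names are hypothetical.
    """
    return sigmoid(a0 + a_x * x + a_y * y + a_z * z)


# Hypothetical coefficient settings, one per mechanism (illustrative values).
MECHANISMS = {
    "MCAR": dict(a0=-1.0),              # constant probability
    "MAR": dict(a0=-1.0, a_x=1.5),      # depends on the observed x only
    "MNAR_y": dict(a0=-1.0, a_y=1.5),   # depends on the possibly missing y
    "MNAR_z": dict(a0=-1.0, a_z=1.5),   # depends on the cluster label z
}


def simulate_mask(data, params, seed=0):
    """Draw one missingness indicator per (x, y, z) row; True means y is missing."""
    rng = random.Random(seed)
    return [rng.random() < missing_probability(x, y, z, **params)
            for x, y, z in data]


# Toy rows (x, y, z); the values are made up for illustration.
data = [(0.0, 0.0, 0), (1.0, 2.0, 1), (-1.0, 0.5, 0)]

for name, params in MECHANISMS.items():
    probs = [round(missing_probability(x, y, z, **params), 3) for x, y, z in data]
    print(name, probs)
```

In this sketch, setting coefficients to zero yields the more parsimonious variants, which is one way to read the abstract's remark that standard model selection criteria could compare such mechanisms; the actual parameterization used by the authors is not given in this record.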

Language:

English

Audience:

International

Popularization:

No

Collections:

METRICS : Evaluation des technologies de santé et des pratiques médicales - ULR 2694

Deposit date:

2020-06-08T14:11:20Z
2020-06-09T09:18:27Z

Files:

Open access

Version number	Link	Modification date
2	20.500.12210/29365*	2020-06-09T09:18:12Z
1	20.500.12210/29365.1	2020-06-08T14:11:20Z
