Document type :
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings: Invited talk
Title :
Logiciel MixtComp : Classification et imputation à base de modèles pour données mixtes, manquantes et incertaines [MixtComp software: model-based clustering and imputation for mixed, missing and uncertain data]
Author(s) :
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Laboratoire Paul Painlevé - UMR 8524 [LPP]
Conference title :
MISSDATA 2015
City :
Rennes
Country :
France
Start date of the conference :
2015-06-17
HAL domain(s) :
Statistics [stat]/Methodology [stat.ME]
English abstract : [en]
The "Big Data" paradigm involves large and complex data sets. Complexity includes both variety (mixed data: continuous and/or categorical and/or ordinal and/or functional...) and missing, or partially missing (binned), items. Clustering is a suitable response to volume, but it also needs to deal with complexity, especially as volume promotes the emergence of complexity.
Model-based clustering has demonstrated many theoretical and practical successes (McLachlan 2000), including for multivariate mixed data, with (Biernacki 2013) or without (Marbac et al. 2014) conditional independence. In addition, this fully generative design makes it straightforward to handle missing or binned data (McLachlan 2000; Biernacki 2007). Model estimation can also be performed by simple EM-like algorithms, such as SEM (Celeux and Diebolt 1985).
MixtComp is a new piece of R software, written in C++, implementing model-based clustering for multivariate missing/binned/mixed data under the conditional independence assumption (Goodman 1974). The data types currently implemented are continuous (Gaussian), categorical (multinomial) and integer (Poisson). However, the architecture of MixtComp is designed for the incremental insertion of new kinds of data (ordinal, ranks, functional...) and related models.
MixtComp is not currently freely available as an R package, but it will soon be freely available through a dedicated web interface. Beyond its clustering task, it can also impute missing/binned data (with associated confidence intervals) by exploiting the ability of the mixture model to estimate densities.
Topics will include: mixture models, conditional independence, the SEM algorithm, model selection criteria.
Prerequisites: elementary knowledge of general statistical concepts, mixture models, the EM algorithm and standard model selection criteria is assumed. Basic programming in R is also useful.
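The ideas named in the abstract (a mixture under conditional independence, SEM estimation, and imputation from the fitted model) can be illustrated with a small sketch. It is not MixtComp's implementation or API: the function names (`posterior`, `m_step`, `sem`, `impute`), the two-cluster setting, the toy data and the initialisation are all assumptions made for illustration, and Python is used here only for self-containedness. Each row holds one Gaussian variable and one Poisson variable, with `None` marking a missing value; binned data and confidence intervals are not covered.

```python
import math
import random

def log_gauss(x, mu, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)

def log_poisson(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior(row, p):
    """Class probabilities given only the observed entries of a row.
    Under conditional independence a missing entry simply drops out."""
    x, k = row
    logw = []
    for g in range(2):
        lw = math.log(p["pi"][g])
        if x is not None:
            lw += log_gauss(x, p["mu"][g], p["sd"][g])
        if k is not None:
            lw += log_poisson(k, p["lam"][g])
        logw.append(lw)
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    s = sum(w)
    return [v / s for v in w]

def m_step(data, z, old):
    """Per-class maximum-likelihood updates from the observed entries."""
    new = {"pi": [], "mu": [], "sd": [], "lam": []}
    for g in range(2):
        rows = [r for r, zi in zip(data, z) if zi == g]
        xs = [r[0] for r in rows if r[0] is not None]
        ks = [r[1] for r in rows if r[1] is not None]
        if len(xs) < 2 or not ks:
            return old  # degenerate draw: keep the previous parameters
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        new["pi"].append(len(rows) / len(data))
        new["mu"].append(mu)
        new["sd"].append(max(math.sqrt(var), 1e-3))
        new["lam"].append(max(sum(ks) / len(ks), 1e-3))
    return new

def sem(data, n_iter=200, burn_in=50, seed=0):
    """SEM: sample class labels (S-step), re-estimate (M-step), then
    average the parameters over the iterations after burn-in."""
    rng = random.Random(seed)
    xs = [r[0] for r in data if r[0] is not None]
    ks = [r[1] for r in data if r[1] is not None]
    mean_x = sum(xs) / len(xs)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / len(xs))
    mean_k = sum(ks) / len(ks)
    # Assumed initialisation: class means at the extremes of the data.
    p = {"pi": [0.5, 0.5], "mu": [min(xs), max(xs)],
         "sd": [sd_x, sd_x], "lam": [mean_k, mean_k]}
    kept = []
    for it in range(n_iter):
        z = [0 if rng.random() < posterior(r, p)[0] else 1 for r in data]
        p = m_step(data, z, p)
        if it >= burn_in:
            kept.append(p)
    return {key: [sum(q[key][g] for q in kept) / len(kept) for g in range(2)]
            for key in p}

def impute(row, p):
    """Fill missing entries with their posterior expectation."""
    t = posterior(row, p)
    x, k = row
    if x is None:
        x = sum(t[g] * p["mu"][g] for g in range(2))
    if k is None:
        k = sum(t[g] * p["lam"][g] for g in range(2))
    return x, k

# Toy data (invented): (continuous value, count), None = missing.
data = [(0.2, 1), (-0.5, 3), (0.9, 2), (None, 2), (0.1, None), (-0.3, 1),
        (0.4, 4), (-1.1, 2), (10.3, 18), (9.6, 22), (None, 21), (10.8, None),
        (9.9, 19), (11.2, 25), (10.1, 17), (9.4, 20)]
params = sem(data)
print(params)
print([impute(r, params) for r in data if None in r])
```

Because the variables are independent within a class, the S-step only needs the observed entries of each row, and the M-step can update each variable from its own observed values; this is what makes missing data easy to accommodate in this kind of model.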
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- slides.pdf
- Open access