MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data
Logiciel MixtComp : Classification et imputation à base de modèles pour données mixtes, manquantes et incertaines
Document type:
Other scientific communication (conference without proceedings, poster, seminar...): Communication at a conference without proceedings
Permanent URL:
Title:
MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data
Logiciel MixtComp : Classification et imputation à base de modèles pour données mixtes, manquantes et incertaines
Author(s):
Title of the scientific event:
MISSDATA 2015
City:
Rennes
Country:
France
Start date of the scientific event:
2015-06-17
Publication date:
2015-06-17
HAL discipline(s):
Statistics [stat]/Methodology [stat.ME]
Abstract in English: [en]
The "Big Data" paradigm involves large and complex data sets. Complexity covers both variety (mixed data: continuous and/or categorical and/or ordinal and/or functional...) and missing or partially missing (binned) items. Clustering is a suitable response to volume, but it must also deal with complexity, especially as volume promotes the emergence of complexity. Model-based clustering has demonstrated many theoretical and practical successes (McLachlan 2000), including for multivariate mixed data with (Biernacki 2013) or without (Marbac et al. 2014) conditional independence. In addition, its fully generative design makes it straightforward to handle missing or binned data (McLachlan 2000; Biernacki 2007). Model estimation can also be performed by simple EM-like algorithms, such as SEM (Celeux and Diebolt 1985). MixtComp is new software for R, written in C++, implementing model-based clustering for multivariate missing/binned/mixed data under the conditional independence assumption (Goodman 1974). The data types currently implemented are continuous (Gaussian), categorical (multinomial) and integer (Poisson). However, the architecture of MixtComp is designed for the incremental addition of new kinds of data (ordinal, ranks, functional...) and related models. MixtComp is not currently freely available as an R package, but it will soon be freely available through a dedicated web interface. Beyond clustering, it can also impute missing/binned data (with associated confidence intervals) by exploiting the mixture model's ability to estimate densities.
Topics will include: mixture models, conditional independence, the SEM algorithm and model selection criteria.
Prerequisites: elementary knowledge of general statistical concepts, mixture models, the EM algorithm and standard model selection criteria is assumed. Basic programming in R is also useful.
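As a rough illustration of the conditional independence idea described in the abstract, the following Python sketch computes cluster membership probabilities for one observation with a Gaussian, a categorical and a Poisson variable, some of which may be missing. It is not MixtComp code and does not use its interface: the parameter values, the function names and the use of `None` for a missing entry are all assumptions made for this example. It also assumes values are missing at random, so that a missing variable simply drops out of the per-cluster product, and it imputes a missing continuous value by its posterior-weighted cluster mean.

```python
import math

# Hypothetical parameters of a 2-cluster mixture, chosen for illustration only.
PROPORTIONS = [0.5, 0.5]
PARAMS = [
    {"gauss": (0.0, 1.0), "cat": [0.8, 0.2], "pois": 2.0},
    {"gauss": (5.0, 1.0), "cat": [0.3, 0.7], "pois": 6.0},
]


def gaussian_pdf(x, mean, sd):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def poisson_pmf(n, lam):
    """Probability mass of a Poisson count."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


def cluster_likelihood(row, p):
    """Product of per-variable densities within one cluster.

    Under conditional independence, a missing entry (None) integrates
    to 1 and is therefore skipped.
    """
    x, c, n = row
    lik = 1.0
    if x is not None:
        lik *= gaussian_pdf(x, *p["gauss"])
    if c is not None:
        lik *= p["cat"][c]
    if n is not None:
        lik *= poisson_pmf(n, p["pois"])
    return lik


def posterior(row):
    """Posterior cluster membership probabilities for one observation."""
    weighted = [pi * cluster_likelihood(row, p)
                for pi, p in zip(PROPORTIONS, PARAMS)]
    total = sum(weighted)
    return [w / total for w in weighted]


def impute_continuous(row):
    """Posterior-weighted mean of the Gaussian variable."""
    post = posterior(row)
    return sum(t * p["gauss"][0] for t, p in zip(post, PARAMS))
```

For instance, `posterior((None, 1, 6))` uses only the categorical and count variables to weigh the two clusters, and `impute_continuous((None, 1, 6))` then fills in the missing continuous value. Parameter estimation (for example by SEM) is deliberately left out of this sketch.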
Language:
English
Audience:
International
Popular science:
No
Deposit date:
2020-06-08T14:10:14Z