Document type:
Other scientific communication (conference without proceedings, poster, seminar...): Communication at a conference without proceedings
Permanent URL:
Title:
An efficient SEM algorithm for Gaussian Mixtures with missing data
Author(s):
Vandewalle, Vincent [Author]
METRICS: Evaluation des technologies de santé et des pratiques médicales (Evaluation of health technologies and medical practices) - ULR 2694
Biernacki, Christophe [Author]
METRICS: Evaluation des technologies de santé et des pratiques médicales (Evaluation of health technologies and medical practices) - ULR 2694
Conference title:
8th International Conference of the ERCIM WG on Computational and Methodological Statistics
City:
London
Country:
United Kingdom
Conference start date:
2015-12-12
HAL discipline(s):
Mathematics [math]
Abstract in English: [en]
The missing data problem is well known to statisticians, and it becomes more frequent as modern datasets grow in size. In Gaussian model-based clustering, the EM algorithm easily accommodates such data by handling two latent levels: the mixture components and the missing values of the variables. However, the familiar degeneracy problem of Gaussian mixtures is aggravated during EM runs. Numerical experiments show that degeneracy occurs slowly and more frequently than with complete data. In practice, such situations are difficult to detect efficiently. Consequently, degenerate solutions may be mistaken for meaningful ones, and computing time may be wasted on runs that end in degeneracy. A theoretical and practical study of degeneracy will be presented. Moreover, a simple condition on the latent partition that avoids degeneracy will be exhibited. This condition is used in a constrained version of the Stochastic EM (SEM) algorithm. Numerical experiments on real and simulated data illustrate the good behaviour of the proposed algorithm.
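The abstract does not state the authors' condition on the latent partition, so the following is not their algorithm. It is a minimal sketch of the general shape of a constrained SEM under simplifying assumptions: diagonal covariance matrices (so each missing coordinate can be drawn from its component's marginal), missing entries coded as `None`, and a hypothetical admissibility rule requiring every class to contain at least two distinct observed values of every variable, which keeps every estimated variance strictly positive. All function names and parameters (`partition_ok`, `constrained_sem`, `min_obs`, `max_tries`) are illustrative.

```python
import math
import random


def log_normal(x, mu, var):
    """Log-density of a univariate Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def partition_ok(data, z, K, min_obs=2):
    # HYPOTHETICAL constraint (not the authors' condition): every class must
    # hold at least `min_obs` distinct observed values of every variable.
    d = len(data[0])
    for k in range(K):
        for j in range(d):
            vals = {row[j] for row, zk in zip(data, z) if zk == k and row[j] is not None}
            if len(vals) < min_obs:
                return False
    return True


def m_step(X, z, K):
    """Proportions, means and (diagonal) ML variances from completed data X."""
    n, d = len(X), len(X[0])
    pis, mus, vars_ = [], [], []
    for k in range(K):
        rows = [x for x, zk in zip(X, z) if zk == k]
        nk = len(rows)
        mu = [sum(r[j] for r in rows) / nk for j in range(d)]
        var = [sum((r[j] - mu[j]) ** 2 for r in rows) / nk for j in range(d)]
        pis.append(nk / n)
        mus.append(mu)
        vars_.append(var)
    return pis, mus, vars_


def draw_partition(data, params, K, rng):
    """S-step for labels: sample z_i from p(z_i = k | observed part of x_i)."""
    pis, mus, vars_ = params
    z = []
    for row in data:
        logs = []
        for k in range(K):
            lp = math.log(pis[k])
            for j, x in enumerate(row):
                if x is not None:  # only observed coordinates enter the likelihood
                    lp += log_normal(x, mus[k][j], vars_[k][j])
            logs.append(lp)
        m = max(logs)  # subtract the max for numerical stability
        z.append(rng.choices(range(K), weights=[math.exp(l - m) for l in logs])[0])
    return z


def impute(data, z, params, rng):
    """S-step for missing values: draw each gap from its class's marginal."""
    _, mus, vars_ = params
    return [[x if x is not None else rng.gauss(mus[k][j], math.sqrt(vars_[k][j]))
             for j, x in enumerate(row)] for row, k in zip(data, z)]


def constrained_sem(data, K, n_iter=50, max_tries=100, seed=0):
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Initial partition: redraw uniform labels until the constraint holds.
    for _ in range(max_tries):
        z = [rng.randrange(K) for _ in range(n)]
        if partition_ok(data, z, K):
            break
    else:
        raise ValueError("no admissible initial partition found")
    # Initial completion: fill each gap with its column's observed mean.
    col_means = []
    for j in range(d):
        obs = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(obs) / len(obs))
    X = [[x if x is not None else col_means[j] for j, x in enumerate(row)] for row in data]
    params = m_step(X, z, K)
    for _ in range(n_iter):
        # Reject inadmissible partitions; if every draw fails, keep the old one.
        for _ in range(max_tries):
            z_new = draw_partition(data, params, K, rng)
            if partition_ok(data, z_new, K):
                z = z_new
                break
        X = impute(data, z, params, rng)
        params = m_step(X, z, K)
    return params, z
```

Rejecting inadmissible partitions at the S-step, instead of trying to detect degeneracy after it has set in, keeps every M-step well defined; when no admissible draw is found within `max_tries`, the sketch simply keeps the previous partition.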
Language:
English
Audience:
International
Popular science:
No
Institution(s):
CHU Lille
Université de Lille
Submission date:
2020-06-08T14:11:05Z
2021-05-28T09:13:52Z