Document type :
Report and critical review of a work
Title :
Textual data summarization using the Self-Organized Co-Clustering model
Author(s) :
Selosse, Margot [Author]
Entrepôts, Représentation et Ingénierie des Connaissances [ERIC]
Jacques, Julien [Author]
Entrepôts, Représentation et Ingénierie des Connaissances [ERIC]
Biernacki, Christophe [Author]
MOdel for Data Analysis and Learning [MODAL]
Journal title :
Pattern Recognition
Pages :
107315
Publisher :
Elsevier
Publication date :
2020-02
ISSN :
0031-3203
English keyword(s) :
coclustering
Latent Block Model
document-term matrix
HAL domain(s) :
Mathematics [math]/Statistics [math.ST]
English abstract : [en]
Recently, different studies have demonstrated the use of co-clustering, a data mining technique which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to run the model’s inference, as well as a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model by its ability to easily identify relevant co-clusters.
Language :
English
Popular science :
No
Files
- document
- Open access
- manuscript.pdf
- Open access