Document type:
Journal article: Original article
DOI:
Title:
Minimax Optimal Clustering of Bipartite Graphs with a Generalized Power Method
Author(s):
Braun, Guillaume [Author]
MOdel for Data Analysis and Learning [MODAL]
Tyagi, Hemant [Author]
MOdel for Data Analysis and Learning [MODAL]
Journal title:
Information and Inference
Pages:
1830-1866
Publisher:
Oxford University Press (OUP)
Publication date:
2023-09-27
ISSN:
2049-8764
HAL discipline(s):
Mathematics [math]/Statistics [math.ST]
English abstract: [en]
Clustering bipartite graphs is a fundamental task in network analysis. In the high-dimensional regime where the number of rows $n_1$ and the number of columns $n_2$ of the associated adjacency matrix are of different order, existing methods derived from those used for symmetric graphs can come with sub-optimal guarantees. Given the growing number of applications for bipartite graphs in the high-dimensional regime, it is of fundamental importance to design optimal algorithms for this setting. The recent work of Ndaoud et al. (2022) improves the existing upper bound for the misclustering rate in the special case where the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$) communities. Unfortunately, their algorithm cannot be extended to the more general setting where $K \neq L \geq 2$. We overcome this limitation by introducing a new algorithm based on the power method. We derive conditions for exact recovery in the general setting where $K \neq L \geq 2$, and show that our analysis recovers the result in Ndaoud et al. (2022). We also derive a minimax lower bound on the misclustering error when $K=L$ under a symmetric version of our model, which matches the corresponding upper bound up to a factor depending on $K$.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- 2205.12104
- Open access
- Access the document