Online $k$-means Clustering
Document type :
Conference paper with published proceedings
Title :
Online $k$-means Clustering
Author(s) :
Cohen-Addad, Vincent [Auteur]
Recherche Opérationnelle [RO]
Guedj, Benjamin [Auteur]
University College London [UCL]
Department of Computer Science [University College London] [UCL-CS]
Inria-CWI [Inria-CWI]
MOdel for Data Analysis and Learning [MODAL]
The Inria London Programme [Inria-London]
Kanade, Varun [Auteur]
University of Oxford
Rom, Guy [Auteur]
University of Oxford
Conference title :
AISTATS 2021 - The 24th International Conference on Artificial Intelligence and Statistics
City :
Virtual
Country :
France
Start date of the conference :
2021-04-13
Publication date :
2021
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Theory [stat.TH]
English abstract : [en]
We study the problem of online clustering where a clustering algorithm has to assign a new point that arrives to one of $k$ clusters. The specific formulation we use is the $k$-means objective: at each time step the algorithm has to maintain a set of $k$ candidate centers, and the loss incurred is the squared distance between the new point and the closest center. The goal is to minimize regret with respect to the best solution to the $k$-means objective ($\mathcal{C}$) in hindsight. We show that, provided the data lie in a bounded region, an implementation of the Multiplicative Weights Update Algorithm (MWUA) using a discretized grid achieves a regret bound of $\tilde{O}(\sqrt{T})$ in expectation. We also present an online-to-offline reduction showing that an efficient no-regret online algorithm (despite being allowed to choose a different set of candidate centers at each round) implies an efficient offline algorithm for the $k$-means problem. In light of this hardness, we consider the slightly weaker requirement of comparing regret with respect to $(1 + \epsilon)\mathcal{C}$ and present a no-regret algorithm with runtime $O\left(T\,\mathrm{poly}(\log(T), k, d, 1/\epsilon)^{k(d+O(1))}\right)$. Our algorithm is based on maintaining an incremental coreset and an adaptive variant of the MWUA. We show that na\"{i}ve online algorithms, such as \emph{Follow The Leader}, fail to produce sublinear regret in the worst case. We also report preliminary experiments with synthetic and real-world data.
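The grid-based MWUA approach described in the abstract can be sketched in a few lines for one-dimensional data: each "expert" is one candidate set of $k$ grid points, its per-round loss is the squared distance from the new point to its closest center, and weights are updated multiplicatively. This is a minimal illustration only; the function name, toy data, and learning rate are illustrative choices, not taken from the paper:

```python
import itertools
import math
import random

def mwua_online_kmeans(points, k, grid, eta=0.5, seed=0):
    """Multiplicative Weights over all k-subsets of a discretized grid.

    Each expert is one candidate set of k centers; its loss on a new
    point is the squared distance to the closest center in that set.
    Returns the cumulative loss of the sampled predictions.
    """
    rng = random.Random(seed)
    experts = list(itertools.combinations(grid, k))
    weights = [1.0] * len(experts)
    total_loss = 0.0
    for p in points:
        # Sample one candidate center set proportionally to its weight.
        i = rng.choices(range(len(experts)), weights=weights)[0]
        total_loss += min((p - c) ** 2 for c in experts[i])
        # Penalize every expert multiplicatively by its own loss.
        for j, centers in enumerate(experts):
            loss_j = min((p - c) ** 2 for c in centers)
            weights[j] *= math.exp(-eta * loss_j)
    return total_loss

# Toy stream clustered around 0.1 and 0.9; the grid discretizes [0, 1].
pts = [0.1, 0.9, 0.12, 0.88, 0.11, 0.91]
grid = [i / 10 for i in range(11)]
cumulative_loss = mwua_online_kmeans(pts, k=2, grid=grid)
```

Note the exponential dependence on $k$ and the dimension: the expert set enumerates all $\binom{|grid|}{k}$ center sets, which is why the paper's efficient variant relies on an incremental coreset rather than a full grid.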
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
Source :
Files :
- 1909.06861.pdf (Open access)