Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Besson, Lilian; Kaufmann, Emilie; Maillard, Odalric Ambrym; Seznec, Julien

Type de document :

Compte-rendu et recension critique d'ouvrage

Titre :

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

Auteur(s) :

Besson, Lilian [Auteur]
École normale supérieure - Rennes [ENS Rennes]
Kaufmann, Emilie [Auteur]

Scool [Scool]
Maillard, Odalric Ambrym [Auteur]

Scool [Scool]
Seznec, Julien [Auteur]
Scool [Scool]
Lelivrescolaire.fr

Titre de la revue :

Journal of Machine Learning Research

Éditeur :

Microtome Publishing

Date de publication :

2022-03

ISSN :

1532-4435

Mot(s)-clé(s) en anglais :

Multi-Armed Bandits
Non-Stationary Bandits
Change Point Detection

Discipline(s) HAL :

Statistiques [stat]/Autres [stat.ML]

Résumé en anglais : [en]

We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint ...
Lire la suite >We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards. This algorithm combines an efficient bandit algorithm, kl-UCB, with an efficient, parameter-free, changepoint detector, the Bernoulli Generalized Likelihood Ratio Test, for which we provide new theoretical guarantees of independent interest. Unlike previous non-stationary bandit algorithms using a change-point detector, GLR-klUCB does not need to be calibrated based on prior knowledge on the arms' means. We prove that this algorithm can attain a $O(\sqrt{TA \Upsilon_T\log(T)})$ regret in $T$ rounds on some ``easy'' instances, where A is the number of arms and $\Upsilon_T$ the number of change-points, without prior knowledge of $\Upsilon_T$. In contrast with recently proposed algorithms that are agnostic to $\Upsilon_T$, we perform a numerical study showing that GLR-klUCB is also very efficient in practice, beyond easy instances.Lire moins >

Langue :

Anglais

Vulgarisation :

Non

Projet ANR :

BANDITS MANCHOTS POUR SIGNAUX NON-STATIONNAIRES ET STRUCTURES
Au delà de l'apprentissage séquentiel pour de meilleures prises de décisions

Collections :