Document type:
Conference paper with published proceedings
Title:
Rotting bandits are not harder than stochastic ones
Author(s):
Seznec, Julien [Author]
Sequential Learning [SEQUEL]
Lelivrescolaire.fr
Locatelli, Andrea [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Carpentier, Alexandra [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Author]
Sequential Learning [SEQUEL]
DeepMind [Paris]
Conference:
International Conference on Artificial Intelligence and Statistics
City:
Naha
Country:
Japan
Conference start date:
2019
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)})$, and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
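As a rough illustration of the mechanism the abstract describes, here is a minimal Python sketch of a FEWA-style selection rule: averages over windows of doubling size filter out arms whose recent rewards fall too far below the best, and an arm that cannot fill the current window is pulled next. The function name, the confidence width, and the constant `alpha` are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import math

def fewa_select(histories, t, alpha=4.0):
    """Pick the next arm with expanding-window filtering (FEWA-style sketch).

    histories -- list of per-arm reward lists (each arm pulled at least once)
    t         -- current round, used in the confidence width
    alpha     -- exploration constant (hypothetical choice)
    """
    active = list(range(len(histories)))
    h = 1  # window size, doubled at each filtering round
    while True:
        # Confidence width for averages over the last h pulls.
        c = math.sqrt(alpha * math.log(t) / h)
        means = {}
        for a in active:
            rewards = histories[a]
            if len(rewards) < h:
                # Arm cannot fill a window of size h: pull it next.
                return a
            means[a] = sum(rewards[-h:]) / h  # mean of h most recent rewards
        best = max(means.values())
        # Keep only arms whose recent average is close to the best one.
        active = [a for a in active if means[a] >= best - 2.0 * c]
        if len(active) == 1:
            return active[0]
        h *= 2
```

Since `h` doubles until it exceeds the length of every active history, the loop always terminates by returning some arm; in a full bandit loop one would append the observed reward to `histories[arm]` after each pull.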
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- https://hal.inria.fr/hal-01936894v2/document (open access)
- seznec2019rotting.pdf (open access)