
Rotting bandits are not harder than stochastic ones

Document type:
Conference paper with proceedings
Title:
Rotting bandits are not harder than stochastic ones
Author(s):
Seznec, Julien [Author]
Lelivrescolaire.fr
Sequential Learning [SEQUEL]
Locatelli, Andrea [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Carpentier, Alexandra [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Lazaric, Alessandro [Author]
Facebook AI Research [Paris] [FAIR]
Sequential Learning [SEQUEL]
Valko, Michal [Author]
DeepMind [Paris]
Sequential Learning [SEQUEL]
Conference title:
International Conference on Artificial Intelligence and Statistics
City:
Naha
Country:
Japan
Start date of the conference:
2019
HAL domain(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log(KT))$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
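The abstract describes FEWA only at a high level. As a rough illustration, here is a minimal Python sketch of a filtering-on-expanding-window-average strategy in that spirit; the confidence width width(h, t), the constant alpha, and the exact pull rule are illustrative assumptions, not the paper's precise specification.

import numpy as np

def fewa(arms, T, sigma=1.0, alpha=4.0):
    # Illustrative sketch of a FEWA-style strategy (assumptions noted above).
    # arms[i]() returns the next (possibly decaying) reward of arm i.
    K = len(arms)
    history = [[] for _ in range(K)]  # rewards observed per arm, in pull order

    def width(h, t):
        # Sub-Gaussian deviation bound for a window of h samples at round t
        # (an assumed tuning, not the paper's exact constant).
        return sigma * np.sqrt(2 * alpha * np.log(t) / h)

    for i in range(K):                # initialisation: pull each arm once
        history[i].append(arms[i]())

    for t in range(K + 1, T + 1):
        active = set(range(K))
        h = 1
        # Expand the window while every surviving arm has at least h samples.
        while all(len(history[i]) >= h for i in active):
            # Average of the h most recent rewards of each surviving arm.
            mu = {i: np.mean(history[i][-h:]) for i in active}
            best = max(mu.values())
            # Filter out arms that look markedly worse on this window.
            active = {i for i in active if mu[i] >= best - 2 * width(h, t)}
            h += 1
        # Pull the surviving arm with the fewest pulls so far.
        i = min(active, key=lambda j: len(history[j]))
        history[i].append(arms[i]())
    return history

Because the arm with the best windowed average always passes its own filter, the active set is never empty. The expanding window lets recent observations dominate the estimate, which is what allows this kind of strategy to track rewards that decay each time an arm is pulled.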
Language:
English
Peer reviewed article:
Yes
Audience:
International
Popular science:
No
ANR Project:
Bayesian inference with limited resources - massive data and costly models
Collections:
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source:
Harvested from HAL
Files
  • https://hal.inria.fr/hal-01936894v2/document
  • Open access