Rotting bandits are not harder than stochastic ones
Document type :
Conference paper with proceedings
Title :
Rotting bandits are not harder than stochastic ones
Author(s) :
Seznec, Julien [Author]
Sequential Learning [SEQUEL]
Lelivrescolaire.fr
Locatelli, Andrea [Author]
Otto-von-Guericke-Universität Magdeburg = Otto-von-Guericke University [Magdeburg] [OVGU]
Carpentier, Alexandra [Author]
Otto-von-Guericke-Universität Magdeburg = Otto-von-Guericke University [Magdeburg] [OVGU]
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Author]
Sequential Learning [SEQUEL]
DeepMind [Paris]
Conference title :
International Conference on Artificial Intelligence and Statistics
City :
Naha
Country :
Japan
Start date of the conference :
2019
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)})$ and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
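Illustration: since the abstract only sketches how FEWA filters arms using averages over expanding windows, the following is a minimal, illustrative Python sketch of one selection round in that spirit. It is not the authors' exact procedure: the function name fewa_choose_arm, the window schedule (growing by one), and the confidence-width constants are assumptions made here for illustration; see the paper for the precise algorithm and its analysis.

import numpy as np

def fewa_choose_arm(histories, sigma=1.0, delta=0.01):
    # histories[i]: chronological list of rewards observed so far from arm i.
    K = len(histories)
    # Pull every arm once before any filtering can take place.
    for i in range(K):
        if len(histories[i]) == 0:
            return i
    active = set(range(K))
    h = 1
    while True:
        # Average of the last h rewards of each surviving arm; every arm in
        # `active` has at least h samples at this point.
        means = {i: float(np.mean(histories[i][-h:])) for i in active}
        best = max(means.values())
        # Confidence width for an h-sample mean of sigma-sub-Gaussian noise
        # (the constants here are an assumption of this sketch).
        c = sigma * np.sqrt(2.0 * np.log(1.0 / delta) / h)
        # Keep only arms whose recent average is close to the best one.
        active = {i for i in active if means[i] >= best - 2.0 * c}
        # If a surviving arm has no more than h samples, its window cannot be
        # widened further: pull it, since it is still plausibly near-optimal.
        for i in sorted(active):
            if len(histories[i]) <= h:
                return i
        h += 1

In a rotting-bandit loop one would call fewa_choose_arm, pull the returned arm, append the observed reward to that arm's history, and repeat. Here sigma is the sub-Gaussian noise scale and delta a fixed confidence level; in the paper the confidence level depends on the round, which this sketch simplifies.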
Language :
English
Peer reviewed article :
Yes
Audience :
Internationale
Popular science :
No
Files :
- seznec2019rotting.pdf — https://hal.inria.fr/hal-01936894v2/document (Open access)