Document type:
Conference paper with published proceedings
Title:
Rotting bandits are not harder than stochastic ones
Author(s):
Seznec, Julien [Author]
Sequential Learning [SEQUEL]
Lelivrescolaire.fr
Locatelli, Andrea [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Carpentier, Alexandra [Author]
Otto-von-Guericke University Magdeburg [OVGU]
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Author]
Sequential Learning [SEQUEL]
DeepMind [Paris]
Conference:
International Conference on Artificial Intelligence and Statistics
City:
Naha
Country:
Japan
Conference start date:
2019
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge on the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $\widetilde{\mathcal{O}}(\log{(KT)})$, and a problem-independent one of $\widetilde{\mathcal{O}}(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $\widetilde{\mathcal{O}}(K^{1/3}T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
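As a rough illustration of the mechanism the abstract describes, here is a minimal Python sketch of a FEWA-style selection rule: averages over windows of doubling size filter out arms whose recent rewards fall too far below the best, and an arm that cannot fill the current window is pulled next. The function name, the confidence width, and the constant `alpha` are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import math

def fewa_select(histories, t, alpha=4.0):
    """Pick the next arm with expanding-window filtering (FEWA-style sketch).

    histories -- list of per-arm reward lists (each arm pulled at least once)
    t         -- current round, used in the confidence width
    alpha     -- exploration constant (hypothetical choice)
    """
    active = list(range(len(histories)))
    h = 1  # window size, doubled at each filtering round
    while True:
        # Confidence width for averages over the last h pulls.
        c = math.sqrt(alpha * math.log(t) / h)
        means = {}
        for a in active:
            rewards = histories[a]
            if len(rewards) < h:
                # Arm cannot fill a window of size h: pull it next.
                return a
            means[a] = sum(rewards[-h:]) / h  # mean of h most recent rewards
        best = max(means.values())
        # Keep only arms whose recent average is close to the best one.
        active = [a for a in active if means[a] >= best - 2.0 * c]
        if len(active) == 1:
            return active[0]
        h *= 2
```

Since `h` doubles until it exceeds the length of every active history, the loop always terminates by returning some arm; in a full bandit loop one would append the observed reward to `histories[arm]` after each pull.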
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- https://hal.inria.fr/hal-01936894v2/document (open access)
- seznec2019rotting.pdf (open access)