Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits
Document type :
Conference paper with proceedings
Title :
Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits
Author(s) :
Baudry, Dorian [Auteur]
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Pesquerel, Fabien [Auteur]
Scool [Scool]
Degenne, Rémy [Auteur]
Scool [Scool]
Maillard, Odalric Ambrym [Auteur]
Scool [Scool]
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Conference title :
NeurIPS 2023 - Thirty-seventh Conference on Neural Information Processing Systems
City :
New Orleans (Louisiana)
Country :
United States of America
Start date of the conference :
2023-12-10
Book title :
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exist asymptotically optimal algorithms whose asymptotic regret depends on an infimum of Kullback-Leibler divergences (KL). These algorithms are computationally expensive and require storing all past rewards, so simpler but non-optimal algorithms are often used instead. We introduce several methods to approximate the infimum KL that drastically reduce the computational and memory costs of existing optimal algorithms while preserving their regret guarantees. We apply our findings to design new variants of the MED and IMED algorithms, and demonstrate their benefits with extensive numerical simulations.
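To give a sense of the quantity the abstract refers to, the following sketch computes the empirical infimum KL divergence for rewards bounded above via its known dual formulation (due to Honda and Takemura), and uses it to form an IMED-style index. This is an illustrative baseline with a naive grid search, not the accelerated approximations proposed in the paper; the function names and the grid-search approach are assumptions for illustration only.

```python
import math

def kinf_upper_bounded(rewards, mu, B, grid_size=1000):
    """Empirical K_inf for rewards bounded above by B, via the dual form
    K_inf = max over lam in [0, 1/(B - mu)] of mean(log(1 - lam*(x - mu))).
    Naive grid search; the paper's contribution is faster approximations."""
    if mu >= B:
        return float("inf")
    lam_max = 1.0 / (B - mu)
    n = len(rewards)
    best = 0.0  # lam = 0 always gives value 0, so K_inf >= 0
    for i in range(grid_size):
        lam = lam_max * i / grid_size  # stays strictly below lam_max
        val = sum(math.log(1.0 - lam * (x - mu)) for x in rewards) / n
        best = max(best, val)
    return best

def imed_index(rewards, mu_star, B):
    """IMED-style index of one arm given the current best mean mu_star:
    n * K_inf + log(n); the arm minimizing this index is played next."""
    n = len(rewards)
    return n * kinf_upper_bounded(rewards, mu_star, B) + math.log(n)
```

The grid search makes the cost of one index update linear in the number of stored rewards, which is exactly the computational and memory burden the paper's approximations are designed to remove.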
Language :
English
Peer reviewed article :
Yes
Audience :
Internationale
Popular science :
No
ANR Project :
Collections :
Source :
Files
- 5313_fast_asymptotically_optimal_al.pdf (Open access)