Kullback-Leibler Upper Confidence Bounds ...
Document type :
Compte-rendu et recension critique d'ouvrage
DOI :
Title :
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation
Author(s) :
Cappé, Olivier [Auteur]
Laboratoire Traitement et Communication de l'Information [LTCI]
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Maillard, Odalric Ambrym [Auteur]
Montanuniversität Leoben [MUL]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
Stoltz, Gilles [Auteur]
Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification [CLASSIC]
Groupement de Recherche et d'Etudes en Gestion à HEC [GREGH]
Laboratoire Traitement et Communication de l'Information [LTCI]
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Maillard, Odalric Ambrym [Auteur]
![refId](/themes/Mirage2//images/idref.png)
Montanuniversität Leoben [MUL]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
Stoltz, Gilles [Auteur]
Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification [CLASSIC]
Groupement de Recherche et d'Etudes en Gestion à HEC [GREGH]
Journal title :
Annals of Statistics
Pages :
1516-1541
Publisher :
Institute of Mathematical Statistics
Publication date :
2013-06
ISSN :
0090-5364
English keyword(s) :
Multi-armed bandit problems
Upper confidence bound
Kullback-Leibler divergence
Sequential testing
Upper confidence bound
Kullback-Leibler divergence
Sequential testing
HAL domain(s) :
Mathématiques [math]/Probabilités [math.PR]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Statistiques [stat]/Machine Learning [stat.ML]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm ...
Show more >We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.Show less >
Show more >We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.Show less >
Language :
Anglais
Popular science :
Non
European Project :
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Open access
- Access the document
- http://arxiv.org/pdf/1210.1136
- Open access
- Access the document
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Open access
- Access the document
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Open access
- Access the document
- document
- Open access
- Access the document
- klucb.pdf
- Open access
- Access the document
- 1210.1136
- Open access
- Access the document