Kullback-Leibler Upper Confidence Bounds ...
Type de document :
Compte-rendu et recension critique d'ouvrage
DOI :
Titre :
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation
Auteur(s) :
Cappé, Olivier [Auteur]
Laboratoire Traitement et Communication de l'Information [LTCI]
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Maillard, Odalric Ambrym [Auteur]
Montanuniversität Leoben [MUL]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
Stoltz, Gilles [Auteur]
Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification [CLASSIC]
Groupement de Recherche et d'Etudes en Gestion à HEC [GREGH]
Laboratoire Traitement et Communication de l'Information [LTCI]
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Maillard, Odalric Ambrym [Auteur]
Montanuniversität Leoben [MUL]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
Stoltz, Gilles [Auteur]
Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification [CLASSIC]
Groupement de Recherche et d'Etudes en Gestion à HEC [GREGH]
Titre de la revue :
Annals of Statistics
Pagination :
1516-1541
Éditeur :
Institute of Mathematical Statistics
Date de publication :
2013-06
ISSN :
0090-5364
Mot(s)-clé(s) en anglais :
Multi-armed bandit problems
Upper confidence bound
Kullback-Leibler divergence
Sequential testing
Upper confidence bound
Kullback-Leibler divergence
Sequential testing
Discipline(s) HAL :
Mathématiques [math]/Probabilités [math.PR]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Statistiques [stat]/Machine Learning [stat.ML]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm ...
Lire la suite >We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.Lire moins >
Lire la suite >We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.Lire moins >
Langue :
Anglais
Vulgarisation :
Non
Projet Européen :
Collections :
Source :
Fichiers
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Accès libre
- Accéder au document
- http://arxiv.org/pdf/1210.1136
- Accès libre
- Accéder au document
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Accès libre
- Accéder au document
- https://hal.archives-ouvertes.fr/hal-00738209v2/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- klucb.pdf
- Accès libre
- Accéder au document
- 1210.1136
- Accès libre
- Accéder au document