Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation

Cappé, Olivier; Garivier, Aurélien; Maillard, Odalric Ambrym; Munos, Rémi; Stoltz, Gilles

Document type :

Compte-rendu et recension critique d'ouvrage

DOI :

10.1214/13-AOS1119

Title :

Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation

Author(s) :

Cappé, Olivier [Auteur]
Laboratoire Traitement et Communication de l'Information [LTCI]
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Maillard, Odalric Ambrym [Auteur]

Montanuniversität Leoben [MUL]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
Stoltz, Gilles [Auteur]
Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification [CLASSIC]
Groupement de Recherche et d'Etudes en Gestion à HEC [GREGH]

Journal title :

Annals of Statistics

Pages :

1516-1541

Publisher :

Institute of Mathematical Statistics

Publication date :

2013-06

ISSN :

0090-5364

English keyword(s) :

Multi-armed bandit problems
Upper confidence bound
Kullback-Leibler divergence
Sequential testing

HAL domain(s) :

Mathématiques [math]/Probabilités [math.PR]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Statistiques [stat]/Machine Learning [stat.ML]

English abstract : [en]

We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm ...
Show more >We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.Show less >

Language :

Anglais

Popular science :

Non

European Project :

Pattern Analysis, Statistical Modelling and Computational Learning 2

Collections :