Maximin Action Identification: A New Bandit Framework for Games
Document type :
Conference paper with proceedings
Title :
Maximin Action Identification: A New Bandit Framework for Games
Author(s) :
Garivier, Aurélien [Auteur]
Institut de Mathématiques de Toulouse UMR5219 [IMT]
Kaufmann, Emilie [Auteur]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Centre National de la Recherche Scientifique [CNRS]
Koolen, Wouter [Auteur]
Centrum Wiskunde & Informatica [CWI]
Conference title :
29th Annual Conference on Learning Theory (COLT)
City :
New York
Country :
United States of America
Start date of the conference :
2016-06-23
Journal title :
JMLR Workshop and Conference Proceedings
English keyword(s) :
racing
LUCB
multi-armed bandit problems
games
best-arm identification
HAL domain(s) :
Mathematics [math]/Statistics [math.ST]
Statistics [stat]/Machine Learning [stat.ML]
Computer Science [cs]/Computer Science and Game Theory [cs.GT]
English abstract : [en]
We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds; and Maximin-Racing, which operates by successively eliminating the sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower bound analysis, and possible connections to an optimal algorithm.
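To make the setting in the abstract concrete, here is a minimal LUCB-style sketch in Python. It is an illustration only, not the paper's Maximin-LUCB algorithm: the confidence radius is a generic Hoeffding bound with a crude union bound, and the leader/challenger and column-selection rules are heuristic choices. The oracle `sample(i, j)` is a hypothetical function returning a random outcome in [0, 1] with unknown mean mu[i, j]; the goal is to identify argmax_i min_j mu[i, j] with confidence 1 - delta.

```python
import numpy as np

def maximin_lucb(sample, n_rows, n_cols, delta=0.05, max_steps=100_000):
    """Identify argmax_i min_j mu[i, j] with confidence 1 - delta.

    `sample(i, j)` must return an outcome in [0, 1] with mean mu[i, j].
    Illustrative sketch only: uses a generic Hoeffding radius, not the
    calibrated thresholds analysed in the paper.
    """
    # Draw one sample per action pair to initialise the estimates.
    counts = np.ones((n_rows, n_cols))
    sums = np.array([[sample(i, j) for j in range(n_cols)]
                     for i in range(n_rows)], dtype=float)

    for t in range(n_rows * n_cols, max_steps):
        means = sums / counts
        rad = np.sqrt(np.log(4 * n_rows * n_cols * t * t / delta)
                      / (2 * counts))
        lcb, ucb = means - rad, means + rad

        # The maximin value of row i is min_j mu[i, j]; bound it per row.
        row_lcb = lcb.min(axis=1)
        row_ucb = ucb.min(axis=1)

        leader = int(row_lcb.argmax())
        others_ucb = row_ucb.copy()
        others_ucb[leader] = -np.inf
        challenger = int(others_ucb.argmax())

        # Stop when the leader's pessimistic maximin value dominates
        # every other row's optimistic maximin value.
        if row_lcb[leader] >= others_ucb[challenger]:
            return leader

        # Refine the bounds driving the stopping rule: sample the column
        # achieving the leader's lower bound and the challenger's upper
        # bound (a heuristic pair-selection rule).
        for i, bound in ((leader, lcb), (challenger, ucb)):
            j = int(bound[i].argmin())
            sums[i, j] += sample(i, j)
            counts[i, j] += 1

    return int((sums / counts).min(axis=1).argmax())  # budget exhausted
```

For example, with a Bernoulli game matrix M = np.array([[.5, .4, .6], [.7, .2, .8], [.3, .3, .3]]), calling maximin_lucb(lambda i, j: float(np.random.rand() < M[i, j]), 3, 3) should return 0, the row with the largest worst-case mean (0.4, versus 0.2 and 0.3).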
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.archives-ouvertes.fr/hal-01273842v2/document (open access)
- http://arxiv.org/pdf/1602.04676 (open access)