Non-Asymptotic Pure Exploration by Solving Games
Document type:
Preprint or working paper
Title:
Non-Asymptotic Pure Exploration by Solving Games
Author(s):
Degenne, Rémy [Author]
Centre de Mathématiques et de Leurs Applications [CMLA]
Koolen, Wouter [Author]
Centrum Wiskunde & Informatica [CWI]
Ménard, Pierre [Author]
Sequential Learning [SEQUEL]
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state-of-the-art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite confidence guarantees that are adapted to the exponential family and which apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best response oracle instead of fully solving the optimisation problem.
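The game-theoretic idea in the abstract can be illustrated with a minimal sketch. For best-arm identification, the optimisation problem in question is the characteristic time $T^*(\mu)^{-1} = \sup_{w \in \Delta_K} \inf_{\lambda \in \mathrm{Alt}(\mu)} \sum_a w_a \, \mathrm{KL}(\mu_a, \lambda_a)$ (Garivier & Kaufmann, 2016), a max-min problem over sampling weights $w$ and alternative bandit instances $\lambda$. The Python sketch below is not the paper's sampling rule; it only demonstrates, on a toy finite zero-sum game, the mechanism the abstract relies on: a no-regret learner (Hedge / exponential weights) playing against a best-response oracle, whose averaged iterates converge to the saddle point. The function name hedge_vs_best_response and the payoff matrix are hypothetical.

import numpy as np

def hedge_vs_best_response(A, T=5000):
    """Approximate the saddle point of max_w min_x w^T A x by running
    Hedge for the max-player against a best-response oracle for the
    min-player.  A[i, j] is the payoff to the max-player."""
    K, M = A.shape
    lr = np.sqrt(np.log(K) / T)   # standard Hedge step size
    gains = np.zeros(K)           # scaled cumulative gains per row
    w_avg, x_avg = np.zeros(K), np.zeros(M)
    for _ in range(T):
        # Exponential-weights play for the max-player (numerically stable).
        w = np.exp(gains - gains.max())
        w /= w.sum()
        # Best-response oracle: the min-player picks the column that
        # minimises the expected payoff under w.
        j = int(np.argmin(w @ A))
        gains += lr * A[:, j]     # full-information Hedge update
        w_avg += w
        x_avg[j] += 1.0
    # Averaged iterates approximate the equilibrium (saddle point).
    return w_avg / T, x_avg / T

# Toy check on rock-paper-scissors, whose equilibrium is uniform play.
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
w_hat, x_hat = hedge_vs_best_response(A)
print(w_hat, x_hat)   # both approach [1/3, 1/3, 1/3]

Since Hedge has $O(\sqrt{T \log K})$ regret and the best responder has non-positive regret, the averaged pair is an $O(\sqrt{\log K / T})$-approximate saddle point. The paper's contribution is obtaining finite-confidence guarantees of this flavour when the game itself depends on the unknown bandit and must be estimated from samples.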
Language:
English
Collections:
Source:
Files:
- https://hal.archives-ouvertes.fr/hal-02402665/document (open access)
- main.pdf (open access)