Online Stochastic Optimization under Correlated Bandit Feedback
Document type:
Conference paper with proceedings
Title:
Online Stochastic Optimization under Correlated Bandit Feedback
Author(s):
Gheshlaghi Azar, Mohammad [Author]
Northwestern University [Evanston]
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Brunskill, Emma [Author]
Computer Science Department - Carnegie Mellon University
Conference title:
31st International Conference on Machine Learning
City:
Beijing
Country:
China
Conference start date:
2014-06
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Abstract (English): [en]
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on the number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state of the art in terms of memory requirements, and it requires a weaker smoothness assumption on the mean-reward function than existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and report preliminary empirical results.
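The abstract describes HCT only at a high level. As a rough illustration of the family of tree-based X-armed bandit methods it belongs to, the Python sketch below implements a generic optimistic tree search over [0, 1]: descend the partition tree along maximal upper bounds, pull the midpoint of the selected cell, and refine a cell once its confidence width drops below the cell diameter. Everything here (the names hct_sketch and u_value, the exact expansion rule, the smoothness parameters nu and rho) is an assumption in the spirit of HOO/HCT-style algorithms, not the paper's exact method.

```python
# Hypothetical minimal sketch of a hierarchical optimistic tree-search
# bandit in the spirit of HCT (not the authors' exact algorithm).
# Assumptions: the domain is [0, 1], rewards lie in [0, 1], and the
# smoothness parameters nu, rho are known.
import math
import random


class Node:
    """A cell of the hierarchical partition of [0, 1]."""

    def __init__(self, low, high, depth):
        self.low, self.high, self.depth = low, high, depth
        self.count = 0       # number of pulls in this cell
        self.mean = 0.0      # empirical mean reward in this cell
        self.children = None

    def midpoint(self):
        return 0.5 * (self.low + self.high)


def conf_radius(node, t, delta):
    # Hoeffding-style confidence radius for the cell's mean estimate.
    if node.count == 0:
        return float("inf")
    return math.sqrt(2.0 * math.log(t / delta) / node.count)


def u_value(node, t, nu, rho, delta):
    # Optimistic upper bound: empirical mean + confidence radius
    # + diameter term nu * rho**depth bounding the local smoothness.
    return node.mean + conf_radius(node, t, delta) + nu * rho ** node.depth


def hct_sketch(reward_fn, n_rounds, nu=1.0, rho=0.5, delta=0.05):
    """Descend optimistically, pull the leaf's midpoint, and expand a
    leaf once its confidence width falls below the cell diameter."""
    root = Node(0.0, 1.0, 0)
    best_x, best_mean = root.midpoint(), -float("inf")
    for t in range(1, n_rounds + 1):
        # Descend: at each internal node follow the child with the
        # largest optimistic value.
        node = root
        while node.children is not None:
            node = max(node.children,
                       key=lambda c: u_value(c, t, nu, rho, delta))
        # Pull the arm at the centre of the selected cell.
        x = node.midpoint()
        r = reward_fn(x)
        node.count += 1
        node.mean += (r - node.mean) / node.count
        if node.mean > best_mean:
            best_mean, best_x = node.mean, x
        # Expand when the statistical uncertainty is smaller than the
        # resolution of the cell, so refining the partition helps.
        if conf_radius(node, t, delta) <= nu * rho ** node.depth:
            mid = node.midpoint()
            node.children = [Node(node.low, mid, node.depth + 1),
                             Node(mid, node.high, node.depth + 1)]
    return best_x


if __name__ == "__main__":
    # Toy objective: a smooth function of x observed via Bernoulli rewards.
    f = lambda x: 0.5 * (math.sin(13 * x) * math.sin(27 * x) + 1.0)
    noisy = lambda x: 1.0 if random.random() < f(x) else 0.0
    print("approximate maximizer:", hct_sketch(noisy, 20000))
```

Note that this sketch keeps the whole explored tree in memory and assumes independent rewards; handling correlated feedback and bounding the tree size are precisely the contributions the abstract attributes to HCT.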
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
European project:
Collections:
Source:
Files
- https://hal.inria.fr/hal-01080138/document
- Open access
- Access the document
- paper%20%281%29.pdf
- Open access
- Access the document