Document type :
Conference paper (published in proceedings)
Title :
Online Stochastic Optimization under Correlated Bandit Feedback
Author(s) :
Gheshlaghi Azar, Mohammad [Author]
Northwestern University [Chicago, Ill. USA]
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Brunskill, Emma [Author]
Computer Science Department - Carnegie Mellon University
Conference title :
31st International Conference on Machine Learning
City :
Beijing
Country :
China
Start date of the conference :
2014-06
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on the number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state of the art in terms of memory requirement, and it requires a weaker smoothness assumption on the mean-reward function than existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.
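The abstract describes HCT only at a high level. To illustrate the general family it belongs to, the following is a minimal Python sketch of a tree-based X-armed bandit. Such an algorithm descends a hierarchical partition of the arm space by following optimistic B-values, and it splits a cell only after the cell's reward estimate is confident at that cell's resolution. This is not the authors' HCT. It assumes X = [0, 1] with cells split in half and i.i.d. Bernoulli rewards, so it does not cover the correlated-feedback case that is the paper's main contribution. The names `tree_bandit`, `Cell`, `nu`, `rho` and `delta`, the Hoeffding-style confidence radius, the expansion threshold and the "most-selected cell" recommendation are all illustrative assumptions.

```python
import math
import random


class Cell:
    """Hypothetical cell [lo, hi) of X = [0, 1]; split in half when expanded."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0          # rewards observed while this cell was selected
        self.mean = 0.0         # empirical mean of those rewards
        self.children = []
        self.b_value = math.inf

    @property
    def point(self):
        return 0.5 * (self.lo + self.hi)  # the arm pulled for this cell


def tree_bandit(f, n_steps, nu=1.0, rho=0.5, delta=0.05, seed=0):
    """Sketch of optimistic tree search with Bernoulli(f(x)) rewards (not HCT itself)."""
    rng = random.Random(seed)
    root = Cell(0.0, 1.0, 0)

    def width(t, count):
        # Assumed Hoeffding-style confidence radius for rewards in [0, 1]
        return math.sqrt(math.log(2 * t / delta) / (2 * count))

    def threshold(t, depth):
        # Assumed rule: expand once the confidence radius would fall below
        # the cell's smoothness term nu * rho**depth
        return math.log(2 * t / delta) / (2 * (nu * rho ** depth) ** 2)

    def refresh(cell, t):
        # U = mean + nu * rho**depth + radius; B = min(U, max child B).
        # Cells never selected stay fully optimistic (infinite U).
        if cell.count == 0:
            u = math.inf
        else:
            u = cell.mean + nu * rho ** cell.depth + width(t, cell.count)
        if cell.children:
            cell.b_value = min(u, max(refresh(c, t) for c in cell.children))
        else:
            cell.b_value = u
        return cell.b_value

    for t in range(1, n_steps + 1):
        refresh(root, t)
        cell = root
        while cell.children:  # follow the most optimistic path to a leaf
            cell = max(cell.children, key=lambda c: c.b_value)
        if cell.count >= threshold(t, cell.depth):  # confident enough: split
            mid = cell.point
            cell.children = [Cell(cell.lo, mid, cell.depth + 1),
                             Cell(mid, cell.hi, cell.depth + 1)]
            cell = cell.children[0]
        reward = 1.0 if rng.random() < f(cell.point) else 0.0
        cell.count += 1
        cell.mean += (reward - cell.mean) / cell.count  # running mean

    # Recommend the centre of the most-selected cell
    stack, best = [root], root
    while stack:
        c = stack.pop()
        if c.count > best.count:
            best = c
        stack.extend(c.children)
    return best.point, root
```

For example, `tree_bandit(lambda x: max(0.0, 0.9 - 1.6 * abs(x - 0.7)), 3000)` typically recommends a point close to 0.7, the maximiser of that made-up mean-reward function. The sketch recomputes every B-value at each step, which keeps it short but costs time linear in the tree size. The expansion threshold keeps the tree small, because a cell is split only after enough pulls to resolve it.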
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
European Project :
Composing Learning for Artificial Cognitive Systems
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.inria.fr/hal-01080138/document
  • Open access
Université de Lille
Université de Lille © 2017