Document type :
Conference paper with proceedings
Title :
Optimistic planning in Markov decision processes using a generative model
Author(s) :
Szörényi, Balázs [Author]
University of Szeged [Szeged]
Sequential Learning [SEQUEL]
Kedenburg, Gunnar [Author]
Sequential Learning [SEQUEL]
Munos, Rémi [Author]
Sequential Learning [SEQUEL]
Conference title :
Advances in Neural Information Processing Systems 27
City :
Montréal
Country :
Canada
Start date of the conference :
2014-12-08
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
Computer Science [cs]/Data Structures and Algorithms [cs.DS]
English abstract : [en]
We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
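To make the setting in the abstract concrete, the sketch below shows what "calls to the generative model" could look like in code. It is not StOP: the planner is a plain uniform sparse-sampling baseline, substituted here only to illustrate the interface and the cost measure. The toy two-state MDP, the class name `GenerativeModel`, the functions `estimate_q` and `plan`, and all parameter values are assumptions made for illustration and do not come from the paper.

```python
import random


class GenerativeModel:
    """Hypothetical toy MDP that can only be accessed by sampling."""

    actions = (0, 1)

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.calls = 0  # number of samples drawn so far (the cost being counted)

    def sample(self, state, action):
        """Return one (reward, next_state) sample for the given state and action."""
        self.calls += 1
        # Assumed dynamics: action 1 pays reward 1 more often than action 0.
        p_reward = 0.8 if action == 1 else 0.2
        reward = 1.0 if self.rng.random() < p_reward else 0.0
        next_state = self.rng.choice((0, 1))
        return reward, next_state


def estimate_q(model, state, action, depth, width, gamma):
    """Uniform sparse-sampling estimate of the discounted action value."""
    if depth == 0:
        return 0.0
    total = 0.0
    for _ in range(width):
        reward, next_state = model.sample(state, action)
        best_next = max(
            estimate_q(model, next_state, a, depth - 1, width, gamma)
            for a in model.actions
        )
        total += reward + gamma * best_next
    return total / width


def plan(model, state, depth=3, width=4, gamma=0.9):
    """Return the action with the highest estimated value, plus all estimates."""
    q = {a: estimate_q(model, state, a, depth, width, gamma) for a in model.actions}
    return max(q, key=q.get), q


if __name__ == "__main__":
    model = GenerativeModel(seed=0)
    action, q = plan(model, state=0)
    print("chosen action:", action)
    print("value estimates:", q)
    print("generative-model calls:", model.calls)
```

The baseline spends the same number of samples on every branch of the look-ahead tree, so its call count grows exponentially with the planning depth regardless of the MDP. The abstract's claim is that StOP instead has a complexity bound that depends only on the local structure of the MDP.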
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.inria.fr/hal-01079366/document
  • Open access
Université de Lille