Document type:
Conference paper (with proceedings)
Title:
Optimistic planning in Markov decision processes using a generative model
Author(s):
Szörényi, Balázs [Author]
University of Szeged [Szeged]
Sequential Learning [SEQUEL]
Kedenburg, Gunnar [Author]
Sequential Learning [SEQUEL]
Munos, Rémi [Author]
Sequential Learning [SEQUEL]
Conference title:
Advances in Neural Information Processing Systems 27
City:
Montréal
Country:
Canada
Conference start date:
2014-12-08
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Computer Science [cs]/Data Structures and Algorithms [cs.DS]
English abstract: [en]
We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
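The abstract combines two ingredients: a generative model that returns (reward, next-state) samples, and an optimistic rule for deciding where to spend those samples until an ε-optimal action can be certified with probability 1−δ. The Python sketch that follows illustrates only these ingredients on a much simpler problem, and it is not StOP: it collapses the planning horizon to a single step, so each root action's value is just its expected immediate reward, and it swaps in an LUCB-style best-arm stopping rule with Hoeffding confidence radii. The model signature `model(state, action, rng) -> (reward, next_state)`, the assumption that rewards lie in [0, 1], and the function name `optimistic_root_action` are hypothetical choices made for illustration.

```python
import math
import random
from collections.abc import Hashable
from typing import Callable, List, Tuple

# Hypothetical generative-model signature (an assumption, not from the paper):
# model(state, action, rng) -> (reward in [0, 1], next_state)
GenerativeModel = Callable[[Hashable, int, random.Random], Tuple[float, Hashable]]


def optimistic_root_action(model: GenerativeModel, state: Hashable, n_actions: int,
                           epsilon: float, delta: float, rng: random.Random,
                           max_calls: int = 10**6) -> Tuple[int, int]:
    """One-step (bandit) reduction, NOT StOP: return (action, calls_used) using an
    LUCB-style (epsilon, delta)-PAC stopping rule."""
    if n_actions == 1:
        return 0, 0  # nothing to compare, so no samples are needed
    counts: List[int] = [0] * n_actions
    sums: List[float] = [0.0] * n_actions
    calls = 0

    def draw(a: int) -> None:
        # One call to the generative model; this one-step sketch only uses the reward.
        nonlocal calls
        reward, _next_state = model(state, a, rng)
        counts[a] += 1
        sums[a] += reward
        calls += 1

    def radius(n: int) -> float:
        # Hoeffding radius for rewards in [0, 1]. Each (action, n) pair fails with
        # probability at most delta / (2 * n_actions * n**2), which sums to < delta.
        return math.sqrt(math.log(4 * n_actions * n * n / delta) / (2 * n))

    for a in range(n_actions):
        draw(a)  # initialise every estimate with one sample

    while calls < max_calls:
        means = [sums[a] / counts[a] for a in range(n_actions)]
        leader = max(range(n_actions), key=lambda a: means[a])
        # Optimism: the challenger is the other action with the highest upper bound.
        challenger = max((a for a in range(n_actions) if a != leader),
                         key=lambda a: means[a] + radius(counts[a]))
        upper = means[challenger] + radius(counts[challenger])
        lower = means[leader] - radius(counts[leader])
        if upper - lower <= epsilon:
            return leader, calls  # leader is epsilon-optimal whenever all bounds hold
        draw(leader)
        draw(challenger)
    # Budget exhausted: fall back to the empirical best (no PAC guarantee here).
    return max(range(n_actions), key=lambda a: sums[a] / counts[a]), calls
```

With a toy model whose three actions return deterministic rewards 0.2, 0.5 and 0.8, the sketch returns action 2. Sampling only the empirical leader and its most optimistic challenger means the number of generative-model calls depends on how hard the near-optimal actions are to tell apart, which is loosely in the spirit of the local-structure dependence the abstract describes, though StOP's actual bound concerns multi-step policies.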
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.inria.fr/hal-01079366/document
- Open access
- StOP_nips.pdf