Document type:
Conference paper with proceedings
Title:
Aggregating optimistic planning trees for solving Markov decision processes
Author(s):
Kedenburg, Gunnar [Author]
Sequential Learning [SEQUEL]
Fonteneau, Raphael [Author]
Université de Liège
Sequential Learning [SEQUEL]
Munos, Remi [Author]
Sequential Learning [SEQUEL]
Conference title:
Advances in Neural Information Processing Systems
Country:
United States of America
Conference start date:
2013
Book title:
Advances in Neural Information Processing Systems
Publication date:
2013
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Abstract in English: [en]
This paper addresses the problem of online planning in Markov decision processes using a generative model and under a budget constraint. We propose a new algorithm, ASOP, which is based on the construction of a forest of single successor state planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are explored using a "safe" optimistic planning strategy which combines the optimistic principle (in order to explore the most promising part of the search space first) and a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We report numerical results on a benchmark problem showing that ASOP performs as well as state-of-the-art optimistic planning algorithms.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.archives-ouvertes.fr/hal-00923681/document
- Open access
- Access the document
- nips13a.pdf
- Open access
- Access the document