Aggregating optimistic planning trees for solving Markov decision processes

Kedenburg, Gunnar; Fonteneau, Raphael; Munos, Remi

Document type:

Conference paper with proceedings

Title:

Aggregating optimistic planning trees for solving Markov decision processes

Author(s):

Kedenburg, Gunnar [Author]
Sequential Learning [SEQUEL]
Fonteneau, Raphael [Author]
Université de Liège
Sequential Learning [SEQUEL]
Munos, Remi [Author]
Sequential Learning [SEQUEL]

Conference title:

Advances in Neural Information Processing Systems

Country:

United States of America

Conference start date:

2013

Book title:

Advances in Neural Information Processing Systems

Publication date:

2013

HAL discipline(s):

Computer Science [cs]/Machine Learning [cs.LG]

Abstract in English: [en]

This paper addresses the problem of online planning in Markov decision processes using a generative model and under a budget constraint. We propose a new algorithm, ASOP, which is based on the construction of a forest of single successor state planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are explored using a "safe" optimistic planning strategy which combines the optimistic principle (in order to explore the most promising part of the search space first) and a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We report numerical results on a benchmark problem showing that ASOP performs as well as state-of-the-art optimistic planning algorithms.

Language:

English

Peer reviewed:

Yes

Audience:

International

Popular science:

No

Collections:

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source:

Harvested from HAL

Files

https://hal.archives-ouvertes.fr/hal-00923681/document
Open access

nips13a.pdf
Open access
