Monte-Carlo Graph Search: the Value of ...
Type de document :
Communication dans un congrès avec actes
Titre :
Monte-Carlo Graph Search: the Value of Merging Similar States
Auteur(s) :
Leurent, Edouard [Auteur]
Sequential Learning [SEQUEL]
RENAULT
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Maillard, Odalric Ambrym [Auteur]
Sequential Learning [SEQUEL]
Inria Lille - Nord Europe
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]
RENAULT
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Maillard, Odalric Ambrym [Auteur]
Sequential Learning [SEQUEL]
Inria Lille - Nord Europe
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Titre de la manifestation scientifique :
ACML 2020 - 12th Asian Conference on Machine Learning
Ville :
Bangkok / Virtual
Pays :
Thaïlande
Date de début de la manifestation scientifique :
2020-11-18
Date de publication :
2020
Mot(s)-clé(s) en anglais :
Online Planning
Tree-search
Reinforcement Learning
Tree-search
Reinforcement Learning
Discipline(s) HAL :
Informatique [cs]/Apprentissage [cs.LG]
Résumé en anglais : [en]
We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree ...
Lire la suite >We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not identify together two similar states reached via different trajectories and represented in separate branches of the tree. In this work, we propose a <i>graph-based</i> planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.Lire moins >
Lire la suite >We consider the problem of planning in a Markov Decision Process (MDP) with a generative model and limited computational budget. Despite the underlying MDP transitions having a graph structure, the popular Monte-Carlo Tree Search algorithms such as UCT rely on a tree structure to represent their value estimates. That is, they do not identify together two similar states reached via different trajectories and represented in separate branches of the tree. In this work, we propose a <i>graph-based</i> planning algorithm, which takes into account this state similarity. In our analysis, we provide a regret bound that depends on a novel problem-dependent measure of difficulty, which improves on the original tree-based bound in MDPs where the trajectories overlap, and recovers it otherwise. Then, we show that this methodology can be adapted to existing planning algorithms that deal with stochastic systems. Finally, numerical simulations illustrate the benefits of our approach.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Collections :
Source :
Fichiers
- https://hal.inria.fr/hal-03004124v2/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-03004124v2/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-03004124v2/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- main.pdf
- Accès libre
- Accéder au document