Practical Open-Loop Optimistic Planning

Leurent, Edouard; Maillard, Odalric Ambrym

Document type:

Conference paper (with proceedings)

Title:

Practical Open-Loop Optimistic Planning

Author(s):

Leurent, Edouard [Author]
Sequential Learning [SEQUEL]
Maillard, Odalric Ambrym [Author]

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]

Conference title:

European Conference on Machine Learning

City:

Würzburg

Country:

Germany

Conference start date:

2019-09-16

Journal title:

European Conference on Machine Learning

HAL discipline(s):

Mathematics [math]/Statistics [math.ST]
Statistics [stat]/Machine Learning [stat.ML]

English abstract: [en]

We consider the problem of online planning in a Markov Decision Process when given access only to a generative model, restricted to open-loop policies (i.e. sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose KL-OLOP, a modified version of the algorithm with tighter upper-confidence bounds, which leads to better performance in practice while retaining the same sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
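The abstract only names the change ("tighter upper-confidence bounds"). As an illustration, the sketch below compares a Hoeffding-style upper bound with a Bernoulli KL-UCB upper bound on an empirical mean of rewards in [0, 1]. It assumes that KL-OLOP's bounds are of this KL-UCB kind; the exploration threshold used in the paper, and how bounds are aggregated over action sequences in OLOP's tree, are not given in this record. The names `kl_ucb`, `hoeffding_ucb` and `threshold` are hypothetical. By Pinsker's inequality, kl(p, q) >= 2(p - q)^2, so the KL bound is never looser than the Hoeffding bound for the same threshold.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb(mean: float, count: int, threshold: float, iters: int = 50) -> float:
    """Hypothetical KL-UCB index: largest q in [mean, 1] with count * kl(mean, q) <= threshold.

    kl(mean, q) is increasing in q on [mean, 1], so bisection finds the boundary.
    """
    if count == 0:
        return 1.0  # no samples: most optimistic value for rewards in [0, 1]
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still inside the confidence set
        else:
            hi = mid
    return lo


def hoeffding_ucb(mean: float, count: int, threshold: float) -> float:
    """Hypothetical Hoeffding-style index: mean + sqrt(threshold / (2 * count)), capped at 1."""
    if count == 0:
        return 1.0
    return min(1.0, mean + math.sqrt(threshold / (2 * count)))


if __name__ == "__main__":
    t = math.log(10)  # illustrative exploration threshold (an assumption)
    for mean in (0.1, 0.5, 0.9):
        print(mean, round(hoeffding_ucb(mean, 10, t), 4), round(kl_ucb(mean, 10, t), 4))
```

The gap is largest when the empirical mean is close to 0 or 1: there the Hoeffding index is often clipped at 1, which makes it uninformative, while the KL index stays strictly below 1.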

Language:

English

Peer-reviewed:

Yes

Audience:

International

Popular science:

No

ANR project:

BANDITS MANCHOTS POUR SIGNAUX NON-STATIONNAIRES ET STRUCTURES (Multi-armed bandits for non-stationary and structured signals)

Collections: