Reward-free exploration beyond finite-horizon
Document type:
Conference paper with proceedings
Title:
Reward-free exploration beyond finite-horizon
Author(s):
Tarbouriech, Jean [Author]
Facebook AI Research [Paris] [FAIR]
Scool [Scool]
Pirotta, Matteo [Author]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Author]
DeepMind [Paris]
Lazaric, Alessandro [Author]
Facebook AI Research [Paris] [FAIR]
Conference title:
ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning
City:
Vienna
Country:
Austria
Conference start date:
2020
HAL discipline(s):
Statistiques [stat]/Machine Learning [stat.ML]
Abstract (English):
We consider the reward-free exploration framework introduced by Jin et al. (2020), where an RL agent interacts with an unknown environment without any explicit reward function to maximize. The objective is to collect enough information during the exploration phase, so that a near-optimal policy can be immediately computed once any reward function is provided. In this paper, we move from the finite-horizon setting studied by Jin et al. (2020) to the more general setting of goal-conditioned RL, often referred to as stochastic shortest path (SSP). We first discuss the challenges specific to SSPs and then study two scenarios: 1) reward-free goal-free exploration in communicating MDPs, and 2) reward-free goal-free incremental exploration in non-communicating MDPs where the agent is provided with a reset action to an initial state. In both cases, we provide exploration algorithms and their sample-complexity bounds, which we contrast with the existing guarantees in the finite-horizon case.
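As a rough illustration of the two-phase protocol the abstract describes, the sketch below separates a reward-free exploration phase (transition data only, no reward observed) from a planning phase run once a reward function is revealed. The toy chain MDP, the uniform random exploration policy, and the empirical-model value iteration are all hypothetical simplifications for illustration, not the paper's algorithms:

```python
import random
from collections import defaultdict

random.seed(0)
N_STATES, N_ACTIONS, H = 5, 2, 10  # hypothetical toy chain MDP

def step(s, a):
    """Environment dynamics: action 0 moves left, action 1 moves right, 10% slip."""
    if random.random() < 0.1:
        a = 1 - a
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

# Phase 1: reward-free exploration with a uniform random policy,
# collecting transition counts only (no reward signal is observed).
counts = defaultdict(lambda: defaultdict(int))
for _ in range(2000):
    s = 0  # reset to the initial state at each episode
    for _ in range(H):
        a = random.randrange(N_ACTIONS)
        s2 = step(s, a)
        counts[(s, a)][s2] += 1
        s = s2

def p_hat(s, a):
    """Empirical transition probabilities; uniform fallback if (s, a) was never visited."""
    c, tot = counts[(s, a)], sum(counts[(s, a)].values())
    if tot == 0:
        return {s2: 1.0 / N_STATES for s2 in range(N_STATES)}
    return {s2: n / tot for s2, n in c.items()}

# Phase 2: once a reward function is revealed, compute a value function
# by finite-horizon value iteration on the learned model alone.
def plan(reward):
    V = [0.0] * N_STATES
    for _ in range(H):
        V = [reward[s] + max(sum(p * V[s2] for s2, p in p_hat(s, a).items())
                             for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return V

reward = [0.0] * N_STATES
reward[-1] = 1.0  # e.g. a goal-reaching reward at the rightmost state
V = plan(reward)
```

In this sketch the planner never touches the environment again: any reward function handed over after exploration is planned for using only the counts gathered in phase 1, which is the separation the reward-free framework formalizes.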
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files:
- https://hal.inria.fr/hal-03288970/document (open access)
- tarbouriech2020reward-free.pdf (open access)