Improved sample complexity for incremental autonomous exploration in MDPs
Document type:
Conference paper with proceedings
Title:
Improved sample complexity for incremental autonomous exploration in MDPs
Author(s):
Tarbouriech, Jean [Author]
Scool [Scool]
Facebook AI Research [Paris] [FAIR]
Pirotta, Matteo [Author]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Author]
DeepMind [Paris]
Lazaric, Alessandro [Author]
Facebook AI Research [Paris] [FAIR]
Conference title:
Neural Information Processing Systems
City:
Montréal
Country:
Canada
Conference start date:
2020
HAL discipline(s):
Statistiques [stat]/Machine Learning [stat.ML]
English abstract: [en]
We investigate the exploration of an unknown environment when no reward function is provided. Building on the incremental exploration setting introduced by Lim and Auer [1], we define the objective of learning the set of ε-optimal goal-conditioned policies attaining all states that are incrementally reachable within L steps (in expectation) from a reference state s_0. In this paper, we introduce a novel model-based approach that interleaves discovering new states from s_0 and improving the accuracy of a model estimate that is used to compute goal-conditioned policies to reach newly discovered states. The resulting algorithm, DisCo, achieves a sample complexity scaling as O(L^5 S_{L+ε} Γ_{L+ε} A ε^{-2}), where A is the number of actions, S_{L+ε} is the number of states that are incrementally reachable from s_0 in L + ε steps, and Γ_{L+ε} is the branching factor of the dynamics over such states. This improves over the algorithm proposed in [1] in both ε and L at the cost of an extra Γ_{L+ε} factor, which is small in most environments of interest. Furthermore, DisCo is the first algorithm that can return an ε/c_min-optimal policy for any cost-sensitive shortest-path problem defined on the L-reachable states with minimum cost c_min. Finally, we report preliminary empirical results confirming our theoretical findings.
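The discover-and-improve loop described in the abstract can be illustrated by a minimal sketch, under strong simplifying assumptions: the dynamics are deterministic and known, so the model-estimation step of DisCo disappears and goal-conditioned policies reduce to shortest paths (BFS). The function name and environment interface (`step_fn`, `actions`) are hypothetical illustrations, not the paper's API:

```python
from collections import deque

def incremental_discovery(start, step_fn, actions, L):
    """Illustrative sketch (not the DisCo algorithm): incrementally
    grow the set of states reachable from `start` within L steps,
    using BFS shortest paths over the states discovered so far.
    `step_fn(s, a)` is an assumed deterministic transition function."""
    known = {start: 0}        # state -> shortest known distance from start
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        d = known[s]
        if d == L:
            continue          # states farther than L steps are out of scope
        for a in actions:
            nxt = step_fn(s, a)
            if nxt not in known:
                known[nxt] = d + 1
                frontier.append(nxt)
    return known

# Example on a 1-D chain: actions move left/right, horizon L = 3.
reachable = incremental_discovery(0, lambda s, a: s + a, [-1, 1], 3)
```

The real setting is harder: transitions are stochastic and unknown, so DisCo must also estimate the model and bound the estimation error before certifying a state as incrementally L-reachable.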
Language:
English
Peer-reviewed:
Yes
Audience :
Internationale
Popular science:
No
Files
- https://hal.inria.fr/hal-03287829/document
- Open access
- Access the document
- tarbouriech2020improved.pdf