Document type:
Conference paper with proceedings
Title:
Kernel-Based Reinforcement Learning: A Finite-Time Analysis
Author(s):
Domingues, Omar [Author]
Scool [Scool]
Ménard, Pierre [Author]
Scool [Scool]
Pirotta, Matteo [Author]
Facebook AI Research [Paris] [FAIR]
Kaufmann, Emilie [Author]
Scool [Scool]
Centre National de la Recherche Scientifique [CNRS]
Valko, Michal [Author]
DeepMind [Paris]
Scool [Scool]
Scientific event title:
International Conference on Machine Learning (ICML)
City:
virtual
Country:
Austria
Event start date:
2021-07
Journal title:
Proceedings of Machine Learning Research (PMLR)
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left(H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.
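The abstract mentions a non-parametric kernel estimator of the rewards combined with optimism. The sketch below illustrates that general idea only; it is not taken from the paper. The Gaussian kernel, the function names, the regularizer `beta`, and the form of the exploration bonus are all assumptions made for illustration.

```python
import math

def gaussian_kernel(dist, bandwidth):
    # Smoothing kernel: weight decays with the distance between state-action pairs.
    return math.exp(-0.5 * (dist / bandwidth) ** 2)

def kernel_reward_estimate(query, samples, bandwidth=0.1, beta=1.0, bonus_scale=1.0):
    """Kernel-smoothed reward estimate with an illustrative optimism bonus.

    query   : tuple of floats, a point in the (metric) state-action space
    samples : list of (point, reward) pairs observed so far
    beta    : hypothetical regularizer that keeps the denominator positive
    """
    weights = [gaussian_kernel(math.dist(query, x), bandwidth) for x, _ in samples]
    # "Generalized count": how much nearby data supports the estimate at `query`.
    generalized_count = sum(weights)
    weighted_sum = sum(w * r for w, (_, r) in zip(weights, samples))
    estimate = weighted_sum / (beta + generalized_count)
    # Assumed bonus shape: larger where little nearby data has been collected.
    bonus = bonus_scale / math.sqrt(beta + generalized_count)
    return estimate, bonus

# Usage: one observed sample at the origin with reward 1.0.
samples = [((0.0, 0.0), 1.0)]
near_estimate, near_bonus = kernel_reward_estimate((0.0, 0.0), samples)
far_estimate, far_bonus = kernel_reward_estimate((10.0, 10.0), samples)
```

In this sketch, a query far from all observed data receives a larger bonus than a query close to the data, which is the mechanism an optimistic algorithm can use to drive exploration toward unvisited regions.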
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-03827244/document
- Open access
- http://arxiv.org/pdf/2004.05599
- Open access