Kernel-Based Reinforcement Learning: A Finite-Time Analysis
Document type :
Communication dans un congrès avec actes
Title :
Kernel-Based Reinforcement Learning: A Finite-Time Analysis
Author(s) :
Domingues, Omar [Author]
Scool [Scool]
Ménard, Pierre [Author]
Scool [Scool]
Pirotta, Matteo [Author]
Facebook AI Research [Paris] [FAIR]
Kaufmann, Emilie [Author]
Scool [Scool]
Valko, Michal [Author]
DeepMind [Paris]
Scool [Scool]
Conference title :
International Conference on Machine Learning (ICML)
City :
virtual
Country :
Austria
Start date of the conference :
2021-07
Journal title :
Proceedings of Machine Learning Research (PMLR)
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with K episodes and horizon H, we provide a regret bound of $\widetilde{O}\left(H^3 K^{\frac{2d}{2d+1}}\right)$, where d is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.
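To make the abstract's description concrete: the sketch below is a rough illustration (not the paper's actual algorithm) of how a non-parametric smoothing-kernel estimate of rewards and transitions, together with an optimistic bonus that shrinks with the kernel-weighted visit count, could look. The Gaussian kernel choice, the bandwidth and beta values, the bonus form, and all function names here are assumptions made for illustration only.

import numpy as np

def gaussian_kernel(dist, bandwidth):
    # Smoothing-kernel weight as a function of state-action distance (assumed Gaussian).
    return np.exp(-(dist / bandwidth) ** 2)

def kernel_estimates(query_sa, past_sa, past_rewards, bandwidth=0.1, beta=1.0):
    # query_sa: (d,) query state-action pair; past_sa: (n, d) visited pairs;
    # past_rewards: (n,) observed rewards.
    dists = np.linalg.norm(past_sa - query_sa, axis=1)
    w = gaussian_kernel(dists, bandwidth)
    c = w.sum()  # generalized (kernel-weighted) visit count
    # Kernel-smoothed reward estimate: weighted average of past rewards.
    r_hat = float(w @ past_rewards) / max(c, 1e-12)
    # Optimistic exploration bonus: shrinks as nearby data accumulates.
    bonus = beta / np.sqrt(max(c, 1e-12))
    # Normalized weights over past transitions give a kernel estimate of
    # the transition distribution (supported on observed next states).
    p_hat = w / max(c, 1e-12)
    return r_hat, bonus, p_hat

# Example usage on synthetic data:
rng = np.random.default_rng(0)
past_sa = rng.uniform(size=(50, 2))
past_r = rng.uniform(size=50)
r_hat, bonus, p_hat = kernel_estimates(np.array([0.5, 0.5]), past_sa, past_r)
print(r_hat, bonus)

In an optimistic value-iteration scheme of this kind, r_hat + bonus would serve as an upper-confidence estimate of the reward, so poorly explored regions of the metric space (small weighted count c) receive large bonuses and get visited.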
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files :
- https://hal.archives-ouvertes.fr/hal-03827244/document (open access)
- http://arxiv.org/pdf/2004.05599 (open access)