
Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Document type :
Conference paper with published proceedings
Title :
Kernel-Based Reinforcement Learning: A Finite-Time Analysis
Author(s) :
Domingues, Omar [Author]
Scool
Ménard, Pierre [Author]
Scool
Pirotta, Matteo [Author]
Facebook AI Research [Paris] (FAIR)
Kaufmann, Emilie [Author]
Scool
Valko, Michal [Author]
DeepMind [Paris]
Conference title :
International Conference on Machine Learning (ICML)
City :
Virtual
Country :
Austria
Start date of the conference :
2021-07
Journal title :
Proceedings of Machine Learning Research (PMLR)
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\big(H^3 K^{\frac{2d}{2d+1}}\big)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.
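
The record carries no code, but the kernel estimation and optimistic bonus the abstract describes can be sketched minimally. The Python below is an illustrative sketch under assumed choices (a Gaussian smoothing kernel, a fixed bandwidth, and hypothetical function names), not the authors' implementation:

    import numpy as np

    def smoothing_kernel(dist, bandwidth):
        # Weight decays with metric distance; the Gaussian shape and
        # fixed bandwidth are illustrative assumptions.
        return np.exp(-(dist / bandwidth) ** 2)

    def kernel_reward_estimate(query, visited_sa, rewards, bandwidth=0.1):
        # Non-parametric estimate of the mean reward at a query
        # state-action point: a kernel-weighted average over past visits.
        dists = np.linalg.norm(visited_sa - query, axis=1)
        weights = smoothing_kernel(dists, bandwidth)
        total = weights.sum()
        return float(weights @ rewards / total) if total > 0 else 0.0

    def exploration_bonus(query, visited_sa, bandwidth=0.1, beta=1.0):
        # Optimism: the bonus shrinks as the "generalized count" (total
        # kernel weight near the query) grows, steering exploration
        # toward poorly covered regions of the state-action space.
        dists = np.linalg.norm(visited_sa - query, axis=1)
        counts = smoothing_kernel(dists, bandwidth).sum()
        return beta / np.sqrt(max(float(counts), 1e-8))

The same weighted-average construction applies to the transition estimate, with observed next states in place of rewards.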
Language :
English
Peer reviewed article :
Yes
Audience :
Internationale
Popular science :
No
ANR Project :
Beyond sequential learning for better decision making (Au delà de l'apprentissage séquentiel pour de meilleures prises de décisions)
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.archives-ouvertes.fr/hal-03827244/document (Open access)
  • http://arxiv.org/pdf/2004.05599 (Open access)