Document type :
Conference paper (published in proceedings)
Title :
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Author(s) :
Jonsson, Anders [Author]
Kaufmann, Emilie [Author]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Ménard, Pierre [Author]
Scool [Scool]
Domingues, Omar [Author]
Scool [Scool]
Leurent, Edouard [Author]
Scool [Scool]
RENAULT
Valko, Michal [Author]
DeepMind [Paris]
Conference title :
Neural Information Processing Systems
City :
Vancouver
Country :
Canada
Start date of the conference :
2020
Publication date :
2020-12-07
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly theoretical.
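The sample complexity bound above is stated in terms of sub-optimality gaps. As a minimal sketch of that quantity only, assuming a finite-horizon, undiscounted MDP whose finite-support transitions and deterministic rewards are fully known, the gaps can be computed exactly by backward induction as Δ_h(s,a) = V*_h(s) − Q*_h(s,a). This is not MDP-GapE itself, which only has sampling access to a generative model and must estimate these values; the paper's exact setting and gap definition may differ. The toy MDP and the function names `optimal_values_and_gaps` and `near_optimal_actions` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a finite-horizon MDP with finite-support transitions:
# transitions[s][a] lists (probability, next_state) pairs; rewards[s][a] is deterministic.
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]
Rewards = Dict[str, Dict[str, float]]


def optimal_values_and_gaps(states: List[str], actions: List[str],
                            transitions: Transitions, rewards: Rewards, horizon: int):
    """Exact backward induction over steps h = horizon-1, ..., 0.

    Returns lists Q, V, gaps indexed by step h, where
    gaps[h][s][a] = V[h][s] - Q[h][s][a] (always >= 0).
    """
    Q = [{} for _ in range(horizon)]
    V = [{} for _ in range(horizon)]
    gaps = [{} for _ in range(horizon)]
    v_next = {s: 0.0 for s in states}  # value after the last step is zero
    for h in reversed(range(horizon)):
        for s in states:
            # Bellman optimality backup, summing over the finite support of each transition.
            Q[h][s] = {a: rewards[s][a] + sum(p * v_next[s2] for p, s2 in transitions[s][a])
                       for a in actions}
            V[h][s] = max(Q[h][s].values())
            gaps[h][s] = {a: V[h][s] - Q[h][s][a] for a in actions}
        v_next = V[h]
    return Q, V, gaps


def near_optimal_actions(gaps, state: str, epsilon: float) -> List[str]:
    """Actions at step 0 in `state` whose gap is at most epsilon."""
    return [a for a, gap in gaps[0][state].items() if gap <= epsilon]


# Toy two-state, two-action example (hypothetical numbers).
states = ["x", "y"]
actions = ["left", "right"]
transitions = {
    "x": {"left": [(1.0, "x")], "right": [(0.5, "x"), (0.5, "y")]},
    "y": {"left": [(1.0, "y")], "right": [(1.0, "x")]},
}
rewards = {"x": {"left": 0.0, "right": 0.2}, "y": {"left": 1.0, "right": 0.0}}

Q, V, gaps = optimal_values_and_gaps(states, actions, transitions, rewards, horizon=2)
print(gaps[0]["x"])                                  # gap of "left" is about 0.6, "right" is 0
print(near_optimal_actions(gaps, "x", epsilon=0.1))  # ['right']
```

In this toy instance the root action "left" has a large gap, so few samples would be needed to rule it out; intuitively, small gaps are what make identifying a near-optimal action hard, which is why a gap-dependent bound can be much tighter than a worst-case one.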
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
ANR Project :
Beyond sequential learning for better decision-making (Au delà de l'apprentissage séquentiel pour de meilleures prises de décisions)
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.archives-ouvertes.fr/hal-02863486v2/document (open access)
  • https://hal.archives-ouvertes.fr/hal-02863486v2/file/budget.pdf (open access)
  • https://hal.archives-ouvertes.fr/hal-02863486v2/file/simple_regret.pdf (open access)