• English
    • français
  • Help
  •  | 
  • Contact
  •  | 
  • About
  •  | 
  • Login
  • HAL portal
  •  | 
  • Pages Pro
  • EN
  •  / 
  • FR
View Item 
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Sample complexity bounds for stochastic ...
  • BibTeX
  • CSV
  • Excel
  • RIS

Document type :
Communication dans un congrès avec actes
Title :
Sample complexity bounds for stochastic shortest path with a generative model
Author(s) :
Tarbouriech, Jean [Auteur]
Scool [Scool]
Facebook AI Research [Paris] [FAIR]
Pirotta, Matteo [Auteur]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Auteur] refId
DeepMind [Paris]
Lazaric, Alessandro [Auteur]
Facebook AI Research [Paris] [FAIR]
Conference title :
Algorithmic Learning Theory
City :
Paris
Country :
France
Start date of the conference :
2021
English keyword(s) :
sample complexity
stochastic shortest path
Markov decision process
HAL domain(s) :
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider the objective of computing an ε-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC ...
Show more >
We consider the objective of computing an ε-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for the finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.Show less >
Language :
Anglais
Peer reviewed article :
Oui
Audience :
Internationale
Popular science :
Non
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
Thumbnail
  • https://hal.inria.fr/hal-03288988/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-03288988/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-03288988/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-03288988/document
  • Open access
  • Access the document
Université de Lille

Mentions légales
Université de Lille © 2017