Sample complexity bounds for stochastic shortest path with a generative model

Tarbouriech, Jean; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro

Type de document :

Communication dans un congrès avec actes

Titre :

Sample complexity bounds for stochastic shortest path with a generative model

Auteur(s) :

Tarbouriech, Jean [Auteur]
Scool [Scool]
Facebook AI Research [Paris] [FAIR]
Pirotta, Matteo [Auteur]
Facebook AI Research [Paris] [FAIR]
Valko, Michal [Auteur]

DeepMind [Paris]
Lazaric, Alessandro [Auteur]
Facebook AI Research [Paris] [FAIR]

Titre de la manifestation scientifique :

Algorithmic Learning Theory

Ville :

Paris

Pays :

France

Date de début de la manifestation scientifique :

2021

Mot(s)-clé(s) en anglais :

sample complexity
stochastic shortest path
Markov decision process

Discipline(s) HAL :

Statistiques [stat]/Machine Learning [stat.ML]

Résumé en anglais : [en]

We consider the objective of computing an ε-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC ...
Lire la suite >We consider the objective of computing an ε-optimal policy in a stochastic shortest path (SSP) setting, provided that we can access a generative sampling oracle. We propose two algorithms for this setting and derive PAC bounds on their sample complexity: one for the case of positive costs and the other for the case of non-negative costs under a restricted optimality criterion. While tight sample complexity bounds have been derived for the finite-horizon and discounted MDPs, the SSP problem is a strict generalization of these settings and it poses additional technical challenges due to the fact that no specific time horizon is prescribed and policies may never terminate, i.e., we are possibly facing non-proper policies. As a consequence, we can neither directly apply existing techniques minimizing sample complexity nor rely on a regret-to-PAC conversion leveraging recent regret bounds for SSP. Our analysis instead combines SSP-specific tools and variance reduction techniques to obtain the first sample complexity bounds for this setting.Lire moins >

Langue :

Anglais

Comité de lecture :

Oui

Audience :

Internationale

Vulgarisation :

Non

Collections :

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source :

Harvested from HAL

Fichiers

https://hal.inria.fr/hal-03288988/document
Accès libre
Accéder au document

https://hal.inria.fr/hal-03288988/document
Accès libre
Accéder au document

https://hal.inria.fr/hal-03288988/document
Accès libre
Accéder au document

https://hal.inria.fr/hal-03288988/document
Accès libre
Accéder au document

document
Accès libre
Accéder au document

tarbouriech2021sample.pdf
Accès libre
Accéder au document

document
Accès libre
Accéder au document

tarbouriech2021sample.pdf
Accès libre
Accéder au document

Sample complexity bounds for stochastic ... BibTeX CSV Excel RIS

Fichiers

Sample complexity bounds for stochastic ...

BibTeX

CSV

Excel

RIS