Improved Exploration in Factored Average-Reward ...
Type de document :
Communication dans un congrès avec actes
Titre :
Improved Exploration in Factored Average-Reward MDPs
Auteur(s) :
Talebi, Sadegh [Auteur]
University of Copenhagen = Københavns Universitet [UCPH]
Jonsson, Anders [Auteur]
Universitat Pompeu Fabra [Barcelona] [UPF]
Maillard, Odalric Ambrym [Auteur]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
University of Copenhagen = Københavns Universitet [UCPH]
Jonsson, Anders [Auteur]
Universitat Pompeu Fabra [Barcelona] [UPF]
Maillard, Odalric Ambrym [Auteur]

Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Titre de la manifestation scientifique :
24th International Conference on Artificial Intelligence and Statistics
Ville :
San diego (virtual)
Pays :
Etats-Unis d'Amérique
Date de début de la manifestation scientifique :
2021
Titre de l’ouvrage :
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics
Date de publication :
2021
Discipline(s) HAL :
Informatique [cs]/Intelligence artificielle [cs.AI]
Statistiques [stat]/Machine Learning [stat.ML]
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S ...
Lire la suite >We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S admit the respective factored forms of X = ⊗ n i=1 X i and S = ⊗ m i=1 S i , and the transition and reward functions are factored over X and S. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of S i 's and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys a substantially improved regret empirically over existing algorithms that have frequentist regret guarantees.Lire moins >
Lire la suite >We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S admit the respective factored forms of X = ⊗ n i=1 X i and S = ⊗ m i=1 S i , and the transition and reward functions are factored over X and S. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of S i 's and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys a substantially improved regret empirically over existing algorithms that have frequentist regret guarantees.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Collections :
Source :
Fichiers
- https://hal.archives-ouvertes.fr/hal-03780564/document
- Accès libre
- Accéder au document
- https://hal.archives-ouvertes.fr/hal-03780564/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- talebi21a.pdf
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- talebi21a.pdf
- Accès libre
- Accéder au document