Improved Exploration in Factored Average-Reward ...
Document type :
Communication dans un congrès avec actes
Title :
Improved Exploration in Factored Average-Reward MDPs
Author(s) :
Talebi, Sadegh [Auteur]
University of Copenhagen = Københavns Universitet [UCPH]
Jonsson, Anders [Auteur]
Universitat Pompeu Fabra [Barcelona] [UPF]
Maillard, Odalric Ambrym [Auteur]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
University of Copenhagen = Københavns Universitet [UCPH]
Jonsson, Anders [Auteur]
Universitat Pompeu Fabra [Barcelona] [UPF]
Maillard, Odalric Ambrym [Auteur]

Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Conference title :
24th International Conference on Artificial Intelligence and Statistics
City :
San diego (virtual)
Country :
Etats-Unis d'Amérique
Start date of the conference :
2021
Book title :
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics
Publication date :
2021
HAL domain(s) :
Informatique [cs]/Intelligence artificielle [cs.AI]
Statistiques [stat]/Machine Learning [stat.ML]
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S ...
Show more >We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S admit the respective factored forms of X = ⊗ n i=1 X i and S = ⊗ m i=1 S i , and the transition and reward functions are factored over X and S. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of S i 's and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys a substantially improved regret empirically over existing algorithms that have frequentist regret guarantees.Show less >
Show more >We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the stateaction space X and the state-space S admit the respective factored forms of X = ⊗ n i=1 X i and S = ⊗ m i=1 S i , and the transition and reward functions are factored over X and S. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of S i 's and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys a substantially improved regret empirically over existing algorithms that have frequentist regret guarantees.Show less >
Language :
Anglais
Peer reviewed article :
Oui
Audience :
Internationale
Popular science :
Non
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-03780564/document
- Open access
- Access the document
- https://hal.archives-ouvertes.fr/hal-03780564/document
- Open access
- Access the document
- document
- Open access
- Access the document
- talebi21a.pdf
- Open access
- Access the document
- document
- Open access
- Access the document
- talebi21a.pdf
- Open access
- Access the document