Optimistic PAC Reinforcement Learning: the Instance-Dependent View
Document type :
Conference paper with published proceedings
Title :
Optimistic PAC Reinforcement Learning: the Instance-Dependent View
Author(s) :
Tirinzoni, Andrea [Author]
Meta AI Research [Paris]
Al-Marjani, Aymen [Author]
Unité de Mathématiques Pures et Appliquées [UMPA-ENSL]
Kaufmann, Emilie [Author]
Scool [Scool]
Centre National de la Recherche Scientifique [CNRS]
Conference title :
Algorithmic Learning Theory (ALT)
City :
Singapore (SG)
Country :
Singapore
Start date of the conference :
2023-02-20
Book title :
Proceedings of Machine Learning Research (PMLR)
HAL domain(s) :
Statistics [stat]/Other [stat.ML]
English abstract : [en]
Optimistic algorithms have been extensively studied for regret minimization in episodic tabular Markov Decision Processes (MDPs), both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near instance-optimal (up to a factor of the horizon). On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
Source :
Files
- TAMK23.pdf (Open access)
- 2210.00974 (Open access)