Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Document type :
Journal article
Title :
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model
Author(s) :
Azar, Mohammad Gheshlaghi [Author]
Department of Medical Physics and Biophysics [Biophysics]
Munos, Rémi [Author]
Sequential Learning [SEQUEL]
Kappen, Hilbert [Author]
Department of Medical Physics and Biophysics [Biophysics]
Journal title :
Machine Learning
Pages :
325-349
Publisher :
Springer Verlag
Publication date :
2013
ISSN :
0885-6125
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
English abstract : [en]
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with $N$ state-action pairs and discount factor $\gamma \in [0, 1)$, only $O(N \log(N/\delta)/[(1-\gamma)^3 \epsilon^2])$ state-transition samples are required to find an $\epsilon$-optimal estimate of the action-value function with probability (w.p.) $1-\delta$. Further, we prove that, for small values of $\epsilon$, an order of $O(N \log(N/\delta)/[(1-\gamma)^3 \epsilon^2])$ samples is required to find an $\epsilon$-optimal policy w.p. $1-\delta$. We also prove a matching lower bound of $\Omega(N \log(N/\delta)/[(1-\gamma)^3 \epsilon^2])$ on the sample complexity of estimating the optimal action-value function. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bound matches the lower bound in terms of $N$, $\epsilon$, $\delta$, and $1/(1-\gamma)$ up to a constant factor. Also, both our lower bound and upper bound improve on the state of the art in terms of their dependence on $1/(1-\gamma)$.
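The abstract centres on model-based RL with a generative model: draw a fixed number of transition samples per state-action pair, build an empirical MDP, and run value iteration (or policy iteration) on it. Below is a minimal Python sketch of that pipeline, assuming a sampler `sample_next_state(s, a)` and a known reward table; the function name, the fixed iteration count, and the per-pair sample size are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

def q_value_iteration_with_generative_model(sample_next_state, reward, n_states,
                                             n_actions, gamma, n_samples_per_pair,
                                             n_iterations):
    """Sketch of model-based Q-value iteration using a generative model.

    sample_next_state(s, a) draws one next state from P(.|s, a); reward is an
    (n_states, n_actions) array of known rewards. Names, the fixed iteration
    count, and the per-pair sample size are illustrative assumptions only.
    """
    # 1) Build the empirical transition model from n samples per state-action pair.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples_per_pair):
                p_hat[s, a, sample_next_state(s, a)] += 1.0
    p_hat /= n_samples_per_pair

    # 2) Run value iteration on the empirical MDP to approximate Q*.
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iterations):
        v = q.max(axis=1)                 # state values under the greedy policy
        q = reward + gamma * (p_hat @ v)  # Bellman optimality backup on the empirical model
    return q
```

Up to an unspecified constant factor, the abstract's upper bound suggests choosing `n_samples_per_pair` on the order of $\log(N/\delta)/[(1-\gamma)^3 \epsilon^2]$, with $N$ = `n_states * n_actions`, so that the total budget $N \cdot n$ matches $O(N \log(N/\delta)/[(1-\gamma)^3 \epsilon^2])$ for an $\epsilon$-optimal estimate w.p. $1-\delta$.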
Language :
English
Popular science :
No
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-00831875/document (Open access)
- https://link.springer.com/content/pdf/10.1007%2Fs10994-013-5368-1.pdf (Open access)
- SampCompRL_MLJ2012.pdf (Open access)