Stochastic Variance-Reduced Policy Gradient
Document type :
Conference paper (with proceedings)
Title :
Stochastic Variance-Reduced Policy Gradient
Author(s) :
Papini, Matteo [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Binaghi, Damiano [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Canonaco, Giuseppe [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Pirotta, Matteo [Author]
Sequential Learning [SEQUEL]
Restelli, Marcello [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Conference title :
ICML 2018 - 35th International Conference on Machine Learning
City :
Stockholm
Country :
Sweden
Start date of the conference :
2018-07-10
Journal title :
Proceedings of Machine Learning Research
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.
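To make the abstract's description concrete, below is a minimal Python sketch of an SVRG-style policy gradient step with importance weighting, illustrated on a toy one-step Gaussian problem rather than a full MDP. The toy reward, the Gaussian policy, the helper names (pg_estimate, importance_weight), and all hyperparameter values are illustrative assumptions, not the paper's implementation, which uses GPOMDP trajectory-gradient estimators on continuous MDPs.

# Hypothetical sketch: SVRG-style policy gradient update with importance weights
# on a toy one-step Gaussian problem (not the authors' GPOMDP-based implementation).
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 1.0          # fixed policy standard deviation (assumption)
TARGET = 2.0         # optimum of the toy reward (assumption)

def reward(a):
    # Toy one-step reward; stands in for the return of a trajectory.
    return -(a - TARGET) ** 2

def grad_log_pi(theta, a):
    # d/d(theta) of log N(a; theta, SIGMA^2) for the Gaussian policy mean.
    return (a - theta) / SIGMA ** 2

def pg_estimate(theta, actions):
    # REINFORCE-style gradient estimate from samples drawn under `theta`.
    return np.mean(grad_log_pi(theta, actions) * reward(actions))

def importance_weight(theta_snap, theta_cur, a):
    # Likelihood ratio p(a | theta_snap) / p(a | theta_cur): reweights a sample
    # collected with the current policy as if drawn from the snapshot policy,
    # which is what keeps the correction term unbiased despite the
    # non-stationary sampling process.
    log_w = (-(a - theta_snap) ** 2 + (a - theta_cur) ** 2) / (2 * SIGMA ** 2)
    return np.exp(log_w)

theta = 0.0
ALPHA, N_SNAP, B, EPOCHS, INNER = 0.05, 500, 10, 20, 10   # illustrative values

for _ in range(EPOCHS):
    # Snapshot: large-batch ("full") gradient estimate at the snapshot policy.
    theta_snap = theta
    snap_actions = rng.normal(theta_snap, SIGMA, size=N_SNAP)
    mu_hat = pg_estimate(theta_snap, snap_actions)

    for _ in range(INNER):
        # Mini-batch sampled under the *current* policy.
        actions = rng.normal(theta, SIGMA, size=B)
        w = importance_weight(theta_snap, theta, actions)
        # Semi-stochastic gradient: current mini-batch estimate, minus an
        # importance-weighted estimate for the snapshot policy, plus the
        # snapshot full-gradient estimate.
        v = (pg_estimate(theta, actions)
             - np.mean(w * grad_log_pi(theta_snap, actions) * reward(actions))
             + mu_hat)
        theta += ALPHA * v   # gradient ascent on the expected return

print(f"theta after training: {theta:.3f} (optimum is {TARGET})")

Running the sketch, theta drifts toward TARGET, and the importance-weighted correction keeps the semi-stochastic estimate unbiased while reducing its variance relative to the plain mini-batch estimator, which is the mechanism the abstract refers to.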
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.inria.fr/hal-01940394/document (Open access)
- supplementary.pdf (Open access)