Adaptive Batch Size for Safe Policy Gradients

Papini, Matteo; Pirotta, Matteo; Restelli, Marcello

Document type :

Conference paper with proceedings

Title :

Adaptive Batch Size for Safe Policy Gradients

Author(s) :

Papini, Matteo [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Pirotta, Matteo [Author]
Sequential Learning [SEQUEL]
Restelli, Marcello [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]

Conference title :

The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS)

City :

Long Beach

Country :

United States of America

Start date of the conference :

2017-12-04

HAL domain(s) :

Statistics [stat]/Machine Learning [stat.ML]

English abstract : [en]

Policy gradient methods are among the best Reinforcement Learning (RL) techniques to solve complex control problems. In real-world RL applications, it is common to have a good initial policy whose performance needs to be improved, and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, less attention has been paid to determining the batch size, that is, the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods to jointly optimize the step and the batch sizes that guarantee (with high probability) to improve the policy performance after each update. Besides providing theoretical guarantees, we show numerical simulations to analyse the behaviour of our methods.

Language :

English

Peer reviewed article :

Yes

Audience :

International

Popular science :

No

Collections :

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source :

Harvested from HAL

Files

https://hal.inria.fr/hal-01653330/document
Open access
Access the document

6950-adaptive-batch-size-for-safe-policy-gradients.pdf
Open access
Access the document
