Scale-free adaptive planning for deterministic ...
Document type :
Conference paper with proceedings
Title :
Scale-free adaptive planning for deterministic dynamics & discounted rewards
Author(s) :
Bartlett, Peter [Author]
Queensland University of Technology [Brisbane] [QUT]
Gabillon, Victor [Author]
Huawei Noah's Ark Lab [China]
Healey, Jennifer [Author]
Adobe Research
Valko, Michal [Author]
Sequential Learning [SEQUEL]
Conference title :
International Conference on Machine Learning
City :
Long Beach
Country :
United States of America
Start date of the conference :
2019
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when given underestimated ranges of noise and rewards, and inefficiency when these ranges are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. We show that PlaTγPOOS acts in a provably more efficient manner than OLOP when OLOP is given an overestimated reward range, and that in the case of no noise, PlaTγPOOS learns exponentially faster.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.inria.fr/hal-02387484/document
- Open access
- Access the document
- icml2019platypoos.pdf
- Open access
- Access the document