Document type:
Conference paper with proceedings
Title:
Scale-free adaptive planning for deterministic dynamics & discounted rewards
Author(s):
Bartlett, Peter [Author]
Queensland University of Technology [Brisbane] [QUT]
Gabillon, Victor [Author]
Huawei Noah's Ark Lab [China]
Healey, Jennifer [Author]
Adobe Research
Valko, Michal [Author]
Sequential Learning [SEQUEL]
Conference title:
International Conference on Machine Learning
City:
Long Beach
Country:
United States of America
Conference start date:
2019
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Abstract in English: [en]
We address the problem of planning in an environment with deterministic dynamics and stochastic discounted rewards under a limited numerical budget where the ranges of both rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS dynamically adapts its behavior to both. This makes PlaTγPOOS immune to two vulnerabilities of OLOP: failure when given underestimated ranges of noise and rewards, and inefficiency when these are overestimated. PlaTγPOOS additionally adapts to the global smoothness of the value function. PlaTγPOOS acts in a provably more efficient manner than OLOP when OLOP is given an overestimated reward range, and we show that, in the case of no noise, PlaTγPOOS learns exponentially faster.
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.inria.fr/hal-02387484/document
- Open access
- Access the document
- icml2019platypoos.pdf
- Open access
- Access the document