Budgeted Reinforcement Learning in Continuous State Space
Document type :
Communication dans un congrès avec actes
Title :
Budgeted Reinforcement Learning in Continuous State Space
Author(s) :
Carrara, Nicolas [Auteur]
Sequential Learning [SEQUEL]
Leurent, Edouard [Auteur]
RENAULT
Sequential Learning [SEQUEL]
Laroche, Romain [Auteur]
Urvoy, Tanguy [Auteur]
Orange Labs [Lannion]
Maillard, Odalric Ambrym [Auteur]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Pietquin, Olivier [Auteur]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Conference title :
Conference on Neural Information Processing Systems
City :
Vancouver
Country :
Canada
Start date of the conference :
2019-12
Journal title :
Advances in Neural Information Processing Systems
HAL domain(s) :
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
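The Budgeted Bellman Optimality operator itself is defined in the paper; as a rough illustration of the budgeted setting the abstract describes, the sketch below shows a budget-constrained greedy choice: each action carries both a reward value and a cost value, and the agent picks the highest-reward action whose expected cost fits the remaining budget. All names and numbers here are hypothetical, not taken from the paper.

```python
# Illustrative sketch of budget-constrained action selection in a BMDP.
# Each action has an estimated reward value (q_reward) and an estimated
# cost value (q_cost); the cost value must stay below the budget.
# The values and action names below are invented for illustration only.

def budgeted_greedy(q_reward, q_cost, budget):
    """Return the highest-reward action whose cost value fits the budget.

    q_reward, q_cost: dicts mapping action -> estimated value.
    budget: maximum admissible expected cumulative cost.
    """
    feasible = [a for a in q_reward if q_cost[a] <= budget]
    if not feasible:
        # No action satisfies the constraint: fall back to the cheapest one.
        return min(q_cost, key=q_cost.get)
    return max(feasible, key=q_reward.get)

# Hypothetical driving-style actions, echoing the autonomous-driving application.
q_reward = {"slow": 1.0, "fast": 3.0, "reckless": 5.0}
q_cost = {"slow": 0.0, "fast": 0.4, "reckless": 2.0}

print(budgeted_greedy(q_reward, q_cost, budget=0.5))  # -> fast
print(budgeted_greedy(q_reward, q_cost, budget=0.0))  # -> slow
```

Raising the budget threshold trades safety for reward (here, from "slow" to "fast"); the paper's contribution is learning such reward and cost values jointly in continuous state spaces with unknown dynamics.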
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files :
- https://hal.archives-ouvertes.fr/hal-02375727/document (Open access)
- http://arxiv.org/pdf/1903.01004 (Open access)