A Fitted-Q Algorithm for Budgeted MDPs
Document type :
Book section
Title :
A Fitted-Q Algorithm for Budgeted MDPs
Author(s) :
Carrara, Nicolas [Author]
Sequential Learning [SEQUEL]
Orange Labs [Lannion]
Laroche, Romain [Author]
Maluuba
Bouraoui, Jean-Léon [Author]
Orange Labs [Lannion]
Urvoy, Tanguy [Author]
Orange Labs [Lannion]
Pietquin, Olivier [Author]
Sequential Learning [SEQUEL]
SUPELEC-Campus Metz
Publication date :
2018-08
HAL domain(s) :
Computer Science [cs]/Artificial Intelligence [cs.AI]
English abstract : [en]
We address the problem of budgeted/constrained reinforcement learning in continuous state space using a batch of transitions. For this purpose, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). We carry out preliminary benchmarks on a continuous 2-D world. They show that BFTQ performs as well as a penalized Fitted-Q algorithm, while also allowing one to adapt the trained policy on the fly to a given budget without the need to engineer reward penalties. We believe that the general principles used to design BFTQ could be used to extend other classical reinforcement learning algorithms to budget-oriented applications.
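
The abstract describes BFTQ only at a high level: a Fitted-Q variant that learns, from a fixed batch of transitions, enough information to respect a cost budget chosen at run time. As a rough illustration of that idea only, here is a minimal Python sketch under stated assumptions: it fits a second Q-function for constraint costs alongside the usual reward Q-function, and at decision time picks the highest-reward action whose predicted cost fits the remaining budget. The function names, the ExtraTreesRegressor choice, and the greedy feasibility rule are illustrative guesses, not the paper's exact formulation.

# Hypothetical sketch of a budgeted fitted-Q sweep; not the authors' exact method.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def bftq_iteration(batch, actions, q_r, q_c, gamma=0.95):
    """One fitted-Q sweep over a batch of (s, a, r, c, s') transitions.

    q_r, q_c: regressors predicting reward-return and cost-return for (s, a),
    already fitted on the previous sweep (a first sweep can fit on the
    immediate rewards and costs alone). Returns freshly fitted regressors.
    """
    X, y_r, y_c = [], [], []
    for s, a, r, c, s_next in batch:
        feats = [np.append(s_next, a2) for a2 in actions]
        r_next = q_r.predict(feats)   # reward-to-go of each next action
        c_next = q_c.predict(feats)   # cost-to-go of each next action
        # Greedy w.r.t. reward; an exact budgeted step would restrict this
        # argmax to actions whose cost-to-go fits the remaining budget.
        best = int(np.argmax(r_next))
        X.append(np.append(s, a))
        y_r.append(r + gamma * r_next[best])
        y_c.append(c + gamma * c_next[best])
    new_q_r = ExtraTreesRegressor(n_estimators=50).fit(X, y_r)
    new_q_c = ExtraTreesRegressor(n_estimators=50).fit(X, y_c)
    return new_q_r, new_q_c

def act(s, actions, q_r, q_c, budget):
    """Pick the highest-reward action whose predicted cost respects the budget."""
    feats = [np.append(s, a) for a in actions]
    rewards, costs = q_r.predict(feats), q_c.predict(feats)
    feasible = [i for i in range(len(actions)) if costs[i] <= budget]
    if not feasible:                      # fall back to the cheapest action
        return actions[int(np.argmin(costs))]
    return actions[max(feasible, key=lambda i: rewards[i])]

Because the budget is only consulted in act, the same trained pair (q_r, q_c) can serve any budget chosen at deployment time, which is the adapt-on-the-fly property the abstract highlights.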
Language :
English
Popular science :
No
Files
- https://hal.archives-ouvertes.fr/hal-01867353/document (open access)
- ncarrara-saferl-uai-2018.pdf (open access)