A Fitted-Q Algorithm for Budgeted MDPs
Document type:
Book chapter
Title:
A Fitted-Q Algorithm for Budgeted MDPs
Author(s):
Carrara, Nicolas [Author]
Orange Labs [Lannion]
Sequential Learning [SEQUEL]
Laroche, Romain [Author]
Maluuba
Bouraoui, Jean-Léon [Author]
Orange Labs [Lannion]
Urvoy, Tanguy [Author]
Orange Labs [Lannion]
Pietquin, Olivier [Author]
SUPELEC-Campus Metz
Sequential Learning [SEQUEL]
Publication date:
2018-08
HAL discipline(s):
Computer Science [cs]/Artificial Intelligence [cs.AI]
English abstract: [en]
We address the problem of budgeted/constrained reinforcement learning in continuous state space using a batch of transitions. For this purpose, we introduce a novel algorithm called Budgeted Fitted-Q (BFTQ). We carry out some preliminary benchmarks on a continuous 2-D world. They show that BFTQ performs as well as a penalized Fitted-Q algorithm while also allowing one to adapt the trained policy on the fly to a given budget, without the need to engineer reward penalties. We believe that the general principles used to design BFTQ could be used to extend other classical reinforcement learning algorithms to budget-oriented applications.
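To make the abstract's comparison concrete, here is a minimal sketch of the penalized Fitted-Q baseline it mentions: batch Q-iteration where a constraint cost is folded into the reward via a hand-tuned penalty weight. The toy 1-D environment, the quadratic features, and the weight `lam` are all illustrative assumptions, not the paper's code; BFTQ's point is precisely to take the budget as a policy input instead of engineering `lam`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of transitions (s, a, r, c, s') on a toy 1-D world:
# action 1 moves right toward a rewarding region (s' > 0.9) but incurs a
# constraint cost of 1 per use; action 0 moves left and is free.
N = 500
S = rng.random(N)
A = rng.integers(0, 2, N)
S2 = np.clip(S + np.where(A == 1, 0.1, -0.1), 0.0, 1.0)
R = (S2 > 0.9).astype(float)
C = (A == 1).astype(float)  # cost of the "risky" action

lam = 0.1    # penalty weight: the quantity BFTQ spares us from hand-tuning
gamma = 0.9
W = np.zeros((2, 3))  # one linear weight vector per action

def phi(s):
    # Quadratic features of the state (a crude function approximator).
    return np.stack([np.ones_like(s), s, s ** 2], axis=1)

def q(w, s):
    return phi(s) @ w.T  # shape (len(s), 2): one Q-value column per action

for _ in range(50):  # fitted-Q iterations on the fixed batch
    # Penalized Bellman target: immediate reward minus lam * cost,
    # plus discounted greedy continuation value at the next state.
    targets = (R - lam * C) + gamma * q(W, S2).max(axis=1)
    for a in range(2):
        m = A == a
        # Least-squares regression of the targets onto the features.
        W[a], *_ = np.linalg.lstsq(phi(S[m]), targets[m], rcond=None)

# Fitted values should increase toward the rewarding region on the right.
print(q(W, np.array([0.05, 0.95])).max(axis=1))
```

Sweeping `lam` traces out different reward/cost trade-offs, but each value requires retraining; the abstract's claim is that BFTQ instead exposes the budget as a runtime argument of a single trained policy.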
Language:
English
Popular science:
No
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-01867353/document
- Open access
- ncarrara-saferl-uai-2018.pdf
- Open access