Document type:
Conference paper with proceedings
Title:
"I'm sorry Dave, I'm afraid I can't do that" Deep Q-Learning From Forbidden Actions
Author(s):
Seurin, Mathieu [Author]
Scool [Scool]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Preux, Philippe [Author]
Scool [Scool]
Sequential Learning [SEQUEL]
Pietquin, Olivier [Author]
Google Brain, Paris
Scool [Scool]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Conference title:
International Joint Conference on Neural Networks
City:
Glasgow
Country:
United Kingdom
Conference start date:
2020-07-17
Keyword(s) in English:
Deep Reinforcement Learning
Safety
constraints
Q-Learning
HAL discipline(s):
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Neural Networks [cs.NE]
English abstract: [en]
The use of Reinforcement Learning (RL) is still restricted to simulation or to enhancing human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the shape of valid action masks or contingency controllers. For example, the range of motion and the angles of the motors of a robot can be limited to physical boundaries. Violating constraints thus results in rejected actions or entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN), enabling learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of hit constraints during the learning phase and accelerates convergence to near-optimal policies compared to using standard DQN. Experiments are conducted on a Visual Grid World environment and a Text-World domain.
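The record does not spell out the extra safety loss, only that it is hinge/structured-classification inspired and is added to the standard Q-learning update. A minimal NumPy sketch of one plausible formulation (a margin penalty pushing Q-values of forbidden actions below the lowest valid Q-value; the function names, the `margin` and `lam` parameters, and the exact hinge form are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def safety_loss(q_values, forbidden, margin=1.0):
    """Hinge-style penalty (illustrative): push the Q-value of every
    forbidden action at least `margin` below the lowest valid Q-value.

    q_values : (n_actions,) array of Q(s, .)
    forbidden: boolean mask, True where the action is forbidden
    """
    if not forbidden.any() or forbidden.all():
        return 0.0  # nothing to penalize (or no valid reference point)
    min_valid = q_values[~forbidden].min()
    # Penalize each forbidden Q-value that sits above (min_valid - margin).
    violations = np.maximum(0.0, q_values[forbidden] - (min_valid - margin))
    return float(violations.sum())

def dqn_loss(q_values, action, td_target, forbidden, lam=1.0):
    """Combined objective: squared TD error plus the weighted safety term."""
    td = (q_values[action] - td_target) ** 2
    return td + lam * safety_loss(q_values, forbidden)
```

For example, with `q_values = [1.0, 2.0, 0.5]` and only action 1 forbidden, the safety term is `max(0, 2.0 - (0.5 - 1.0)) = 2.5`, so gradient descent on the combined loss would drive the forbidden action's Q-value down even though the environment rejected the action and returned no informative transition.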
Language:
English
Peer-reviewed:
No
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.inria.fr/hal-02387419v2/document
- Open access
- Access the document
- Dave_IJCNN.pdf
- Open access
- Access the document