Document type:
Conference paper with proceedings
Title:
SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Author(s):
Conference title:
RLDM 2022 - The Multi-disciplinary Conference on Reinforcement Learning and Decision Making
City:
Providence
Country:
United States of America
Conference start date:
2022-06-08
Keywords (English):
Reinforcement learning
Safe reinforcement learning
Risk-sensitive control
Non-zero sum game
MaxEnt RL
Entropy regularization
Deep reinforcement learning
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Computer Science and Game Theory [cs.GT]
Computer Science [cs]/Systems and Control [cs.SY]
Abstract (English):
Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. When deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
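As a rough illustration of the mechanism described in the abstract, the sketch below shows how a repulsion term between the agent's and the adversary's policies might enter a soft actor-critic update. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name saac_actor_loss, the choice of a KL divergence as the repulsion measure, and the parameters alpha and beta are illustrative assumptions.

```python
import torch
from torch.distributions import Normal, kl_divergence


def saac_actor_loss(agent_pi, adv_pi, critic, states, alpha=0.2, beta=1.0):
    """SAC-style actor loss with a policy-repulsion bonus (illustrative sketch).

    agent_pi, adv_pi: diagonal Gaussian policies (torch.distributions.Normal)
                      evaluated at a batch of states; the adversary is assumed
                      to be held fixed during the agent's update.
    critic:           callable (states, actions) -> Q-value estimates.
    alpha:            entropy temperature, as in standard SAC.
    beta:             weight of the repulsion term (hypothetical parameter).
    """
    actions = agent_pi.rsample()                   # reparameterized action sample
    log_prob = agent_pi.log_prob(actions).sum(-1)  # entropy term, as in SAC
    q_value = critic(states, actions)

    # Repulsion: reward the agent for moving its policy away from the
    # adversary's, here measured by a KL divergence (one possible choice).
    repulsion = kl_divergence(agent_pi, adv_pi).sum(-1)

    # Minimize: entropy-regularized SAC objective minus the repulsion bonus.
    return (alpha * log_prob - q_value - beta * repulsion).mean()


# Illustrative usage with dummy tensors (batch of 4 states, 2-dim actions):
states = torch.randn(4, 3)
agent_pi = Normal(torch.zeros(4, 2, requires_grad=True), torch.ones(4, 2))
adv_pi = Normal(torch.full((4, 2), 0.5), torch.ones(4, 2))
critic = lambda s, a: -(a ** 2).sum(-1)            # stand-in critic for the demo
loss = saac_actor_loss(agent_pi, adv_pi, critic, states)
loss.backward()
```

In this reading, maximizing the divergence from the adversary's (unsafe) policy is what encodes the safety constraint in the agent's objective; the exact form of the divergence and its weighting are design choices described in the paper itself.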
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Comment:
Accepted at the 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022)
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-03771734/document
- Open access
- Access the document
- http://arxiv.org/pdf/2204.09424
- Open access
- Access the document