SAAC: Safe Reinforcement Learning as an ...
Document type :
Conference paper with proceedings
Title :
SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Author(s) :
Conference title :
RLDM 2022 - The Multi-disciplinary Conference on Reinforcement Learning and Decision Making
City :
Providence
Country :
United States of America
Start date of the conference :
2022-06-08
English keyword(s) :
Reinforcement learning
Safe reinforcement learning
Risk-sensitive control
Non-zero sum game
MaxEnt RL
Entropy regularization
Deep reinforcement learning
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Computer Science and Game Theory [cs.GT]
Computer Science [cs]/Systems and Control [cs.SY]
English abstract : [en]
Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. While deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Comment :
Accepted at the 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022)
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-03771734/document (Open access)
- http://arxiv.org/pdf/2204.09424 (Open access)