SAAC: Safe Reinforcement Learning as an ...
Document type :
Conference paper with proceedings
Title :
SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics
Author(s) :
Conference title :
RLDM 2022 - The Multi-disciplinary Conference on Reinforcement Learning and Decision Making
City :
Providence
Country :
United States of America
Start date of the conference :
2022-06-08
English keyword(s) :
Reinforcement learning
Safe reinforcement learning
Risk-sensitive control
Non-zero sum game
MaxEnt RL
Entropy regularization
Deep reinforcement learning
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Computer Science and Game Theory [cs.GT]
Computer Science [cs]/Systems and Control [cs.SY]
English abstract : [en]
Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. While deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Comment :
Accepted at the 5th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2022)
Collections :
Source :
Files
- https://hal.archives-ouvertes.fr/hal-03771734/document (Open access)
- http://arxiv.org/pdf/2204.09424 (Open access)