There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
Document type:
Conference paper with proceedings
Title:
There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
Author(s):
Grinsztajn, Nathan [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Ferret, Johan [Author]
Pietquin, Olivier [Author]
Preux, Philippe [Author]
Geist, Matthieu [Author]
Conference:
Neural Information Processing Systems (2021)
City:
Virtual
Country:
France
Conference start date:
2021-12-06
Proceedings title:
Proc. Thirty-fifth Conference on Neural Information Processing Systems
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
English abstract: [en]
We propose to learn to distinguish reversible from irreversible actions for better-informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order are likely to be separated by an irreversible sequence of actions. Conveniently, learning the temporal order of events can be done in a fully self-supervised way, which we use to estimate the reversibility of actions from experience, without any priors. We propose two different strategies that incorporate reversibility in RL agents, one strategy for exploration (RAE) and one strategy for control (RAC). We demonstrate the potential of reversibility-aware agents in several environments, including the challenging Sokoban game. In synthetic tasks, we show that we can learn control policies that never fail and reduce to zero the side-effects of interactions, even without access to the reward function.
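As a reading aid, the surrogate task the abstract describes can be sketched in a few lines: sample pairs of events from the same trajectory, train a classifier to predict which came first, and read its confidence as a reversibility signal. The sketch below is a minimal PyTorch illustration under assumed choices; the OrderNet architecture, hyperparameters, and the random-walk toy trajectory are all hypothetical stand-ins, not the authors' implementation.

import torch
import torch.nn as nn

OBS_DIM = 8

# Pairwise classifier: given observations (s_a, s_b) from the same
# trajectory, predict the probability that s_a was observed before s_b.
class OrderNet(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s_a: torch.Tensor, s_b: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s_a, s_b], dim=-1)).squeeze(-1)

def order_batch(trajectory: torch.Tensor, batch_size: int):
    """Sample event pairs from one trajectory; label = 1 iff the first
    element occurs earlier in time. Sampling i and j independently keeps
    the labels roughly balanced."""
    T = trajectory.shape[0]
    i = torch.randint(0, T, (batch_size,))
    j = torch.randint(0, T, (batch_size,))
    keep = i != j  # drop degenerate pairs with i == j
    i, j = i[keep], j[keep]
    return trajectory[i], trajectory[j], (i < j).float()

# Toy stand-in for collected experience: a random-walk trajectory.
trajectory = torch.cumsum(torch.randn(200, OBS_DIM), dim=0)

model = OrderNet(OBS_DIM)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Fully self-supervised training: labels come from timestamps alone,
# with no reward signal and no priors, as the abstract notes.
for step in range(500):
    s_a, s_b, y = order_batch(trajectory, 128)
    loss = loss_fn(model(s_a, s_b), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Reversibility proxy: if the model is near-certain that s_t precedes
# s_{t+1}, that transition is, per the abstract's intuition, likely
# irreversible (the reverse order is never observed).
with torch.no_grad():
    p = torch.sigmoid(model(trajectory[:-1], trajectory[1:]))
print("mean predicted order confidence:", p.mean().item())

In the paper's terms, such a score would then feed either an exploration signal (RAE) or a control-time action filter (RAC); the exact integration is detailed in the paper itself.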
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.archives-ouvertes.fr/hal-03454640/document (open access)
- Reversibility_Aware_Reinforcement_Learning__NeurIPS_.pdf (open access)