Augmented Bayesian Policy Search
Document type:
Conference paper with proceedings
Title:
Augmented Bayesian Policy Search
Author(s):
Kallel, Mahdi [Author]
Julius-Maximilians-Universität Würzburg = University of Würzburg [Würzburg, Germany] [JMU]
Basu, Debabrota [Author]
Centrale Lille
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Akrour, Riad [Author]
Centrale Lille
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
d'Eramo, Carlo [Author]
Julius-Maximilians-Universität Würzburg = University of Würzburg [Würzburg, Germany] [JMU]
Conference title:
The Twelfth International Conference on Learning Representations (ICLR)
City:
Vienna
Country:
Austria
Conference start date:
2024-05
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
Mathematics [math]/Optimization and Control [math.OC]
Abstract (English):
Deterministic policies are often preferred over stochastic ones when implemented on physical systems. They can prevent erratic and harmful behaviors while being easier to implement and interpret. However, in practice, exploration is largely performed by stochastic policies. First-order Bayesian optimization (BO) methods offer a principled way of performing exploration using deterministic policies. This is done through a learned probabilistic model of the objective function and its gradient. Nonetheless, such approaches treat policy search as a black-box problem, and thus neglect the reinforcement learning nature of the problem. In this work, we leverage the performance difference lemma to introduce a novel mean function for the probabilistic model. This results in augmenting BO methods with the action-value function. Hence, we call our method Augmented Bayesian Search (ABS). Interestingly, this new mean function enhances the posterior gradient with the deterministic policy gradient, effectively bridging the gap between BO and policy gradient methods. The resulting algorithm combines the convenience of direct policy search with the scalability of reinforcement learning. We validate ABS on high-dimensional locomotion problems and demonstrate competitive performance compared to existing direct policy search schemes.
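The abstract's core mechanism — replacing the surrogate model's zero prior mean with a learned estimate of the objective — can be illustrated with a generic Gaussian-process sketch. This is not the authors' implementation: the RBF kernel, the toy objective, and the `mean_fn` argument (which in ABS would be derived from the action-value function) are placeholder assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X_train, y_train, X_query, mean_fn, noise=1e-6):
    """Posterior mean of a GP whose prior mean is mean_fn (not zero).

    The GP models only the residual y - mean_fn(X); a good prior mean
    therefore shifts the whole posterior toward the learned estimate.
    """
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    k_star = rbf_kernel(X_query, X_train)
    resid = y_train - mean_fn(X_train)
    return mean_fn(X_query) + k_star @ np.linalg.solve(K, resid)

# Sanity check: if the prior mean matches the objective exactly,
# the residual vanishes and the posterior mean equals the prior mean.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
f = lambda Z: (Z ** 2).sum(-1)   # stand-in objective over policy parameters
Xq = rng.normal(size=(3, 2))
post = gp_posterior_mean(X, f(X), Xq, mean_fn=f)
```

The design point the sketch makes is that a non-zero prior mean leaves the GP machinery unchanged: only the residual targets differ, which is how a Q-function estimate can be injected without altering the BO loop itself.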
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- 3378_augmented_bayesian_policy_sear.pdf (open access)