Score-based Inverse Reinforcement Learning
Document type:
Conference paper with proceedings
Title:
Score-based Inverse Reinforcement Learning
Author(s):
El Asri, Layla [Author]
Georgia Tech Lorraine [Metz]
Orange Labs [Issy les Moulineaux]
Piot, Bilal [Author]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Geist, Matthieu [Author]
MAchine Learning and Interactive Systems [MALIS]
Laroche, Romain [Author]
Orange Labs [Issy les Moulineaux]
Pietquin, Olivier [Author]
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Georgia Tech Lorraine [Metz]
Orange Labs [Issy les Moulineaux]
Conference:
International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016)
City:
Singapore
Country:
Singapore
Conference start date:
2016-05-09
Keywords (English):
Reinforcement Learning
Inverse Reinforcement Learning
Markov Decision Processes
Learning from Demonstration
Spoken Dialogue Systems
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Human-Computer Interaction [cs.HC]
English abstract:
This paper reports theoretical and empirical results obtained for the score-based Inverse Reinforcement Learning (IRL) algorithm. It relies on a non-standard setting for IRL consisting of learning a reward from a set of globally scored trajectories. This allows using any type of policy (optimal or not) to generate trajectories without prior knowledge during data collection. This way, any existing database (like logs of systems in use) can be scored a posteriori by an expert and used to learn a reward function. Thanks to this reward function, it is shown that a near-optimal policy can be computed. Being related to least-squares regression, the algorithm (called SBIRL) comes with theoretical guarantees that are proven in this paper. SBIRL is compared to standard IRL algorithms on synthetic data, showing that annotations do help under conditions on the quality of the trajectories. It is also shown to be suitable for real-world applications such as the optimisation of a spoken dialogue system.
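To make the least-squares view of SBIRL concrete, here is a minimal Python sketch, not the paper's implementation: assuming a linear reward model r(s, a) = theta^T phi(s, a), a trajectory's global score is approximated by theta^T times its discounted feature count, so the weights can be recovered by ordinary least squares. The feature map `phi`, the discount `gamma`, and the data format are illustrative assumptions.

```python
import numpy as np

def sbirl_fit(trajectories, scores, phi, gamma=0.99):
    """Fit reward weights theta so that, for each trajectory tau,
    theta @ sum_t gamma**t * phi(s_t, a_t) approximates its expert score.

    trajectories: list of [(state, action), ...] sequences
    scores: one expert-assigned global score per trajectory
    phi: feature map (state, action) -> 1-D numpy array
    """
    # Summarise each trajectory by its discounted feature count.
    X = np.array([
        sum(gamma ** t * phi(s, a) for t, (s, a) in enumerate(tau))
        for tau in trajectories
    ])
    y = np.asarray(scores, dtype=float)
    # Least-squares regression of scores on feature counts.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # learned reward: r(s, a) = theta @ phi(s, a)

# Toy usage with hypothetical data: two one-hot states, expert prefers state 0.
phi = lambda s, a: np.eye(2)[s]
trajs = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]
theta = sbirl_fit(trajs, scores=[2.0, 0.0], phi=phi, gamma=1.0)
print(theta)  # approximately [1., 0.]
```

The learned reward can then be handed to any standard RL solver to compute the near-optimal policy mentioned in the abstract.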
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- aamas-score-based.pdf: https://hal.inria.fr/hal-01406886/document (open access)