Compatible Reward Inverse Reinforcement Learning
Document type :
Conference paper with proceedings
Title :
Compatible Reward Inverse Reinforcement Learning
Author(s) :
Metelli, Alberto [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Pirotta, Matteo [Author]
Sequential Learning [SEQUEL]
Restelli, Marcello [Author]
Department of Electronics, Information, and Bioengineering [Milano] [DEIB]
Conference title :
The Thirty-first Annual Conference on Neural Information Processing Systems - NIPS 2017
City :
Long Beach
Country :
United States of America
Start date of the conference :
2017-12-04
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
Inverse Reinforcement Learning (IRL) is an effective approach to recover a reward function that explains the behavior of an expert by observing a set of demonstrations. This paper presents a novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for any optimal policy, the algorithm generates a set of basis functions that span the subspace of reward functions that make the policy gradient vanish. Within this subspace, using a second-order criterion, we search for the reward function that most penalizes deviations from the expert's policy. After introducing our approach for finite domains, we extend it to continuous ones. The proposed approach is empirically compared to other IRL methods in both the (finite) Taxi domain and the (continuous) Linear Quadratic Gaussian (LQG) and Car on the Hill environments.
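The vanishing-gradient condition in the abstract can be illustrated for a finite domain. The sketch below is not the paper's algorithm. It assumes the standard policy gradient theorem, under which the gradient equals G^T D q, where G stacks the gradients of log pi(a|s) over state-action pairs, D is the diagonal matrix of occupancy weights, and q is a vector of action values. Any q in the null space of (D G)^T makes the gradient vanish. The function name, the matrix sizes, and the random placeholder values are all hypothetical, and the steps that turn such action values into rewards and rank them with the second-order criterion are not shown.

```python
import numpy as np


def gradient_nulling_q_basis(grad_log_pi, occupancy, tol=1e-10):
    """Return an orthonormal basis of vectors q with G^T diag(d) q = 0.

    grad_log_pi: array (n_sa, k), one row per (state, action) pair.
    occupancy:   array (n_sa,), positive weights of each pair.
    """
    # Weight each row by its occupancy: this is D @ G.
    weighted = occupancy[:, None] * grad_log_pi
    # Full SVD of (D G)^T; the right singular vectors past the
    # numerical rank span its null space.
    _, sing, vt = np.linalg.svd(weighted.T, full_matrices=True)
    rank = int(np.sum(sing > tol))
    return vt[rank:].T  # shape (n_sa, n_sa - rank)


# Hypothetical toy sizes: 6 state-action pairs, 2 policy parameters.
rng = np.random.default_rng(0)
grad_log_pi = rng.normal(size=(6, 2))   # placeholder values, not real data
occupancy = rng.random(6) + 0.1
occupancy /= occupancy.sum()

basis = gradient_nulling_q_basis(grad_log_pi, occupancy)

# Any combination of basis vectors yields a zero policy gradient.
q = basis @ rng.normal(size=basis.shape[1])
policy_gradient = grad_log_pi.T @ (occupancy * q)
```

With these toy sizes the basis has four columns, since two of the six dimensions are used up by the gradient constraints.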
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
Source :
Files
- https://hal.inria.fr/hal-01653328/document
- Open access
- 6800-compatible-reward-inverse-reinforcement-learning.pdf
- Open access