Direct Policy Iteration with Demonstrations
Document type:
Conference paper with proceedings
Title:
Direct Policy Iteration with Demonstrations
Author(s):
Chemali, Jessica [Author]
Computer Science Department - Carnegie Mellon University
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Conference title:
IJCAI - 24th International Joint Conference on Artificial Intelligence
City:
Buenos Aires
Country:
Argentina
Conference start date:
2015-07-25
Publication date:
2015-07
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
We consider the problem of learning the optimal policy of an unknown Markov decision process (MDP) when expert demonstrations are available along with interaction samples. We build on classification-based policy iteration to perform a seamless integration of interaction and expert data, thus obtaining an algorithm which can benefit from both sources of information at the same time. Furthermore, we provide a full theoretical analysis of the performance across iterations, providing insights on how the algorithm works. Finally, we report an empirical evaluation of the algorithm and a comparison with the state-of-the-art algorithms.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.inria.fr/hal-01237659/document
- Open access
- DPID_CameraReady.pdf
- Open access