Document type:
Journal article: Original article
Title:
Analysis of Classification-based Policy Iteration Algorithms
Author(s):
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Ghavamzadeh, Mohammad [Author]
Sequential Learning [SEQUEL]
Adobe Systems Inc. [Adobe Advanced Technology Labs]
Munos, Rémi [Author]
Sequential Learning [SEQUEL]
DeepMind [London]
Journal title:
Journal of Machine Learning Research
Pages:
1 - 30
Publisher:
Microtome Publishing
Publication date:
2016
ISSN:
1532-4435
Keywords (English):
reinforcement learning
policy iteration
classification-based approach to policy iteration
finite-sample analysis
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Abstract (English):
We introduce a variant of the classification-based approach to policy iteration that uses a cost-sensitive loss function weighting each classification mistake by its actual regret, that is, the difference between the action-value of the greedy action and that of the action chosen by the classifier. For this algorithm, we provide a full finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space (classifier), and a capacity measure that indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. Furthermore, it confirms the intuition that classification-based policy iteration algorithms may compare favorably with value-based approaches when the policies can be approximated more easily than their corresponding value functions. We also study the consistency of the algorithm when there exists a sequence of policy spaces with increasing capacity.
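As a rough illustration of the regret-weighted loss described in the abstract, the following Python sketch estimates action-values with truncated Monte Carlo rollouts and then selects, from a small finite set of candidate policies, the one with the lowest average regret max_a Q(x, a) - Q(x, pi(x)). Everything here is an assumption made for illustration: the simulator interface, the toy 5-state chain, the horizon and discount values, and the function names (`rollout_q`, `regret_weighted_loss`, `improve_policy`, `chain_sim`) are hypothetical. This is not the authors' implementation, and a real classification-based method would train a classifier over a rich policy space rather than pick between two hand-written policies.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

State = int
Action = int
Policy = Callable[[State], Action]
# Hypothetical generative model: (state, action, rng) -> (next_state, reward).
Simulator = Callable[[State, Action, random.Random], Tuple[State, float]]
QTable = Dict[State, Dict[Action, float]]


def rollout_q(sim: Simulator, policy: Policy, x: State, a: Action,
              gamma: float, horizon: int, n_rollouts: int,
              rng: random.Random) -> float:
    """Monte Carlo estimate of Q^policy(x, a): take a in x, then follow
    policy for horizon - 1 more steps; average the truncated discounted
    returns over n_rollouts independent rollouts."""
    total = 0.0
    for _ in range(n_rollouts):
        state, reward = sim(x, a, rng)
        ret, discount = reward, gamma
        for _ in range(horizon - 1):
            state, reward = sim(state, policy(state), rng)
            ret += discount * reward
            discount *= gamma
        total += ret
    return total / n_rollouts


def regret_weighted_loss(policy: Policy, q_hat: QTable) -> float:
    """Cost-sensitive loss: mean over states of max_a Q(x, a) - Q(x, policy(x)).
    Each mistake costs its regret, so a near-greedy action is barely penalized."""
    regrets = [max(qs.values()) - qs[policy(x)] for x, qs in q_hat.items()]
    return sum(regrets) / len(regrets)


def improve_policy(sim: Simulator, current: Policy, candidates: Sequence[Policy],
                   states: Sequence[State], actions: Sequence[Action],
                   gamma: float, horizon: int, n_rollouts: int,
                   rng: random.Random) -> Tuple[Policy, QTable]:
    """One improvement step: estimate Q^current at the rollout states, then
    pick the candidate ("classifier") with the smallest regret-weighted loss."""
    q_hat = {x: {a: rollout_q(sim, current, x, a, gamma, horizon, n_rollouts, rng)
                 for a in actions}
             for x in states}
    best = min(candidates, key=lambda pi: regret_weighted_loss(pi, q_hat))
    return best, q_hat


# Toy 5-state chain (hypothetical): action 1 moves right, action 0 moves left;
# reward 1.0 whenever the next state is the goal state 4.
def chain_sim(x: State, a: Action, rng: random.Random) -> Tuple[State, float]:
    nxt = min(x + 1, 4) if a == 1 else max(x - 1, 0)
    return nxt, (1.0 if nxt == 4 else 0.0)


def always_left(x: State) -> Action:
    return 0


def always_right(x: State) -> Action:
    return 1


if __name__ == "__main__":
    rng = random.Random(0)
    policy, q_hat = improve_policy(chain_sim, always_left, [always_left, always_right],
                                   states=[0, 1, 2, 3], actions=[0, 1],
                                   gamma=0.9, horizon=10, n_rollouts=5, rng=rng)
    print(policy.__name__)                           # always_right
    print(regret_weighted_loss(always_left, q_hat))  # 0.25
```

Starting from `always_left`, only state 3 has a non-zero regret (moving right reaches the goal and earns reward 1.0, while moving left earns nothing), so `always_left` incurs a loss of 0.25 over the four rollout states and the improvement step returns `always_right`. Weighting by regret rather than counting plain misclassifications means that disagreeing with the greedy action where all actions are equally good costs nothing, which is the intuition behind the cost-sensitive loss.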
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files
- https://hal.inria.fr/hal-01401513/document (open access)
- 10-364.pdf (open access)