Document type :
Journal article: Original article
Title :
Analysis of Classification-based Policy Iteration Algorithms
Author(s) :
Lazaric, Alessandro [Author]
Sequential Learning [SEQUEL]
Ghavamzadeh, Mohammad [Author]
Sequential Learning [SEQUEL]
Adobe Systems Inc. [Adobe Advanced Technology Labs]
Munos, Rémi [Author]
Sequential Learning [SEQUEL]
DeepMind [London]
Journal title :
Journal of Machine Learning Research
Pages :
1 - 30
Publisher :
Microtome Publishing
Publication date :
2016
ISSN :
1532-4435
English keyword(s) :
reinforcement learning
policy iteration
classification-based approach to policy iteration
finite-sample analysis
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We introduce a variant of the classification-based approach to policy iteration which uses a cost-sensitive loss function weighting each classification mistake by its actual regret, that is, the difference between the action-value of the greedy action and of the action chosen by the classifier. For this algorithm, we provide a full finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space (classifier), and a capacity measure which indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. Furthermore, it confirms the intuition that classification-based policy iteration algorithms could be favorably compared to value-based approaches when the policies can be approximated more easily than their corresponding value functions. We also study the consistency of the algorithm when there exists a sequence of policy spaces with increasing capacity.
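The cost-sensitive loss described in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the function names and the toy action-value table are hypothetical, and the action-values are assumed to have been estimated beforehand (for example by rollouts). The sketch compares the regret-weighted loss with a plain 0/1 classification loss on the same sampled states.

```python
def regret_weighted_loss(q_estimates, chosen_actions):
    """Mean regret of the chosen actions over the sampled states.

    q_estimates: list of per-state lists of estimated action-values
        (hypothetical inputs, assumed to come from rollouts).
    chosen_actions: list with the action index the classifier picks
        in each sampled state.
    """
    # Regret in a state = value of the greedy action - value of the chosen action.
    regrets = [max(q_row) - q_row[a]
               for q_row, a in zip(q_estimates, chosen_actions)]
    return sum(regrets) / len(regrets)


def zero_one_loss(q_estimates, chosen_actions):
    """Fraction of sampled states where the chosen action is not greedy."""
    mistakes = [q_row.index(max(q_row)) != a
                for q_row, a in zip(q_estimates, chosen_actions)]
    return sum(mistakes) / len(mistakes)


# Toy example (made-up numbers): two states, two actions.
q = [[1.0, 0.9],   # state 0: the two actions are almost equally good
     [0.0, 5.0]]   # state 1: action 1 is much better than action 0

cheap_mistake = [1, 1]   # wrong only in state 0, where the gap is small
costly_mistake = [0, 0]  # wrong only in state 1, where the gap is large

for name, actions in [("cheap", cheap_mistake), ("costly", costly_mistake)]:
    print(name, zero_one_loss(q, actions), regret_weighted_loss(q, actions))
```

Both classifiers make exactly one mistake, so the 0/1 loss is 0.5 for each, while the regret-weighted loss is about 0.05 for the first and 2.5 for the second. This is the sense in which each mistake is weighted by its regret.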
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.inria.fr/hal-01401513/document
- Open access
- Access the document
- 10-364.pdf
- Open access
- Access the document