• English
    • français
  • Help
  •  | 
  • Contact
  •  | 
  • About
  •  | 
  • Login
  • HAL portal
  •  | 
  • Pages Pro
  • EN
  •  / 
  • FR
View Item 
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Analysis of Classification-based Policy ...
  • BibTeX
  • CSV
  • Excel
  • RIS

Document type :
Article dans une revue scientifique: Article original
Title :
Analysis of Classification-based Policy Iteration Algorithms
Author(s) :
Lazaric, Alessandro [Auteur] refId
Sequential Learning [SEQUEL]
Ghavamzadeh, Mohammad [Auteur]
Sequential Learning [SEQUEL]
Adobe Systems Inc. [Adobe Advanced Technology Labs]
Munos, Rémi [Auteur]
Sequential Learning [SEQUEL]
DeepMind [London]
Journal title :
Journal of Machine Learning Research
Pages :
1 - 30
Publisher :
Microtome Publishing
Publication date :
2016
ISSN :
1532-4435
English keyword(s) :
reinforcement learning
policy iteration
classification-based approach to policy iteration
finite-sample analysis
HAL domain(s) :
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
We introduce a variant of the classification-based approach to policy iteration which uses a cost-sensitive loss function weighting each classification mistake by its actual regret, that is, the difference between the ...
Show more >
We introduce a variant of the classification-based approach to policy iteration which uses a cost-sensitive loss function weighting each classification mistake by its actual regret, that is, the difference between the action-value of the greedy action and of the action chosen by the classifier. For this algorithm, we provide a full finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space (classifier), and a capacity measure which indicates how well the policy space can approximate policies that are greedy with respect to any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. Furthermore it confirms the intuition that classification-based policy iteration algorithms could be favorably compared to value-based approaches when the policies can be approximated more easily than their corresponding value functions. We also study the consistency of the algorithm when there exists a sequence of policy spaces with increasing capacity.Show less >
Language :
Anglais
Peer reviewed article :
Oui
Audience :
Internationale
Popular science :
Non
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
Thumbnail
  • https://hal.inria.fr/hal-01401513/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-01401513/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-01401513/document
  • Open access
  • Access the document
Thumbnail
  • document
  • Open access
  • Access the document
Thumbnail
  • 10-364.pdf
  • Open access
  • Access the document
Thumbnail
  • document
  • Open access
  • Access the document
Thumbnail
  • 10-364.pdf
  • Open access
  • Access the document
Université de Lille

Mentions légales
Accessibilité : non conforme
Université de Lille © 2017