Approximate dynamic programming for ...
Document type :
Communication dans un congrès avec actes
Title :
Approximate dynamic programming for two-player zero-sum Markov games
Author(s) :
Perolat, Julien [Auteur]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Scherrer, Bruno [Auteur]
Biology, genetics and statistics [BIGS]
Institut Élie Cartan de Lorraine [IECL]
Piot, Bilal [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences Humaines et Sociales
Sequential Learning [SEQUEL]
Pietquin, Olivier [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Scherrer, Bruno [Auteur]
Biology, genetics and statistics [BIGS]
Institut Élie Cartan de Lorraine [IECL]
Piot, Bilal [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences Humaines et Sociales
Sequential Learning [SEQUEL]
Pietquin, Olivier [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]
Conference title :
International Conference on Machine Learning (ICML 2015)
City :
Lille
Country :
France
Start date of the conference :
2015-07-06
Publication date :
2015-07-06
HAL domain(s) :
Informatique [cs]/Apprentissage [cs.LG]
Mathématiques [math]/Mathématiques générales [math.GM]
Informatique [cs]/Recherche d'information [cs.IR]
Mathématiques [math]/Mathématiques générales [math.GM]
Informatique [cs]/Recherche d'information [cs.IR]
English abstract : [en]
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three ...
Show more >This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.Show less >
Show more >This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.Show less >
Language :
Anglais
Peer reviewed article :
Oui
Audience :
Internationale
Popular science :
Non
Collections :
Source :
Files
- https://hal.inria.fr/hal-01153270v3/document
- Open access
- Access the document
- https://hal.inria.fr/hal-01153270v3/document
- Open access
- Access the document
- https://hal.inria.fr/hal-01153270v3/document
- Open access
- Access the document
- document
- Open access
- Access the document
- ICML_2015_JPBSBPOP.pdf
- Open access
- Access the document