Approximate dynamic programming for two-player zero-sum Markov games

Perolat, Julien; Scherrer, Bruno; Piot, Bilal; Pietquin, Olivier

Document type :

Communication dans un congrès avec actes

Title :

Approximate dynamic programming for two-player zero-sum Markov games

Author(s) :

Perolat, Julien [Auteur]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Scherrer, Bruno [Auteur]
Biology, genetics and statistics [BIGS]
Institut Élie Cartan de Lorraine [IECL]
Piot, Bilal [Auteur]

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences Humaines et Sociales
Sequential Learning [SEQUEL]
Pietquin, Olivier [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]

Conference title :

International Conference on Machine Learning (ICML 2015)

City :

Lille

Country :

France

Start date of the conference :

2015-07-06

Publication date :

2015-07-06

HAL domain(s) :

Informatique [cs]/Apprentissage [cs.LG]
Mathématiques [math]/Mathématiques générales [math.GM]
Informatique [cs]/Recherche d'information [cs.IR]

English abstract : [en]

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three ...
Show more >This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteratio,n). We show that we can achieve a stationary policy which is 2γ+ (1−γ) 2-optimal, where is the value function approximation error and is the approximate greedy operator error. In addition , we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.Show less >

Language :

Anglais

Peer reviewed article :

Oui

Audience :

Internationale

Popular science :

Non

Collections :

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source :

Harvested from HAL

Files

https://hal.inria.fr/hal-01153270v3/document
Open access
Access the document

https://hal.inria.fr/hal-01153270v3/document
Open access
Access the document

https://hal.inria.fr/hal-01153270v3/document
Open access
Access the document

document
Open access
Access the document

ICML_2015_JPBSBPOP.pdf
Open access
Access the document

Approximate dynamic programming for ... BibTeX CSV Excel RIS

Files

Approximate dynamic programming for ...

BibTeX

CSV

Excel

RIS