Softened approximate policy iteration for Markov games

Pérolat, Julien; Piot, Bilal; Geist, Matthieu; Scherrer, Bruno; Pietquin, Olivier

Document type:

Conference paper with proceedings

Title:

Softened approximate policy iteration for Markov games

Author(s):

Pérolat, Julien [Author]
Université de Lille, Sciences et Technologies
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]
Piot, Bilal [Author]
Université de Lille, Sciences et Technologies
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]
Geist, Matthieu [Author]
MAchine Learning and Interactive Systems [MALIS]
Scherrer, Bruno [Author]
Institut Élie Cartan de Lorraine [IECL]
Biology, genetics and statistics [BIGS]
Pietquin, Olivier [Author]
Institut universitaire de France [IUF]
Université de Lille, Sciences et Technologies
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]

Conference title:

ICML 2016 - 33rd International Conference on Machine Learning

City:

New York City

Country:

United States of America

Start date of the conference:

2016-06-19

HAL domain(s):

Mathematics [math]/Optimization and Control [math.OC]
Computer Science [cs]/Operations Research [math.OC]
Computer Science [cs]/Computational Complexity [cs.CC]
Computer Science [cs]/Machine Learning [cs.LG]
Mathematics [math]/Statistics [math.ST]

English abstract:

This paper reports theoretical and empirical investigations into the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it shows that state-of-the-art algorithms can be derived by the direct application of Newton's method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton's method results in the Bellman Residual Minimization Policy Iteration (BRMPI) algorithm and, when applied to the norm of the Projected OBR (POBR), it results in the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed that use quasi-Newton methods to minimize the OBR and the POBR, so as to benefit from improved empirical performance at low cost. Indeed, a quasi-Newton approach requires only slight modifications to the code of LSPI and BRMPI, yet it significantly improves both the stability and the performance of those algorithms. These phenomena are illustrated by an experiment conducted on artificially constructed games called Garnets.
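The link between Newton's method and policy iteration that underlies this abstract can be illustrated in the exact tabular setting, where a full Newton step on the equation T*Q − Q = 0 evaluates the current greedy policy, i.e. it is a policy-iteration step. The sketch below is not the authors' code and rests on several stated assumptions: the second player has a single action, so the zero-sum game reduces to an MDP and the min-max becomes a plain max; the Garnet-style generator is a simplified, hypothetical one; and, as the simplest possible "softening", it uses a fixed step size η on the Newton direction instead of the quasi-Newton updates studied in the paper. All function names (garnet_mdp, bellman_residual, newton_direction, softened_newton, value_iteration) are hypothetical.

```python
import numpy as np


def garnet_mdp(n_states, n_actions, branching, seed=0):
    """Hypothetical, simplified Garnet-style generator: each (s, a) pair moves to
    `branching` distinct random next states with Dirichlet-distributed probabilities.
    The opponent is assumed to have a single action, so the game reduces to an MDP."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            nxt = rng.choice(n_states, size=branching, replace=False)
            P[s, a, nxt] = rng.dirichlet(np.ones(branching))
    r = rng.standard_normal((n_states, n_actions))
    return P, r


def bellman_residual(Q, P, r, gamma):
    """Optimal Bellman residual T*Q - Q, with (T*Q)(s, a) = r(s, a) + gamma * E[max_b Q(s', b)]."""
    return r + gamma * (P @ Q.max(axis=1)) - Q


def newton_direction(Q, P, r, gamma):
    """Newton direction for T*Q - Q = 0. For a greedy policy pi of Q (ties broken by
    argmax), the (generalized) Jacobian is gamma * M_pi - I, where
    M_pi[(s, a), (s', a')] = P[s, a, s'] if a' == pi(s') else 0."""
    S, A = r.shape
    greedy = Q.argmax(axis=1)
    M = np.zeros((S * A, S * A))
    for s_next in range(S):
        M[:, s_next * A + greedy[s_next]] = P[:, :, s_next].reshape(-1)
    F = bellman_residual(Q, P, r, gamma).reshape(-1)
    # d = -(gamma * M - I)^{-1} F = (I - gamma * M)^{-1} F
    d = np.linalg.solve(np.eye(S * A) - gamma * M, F)
    return d.reshape(S, A)


def softened_newton(P, r, gamma, eta=1.0, n_iter=1000, tol=1e-10):
    """Damped Newton iteration Q <- Q + eta * d. With eta = 1 each step evaluates the
    greedy policy exactly (policy iteration); 0 < eta < 1 softens the step."""
    # Start below Q*: with Q0 = min(r) / (1 - gamma), T*Q0 - Q0 = r - min(r) >= 0.
    Q = np.full(r.shape, r.min() / (1.0 - gamma))
    for _ in range(n_iter):
        if np.abs(bellman_residual(Q, P, r, gamma)).max() < tol:
            break
        Q = Q + eta * newton_direction(Q, P, r, gamma)
    return Q


def value_iteration(P, r, gamma, n_iter=3000):
    """Reference solution: repeatedly apply T*, a gamma-contraction in sup norm."""
    Q = np.zeros_like(r)
    for _ in range(n_iter):
        Q = r + gamma * (P @ Q.max(axis=1))
    return Q


P, r = garnet_mdp(n_states=20, n_actions=3, branching=4, seed=0)
Q_star = value_iteration(P, r, gamma=0.9)
Q_pi = softened_newton(P, r, gamma=0.9, eta=1.0)   # full Newton steps: policy iteration
Q_soft = softened_newton(P, r, gamma=0.9, eta=0.5)  # softened (damped) Newton steps
print(np.abs(Q_pi - Q_star).max(), np.abs(Q_soft - Q_star).max())
```

For this toy variant only, starting from Q0 = min(r)/(1 − γ) keeps T*Q − Q non-negative along the iterates, so they increase monotonically toward Q* and the sup-norm error shrinks by at least a factor 1 − η(1 − γ) per step; smaller η trades speed for smaller, more conservative moves.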

Language:

English

Peer reviewed article:

Yes

Audience:

International

Popular science:

No

Collections:

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source:

Harvested from HAL

Files (all open access):

https://hal.inria.fr/hal-01393328/document
https://hal.inria.fr/hal-01393328/file/Dir_Ns_100_Na_8_Nb_10_sparsity_0-5sample_49gamma_0-99.pdf
https://hal.inria.fr/hal-01393328/file/Dir_Ns_100_Na_8_Nb_1_sparsity_0-5sample_49gamma_0-99.pdf
https://hal.inria.fr/hal-01393328/file/Dir_Ns_50_Na_2_Nb_1_sparsity_0-3sample_1-0gamma_0-9.pdf
https://hal.inria.fr/hal-01393328/file/Dir_Ns_50_Na_2_Nb_4_sparsity_0-3sample_2-0gamma_0-9.pdf
https://hal.inria.fr/hal-01393328/file/icml_numpapers.eps
https://hal.inria.fr/hal-01393328/file/icml_numpapers.pdf

File names:

document
nmz.pdf
Dir_Ns_100_Na_8_Nb_10_sparsity_0-5sample_49gamma_0-99.pdf
Dir_Ns_100_Na_8_Nb_1_sparsity_0-5sample_49gamma_0-99.pdf
Dir_Ns_50_Na_2_Nb_1_sparsity_0-3sample_1-0gamma_0-9.pdf
Dir_Ns_50_Na_2_Nb_4_sparsity_0-3sample_2-0gamma_0-9.pdf
icml_numpapers.eps
icml_numpapers.pdf
