Document type:
Conference paper with proceedings
Title:
Approximate dynamic programming for two-player zero-sum Markov games
Author(s):
Perolat, Julien [Author]
Sequential Learning [SEQUEL]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Scherrer, Bruno [Author]
Biology, genetics and statistics [BIGS]
Institut Élie Cartan de Lorraine [IECL]
Piot, Bilal [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences Humaines et Sociales
Sequential Learning [SEQUEL]
Pietquin, Olivier [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Université de Lille, Sciences et Technologies
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]
Conference title:
International Conference on Machine Learning (ICML 2015)
City:
Lille
Country:
France
Conference start date:
2015-07-06
Publication date:
2015-07-06
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Mathematics [math]/General Mathematics [math.GM]
Computer Science [cs]/Information Retrieval [cs.IR]
English abstract: [en]
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is (2γε + ε′)/(1−γ)²-optimal, where ε is the value function approximation error and ε′ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.inria.fr/hal-01153270v3/document
- Open access
- Access the document
- ICML_2015_JPBSBPOP.pdf
- Open access
- Access the document