On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
Document type :
Conference paper with proceedings
Title :
On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
Author(s) :
Pérolat, Julien [Author]
Université de Lille, Sciences et Technologies
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]
Piot, Bilal [Author]
Université de Lille, Sciences et Technologies
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]
Scherrer, Bruno [Author]
Institut Élie Cartan de Lorraine [IECL]
Biology, genetics and statistics [BIGS]
Pietquin, Olivier [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Institut universitaire de France [IUF]
Sequential Learning [SEQUEL]
Université de Lille, Sciences et Technologies
Conference title :
19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016)
City :
Cadiz
Country :
Spain
Start date of the conference :
2016-05-09
Journal title :
Proceedings of the International Conference on Artificial Intelligence and Statistics
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
English abstract : [en]
The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds than their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on gamma (the discount factor), 2) a concentrability coefficient and 3) a propagation error term. Depending on the algorithm, this error can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on gamma) compared to those suffering from a regression error.
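To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of minimax value iteration for a two-player zero-sum Markov game on a small random tabular model. The random model, the state/action sizes, and the helper names (matrix_game_value, bellman_backup) are illustrative assumptions, loosely standing in for the Garnet MGs mentioned in the abstract; each backup solves a matrix game by linear programming, and keeping the greedy minimax policies of the last few iterates is the kind of non-stationary strategy the paper studies.

```python
# Illustrative sketch only: minimax (Shapley) value iteration on a toy
# random zero-sum Markov game. All model parameters are made-up stand-ins.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Minimax value of a matrix game Q (row player maximizes)."""
    n_rows, n_cols = Q.shape
    # Variables: mixed strategy x over rows (n_rows entries) and the value v.
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                                       # maximize v == minimize -v
    A_ub = np.hstack([-Q.T, np.ones((n_cols, 1))])     # v - x^T Q[:, j] <= 0 for all j
    b_ub = np.zeros(n_cols)
    A_eq = np.ones((1, n_rows + 1)); A_eq[0, -1] = 0.0 # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_rows + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds, method="highs")
    return res.x[-1]

# Toy zero-sum MG: S states, A1/A2 actions, random transitions and rewards.
rng = np.random.default_rng(0)
S, A1, A2, gamma = 5, 3, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A1, A2))        # P[s, a, b, s']
R = rng.normal(size=(S, A1, A2))                       # r(s, a, b)

def bellman_backup(v):
    """One minimax value-iteration backup of a state-value vector v."""
    v_new = np.empty(S)
    for s in range(S):
        Q = R[s] + gamma * P[s] @ v                    # (A1, A2) matrix game at state s
        v_new[s] = matrix_game_value(Q)
    return v_new

# Run a few backups and keep the last m value functions; the greedy minimax
# policies of these iterates, played cyclically, give a non-stationary
# strategy of the kind discussed in the abstract.
m, v, last_values = 3, np.zeros(S), []
for k in range(50):
    v = bellman_backup(v)
    last_values = (last_values + [v])[-m:]
print("approximate game value per state:", np.round(v, 3))
```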
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No