Document type :
Preprint or working paper
Title :
Adapting to game trees in zero-sum imperfect information games
Author(s) :
Fiegel, Côme [Author]
Ecole Nationale de la Statistique et de l'Analyse Economique [ENSAE]
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Ménard, Pierre [Author]
École normale supérieure de Lyon [ENS de Lyon]
Kozuno, Tadashi [Author]
Munos, Rémi [Author]
DeepMind [Paris]
Perchet, Vianney [Author]
Ecole Nationale de la Statistique et de l'Analyse Economique [ENSAE]
Criteo AI Lab
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Valko, Michal [Author]
DeepMind [Paris]
Ecole Nationale de la Statistique et de l'Analyse Economique [ENSAE]
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Publication date :
2022-12-23
HAL domain(s) :
Mathematics [math]
English abstract : [en]
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
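Both algorithms in the abstract are instances of Follow the Regularized Leader. As a minimal illustrative sketch only, and not the paper's method, the snippet below runs FTRL on the probability simplex with a negative-entropy regularizer, for which the minimizer has the closed form of exponential weights. The names `ftrl_simplex`, `loss_fn`, and `eta` are assumptions made for illustration; the paper's Balanced-FTRL and Adaptive-FTRL additionally rely on game-tree-dependent regularizers and importance-weighted trajectory feedback, which this sketch omits.

```python
import numpy as np

def ftrl_simplex(loss_fn, n_actions, n_rounds, eta=0.2):
    """FTRL on the probability simplex with a negative-entropy
    regularizer (hypothetical helper for illustration).

    At each round t, play
        p_t = argmin_p <sum_{s<t} l_s, p> + (1/eta) * sum_i p_i log p_i,
    whose closed form is p_t proportional to exp(-eta * cumulative loss),
    i.e. the exponential-weights update.
    """
    cumulative_loss = np.zeros(n_actions)
    probs = np.full(n_actions, 1.0 / n_actions)  # uniform first play
    for t in range(n_rounds):
        logits = -eta * cumulative_loss
        logits -= logits.max()          # shift for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        # loss_fn stands in for the feedback a player would observe;
        # here it returns a full loss vector over the n_actions actions.
        cumulative_loss += loss_fn(t, probs)
    return probs

# Demo with a fixed loss vector: the play concentrates on the
# action with the smallest cumulative loss (action 1 below).
losses = np.array([0.9, 0.1, 0.5])
final = ftrl_simplex(lambda t, p: losses, n_actions=3, n_rounds=500)
print(final)
```

The negative-entropy choice is what makes the argmin tractable in closed form; the paper's contribution lies in choosing and adapting the regularization to the game tree so as to obtain the sample complexities stated in the abstract.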
Language :
English
Files :
- 2212.12567 (Open access)