
Adapting to game trees in zero-sum imperfect information games

Document type:
Preprint or working paper
Title:
Adapting to game trees in zero-sum imperfect information games
Author(s):
Fiegel, Côme [Author]
Ecole Nationale de la Statistique et de l'Analyse Economique [ENSAE]
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Ménard, Pierre [Author]
École normale supérieure de Lyon [ENS de Lyon]
Kozuno, Tadashi [Author]
Munos, Rémi [Author]
DeepMind [Paris]
Perchet, Vianney [Author]
Ecole Nationale de la Statistique et de l'Analyse Economique [ENSAE]
Criteo AI Lab
IA coopérative : équité, vie privée, incitations [FAIRPLAY]
Valko, Michal [Author]
DeepMind [Paris]
Publication date:
2022-12-23
HAL domain(s):
Mathematics [math]
English abstract: [en]
Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information-set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.
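
The two algorithms in the abstract build on the generic Follow the Regularized Leader (FTRL) template. As a rough illustration only, the sketch below shows FTRL with an entropic regularizer (equivalent to exponential weights) on a single probability simplex with full-information losses; the function name, the step size eta, and the toy losses are assumptions for illustration. It is not the paper's Balanced-FTRL or Adaptive-FTRL, which define the regularization over the information-set structure of the game tree and learn from trajectory feedback.

import numpy as np

def ftrl_exponential_weights(loss_vectors, eta=0.1):
    """Generic FTRL with an entropic regularizer on the probability simplex.

    Illustrative sketch only; NOT the Balanced-FTRL or Adaptive-FTRL
    algorithms of the paper, which regularize over information sets.
    """
    n_actions = len(loss_vectors[0])
    cumulative_loss = np.zeros(n_actions)
    strategies = []
    for loss in loss_vectors:
        # FTRL step: minimize <cumulative_loss, p> + (1/eta) * neg_entropy(p)
        # over the simplex; the closed form is a softmax of -eta * losses.
        logits = -eta * cumulative_loss
        logits -= logits.max()          # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        strategies.append(p)
        cumulative_loss += loss         # full-information update (toy setting)
    return strategies

# Toy usage: 3 actions, 5 rounds of random losses.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(5, 3))
for t, p in enumerate(ftrl_exponential_weights(losses, eta=0.5)):
    print(f"round {t}: strategy {np.round(p, 3)}")

In the paper's setting the regularizer is instead defined per information set, either fixed in advance from the known tree structure (Balanced-FTRL) or adapted progressively from the observed trajectories (Adaptive-FTRL).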
Language:
English
Collections:
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source:
Harvested from HAL
Files:
  • 2212.12567 (Open access)