Apprentissage par Renforcement: Le Cas Multijoueur

Pérolat, Julien

Type de document :

Thèse

Titre :

Apprentissage par Renforcement: Le Cas Multijoueur

Titre en anglais :

Reinforcement Learning: The Multi-Player Case

Auteur(s) :

Pérolat, Julien [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Sequential Learning [SEQUEL]

Directeur(s) de thèse :

Olivier Pietquin

Date de soutenance :

2017-12-18

Président du jury :

D. Ernst
D. Precup
L. Duchien
R. Ortner
B. Piot
B. Scherrer
K. Tuyls

Membre(s) du jury :

D. Ernst
D. Precup
L. Duchien
R. Ortner
B. Piot
B. Scherrer
K. Tuyls

Organisme de délivrance :

Université de Lille 1 - Sciences et Technologies

École doctorale :

Science pour l'Ingénieur

Mot(s)-clé(s) :

apprentissage par renforcement
théorie des jeux
apprentissage automatique
algorithmes
intelligence artificielle
apprentissage multi-agent.

Mot(s)-clé(s) en anglais :

machine learning
algorithm
artificial intelligence
reinforcement learn- ing
game theory
multi-agent learning.

Discipline(s) HAL :

Informatique [cs]/Intelligence artificielle [cs.AI]
Informatique [cs]/Informatique et théorie des jeux [cs.GT]
Informatique [cs]/Apprentissage [cs.LG]
Informatique [cs]/Système multi-agents [cs.MA]
Statistiques [stat]/Machine Learning [stat.ML]

Résumé :

Ce manuscrit de thèse présente des travaux d'apprentissage par renforcement dans le cadre des jeux stochastiques. Les deux première parties de ce manuscrit sont dédiées à l'apprentissage à partir de données dites batch. ...
Lire la suite >Ce manuscrit de thèse présente des travaux d'apprentissage par renforcement dans le cadre des jeux stochastiques. Les deux première parties de ce manuscrit sont dédiées à l'apprentissage à partir de données dites batch. Une première approche par programmation dynamique approchée est proposée dans le cadre des jeux à deux joueurs à somme nulle et ses limitations sont discutées dans le cadre des jeux à somme générale. Puis, nous étudions une seconde approche par minimisation du résidu de Bellman dans le cadre des jeux à deux joueurs à somme nulle et l'étendons aux jeux à somme générale. Finalement, on s'intéresse à l'apprentissage en ligne et introduisons un algorithme acteur-critique qui converge pour des jeux à somme nulle et des jeux coopératifs à étages.Lire moins >

Résumé en anglais : [en]

This thesis mainly focuses on learning from historical data in a sequential multi-agent environment. We studied the problem of batch learning in Markov games (MGs). Markov games are a generalization of Markov decision ...
Lire la suite >This thesis mainly focuses on learning from historical data in a sequential multi-agent environment. We studied the problem of batch learning in Markov games (MGs). Markov games are a generalization of Markov decision processes (MDPs) to the multi-agent setting. Our approach was to propose learning algorithms to find equilibria in games where the knowledge of the game is limited to interaction samples (also named batch data). To achieve this task, we explore two main approaches.The first approach explored in this thesis is to study approximate dynamic programming techniques. We generalize several batch algorithms from MDPs to zero-sum two-player MGs. This part of our work generalizes several approximate dynamic programming bounds from a L ∞ -norm to a L p -norm. Then we describe, test and comparealgorithms based on those dynamic programming schemes. But these algorithms are highly sensitive to the discount factor (a parameter that controls the time horizon of the problem). To improve those algorithms, we studied many non-stationary variants of approximate dynamic methods to the zero sum two player case. In the end, we show that using non-stationary strategies can be used in general sum games. However, the resulting guarantees are very loose compared to the one on MDPs or zero-sum two-player MGs.The second approach studied in this manuscript is the Bellman residual approach. This approach reduces the problem of learning from batch data to the minimization of a loss function. In a zero-sum two-player MG, we prove that using a Newton’s method on some Bellman residuals is either equivalent to the Least Squares Policy Iteration (LSPI) algorithm or to the Bellman Residual Minimizing Policy Iteration (BRMPI) algorithm. We leverage this link to address the oscillation of LSPI in MDPs and in MGs. Then we show that a Bellman residual approach could be used to learn from batch data in general-sum MGs.Finally in the last part of this dissertation, we study multi-agent independent learning in Multi-Stage Games (MSGs). We provide an actor-critic independent learning algorithm that provably converges in zero-sum two-player MSGs and in cooperative MSGs and empirically converges using function approximation on the game of Alesia.Lire moins >

Langue :

Anglais

Collections :