Difference of Convex Functions Programming for Reinforcement Learning
Document type:
Conference paper with proceedings
Title:
Difference of Convex Functions Programming for Reinforcement Learning
Author(s):
Piot, Bilal [Author]
Sequential Learning [SEQUEL]
IMS : Information, Multimodalité & Signal
Geist, Matthieu [Author]
IMS : Information, Multimodalité & Signal
Pietquin, Olivier [Author]
Sequential Learning [SEQUEL]
Laboratoire d'Informatique Fondamentale de Lille [LIFL]
Institut universitaire de France [IUF]
Conference title:
Advances in Neural Information Processing Systems (NIPS 2014)
City:
Montreal
Country:
Canada
Conference start date:
2014-12
HAL discipline(s):
Computer Science [cs]
Engineering Sciences [physics]
Abstract (English): [en]
Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) T*Q − Q, where T* is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program, which allows envisioning the use of the large related literature on DC Programming to address the Reinforcement Learning problem.
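For reference, the objects named in the abstract can be written out as follows. This is a minimal sketch using the standard definition of the optimal Bellman operator and an empirical residual norm; the symbols r, P, gamma, mu, p, and N are generic notation assumed here, not reproduced from the paper.

\[
(T^*Q)(s,a) \;=\; r(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a')
\]
\[
\min_{Q}\; \big\| T^*Q - Q \big\|_{p,\mu}^{p}
\;\approx\;
\min_{Q}\; \frac{1}{N} \sum_{i=1}^{N} \big| (T^*Q)(s_i,a_i) - Q(s_i,a_i) \big|^{p}
\]

The DC framing exploits the fact that the max over actions makes T*Q convex when Q is linear in its parameters; the precise DC decomposition of the residual norm is the paper's contribution and is not reproduced here.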
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
ANR project:
Collections:
Source:
Files:
- https://hal.inria.fr/hal-01104419/document (open access)
- 5443-difference-of-convex-functions-programming-for-reinforcement-learning.pdf (open access)