A Relative Exponential Weighing Algorithm ...
Type de document :
Communication dans un congrès avec actes
Titre :
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Auteur(s) :
Gajane, Pratik [Auteur]
Sequential Learning [SEQUEL]
Orange Labs [Lannion]
Urvoy, Tanguy [Auteur]
Orange Labs [Lannion]
Clérot, Fabrice [Auteur]
Orange Labs [Lannion]
Sequential Learning [SEQUEL]
Orange Labs [Lannion]
Urvoy, Tanguy [Auteur]
Orange Labs [Lannion]
Clérot, Fabrice [Auteur]
Orange Labs [Lannion]
Titre de la manifestation scientifique :
Proceedings of the 32nd International Conference on Machine Learning
Ville :
Lille
Pays :
France
Date de début de la manifestation scientifique :
2015-07-06
Date de publication :
2015
Mot(s)-clé(s) en anglais :
MAB
online learning
dueling bandits
online learning
dueling bandits
Discipline(s) HAL :
Informatique [cs]/Autre [cs.OH]
Résumé en anglais : [en]
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new ...
Lire la suite >We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.Lire moins >
Lire la suite >We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Collections :
Source :
Fichiers
- https://hal.inria.fr/hal-01225614/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-01225614/file/gajane15-supp.zip
- Accès libre
- Accéder au document
- http://arxiv.org/pdf/1601.03855
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-01225614/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-01225614/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- rex3_icml.pdf
- Accès libre
- Accéder au document
- gajane15-supp.zip
- Accès libre
- Accéder au document
- 1601.03855
- Accès libre
- Accéder au document