Document type :
Conference paper with proceedings
Title :
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Author(s) :
Gajane, Pratik [Author]
Orange Labs [Lannion]
Sequential Learning [SEQUEL]
Urvoy, Tanguy [Author]
Orange Labs [Lannion]
Clérot, Fabrice [Author]
Orange Labs [Lannion]
Conference title :
Proceedings of the 32nd International Conference on Machine Learning
City :
Lille
Country :
France
Start date of the conference :
2015-07-06
Publication date :
2015
English keyword(s) :
MAB
online learning
dueling bandits
HAL domain(s) :
Computer Science [cs]/Other [cs.OH]
English abstract :
We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3). We prove a finite-time expected regret upper bound of order O(√(K ln(K) T)) for this algorithm and a general lower bound of order Ω(√(KT)). Finally, we provide experimental results using real data from information retrieval applications.
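To illustrate the EXP3-style mechanism the abstract describes, here is a minimal Python sketch of a relative-feedback exponential-weights learner in the spirit of REX3. It rests on several assumptions. Each round, the learner draws two arms a and b independently from the EXP3 mixture distribution p (exponential weights plus γ/K uniform exploration). It observes only the relative feedback ψ = x_t(a) − x_t(b) ∈ {−1, 0, 1}. It then applies an importance-weighted update of +ψ/(2p_a) to arm a and −ψ/(2p_b) to arm b, both scaled by γ/K. The names `rex3` and `make_bernoulli_duel`, the feedback convention, and the exact update constants are illustrative assumptions, not the paper's pseudocode.

```python
import math
import random
from typing import Callable, List


def rex3(K: int, T: int, gamma: float,
         duel: Callable[[int, int, int], int],
         rng: random.Random) -> List[float]:
    """Sketch of an REX3-style learner (assumed update rule, not the paper's code).

    duel(t, a, b) returns the relative feedback psi in {-1, 0, +1}:
    +1 if arm a beats arm b at round t, -1 if b beats a, 0 for a tie.
    Returns the final arm-selection probabilities.
    """
    log_w = [0.0] * K  # log-weights avoid overflow of exp() on long runs

    def probs() -> List[float]:
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # EXP3 mixture: exponential weights plus uniform exploration gamma / K
        return [(1.0 - gamma) * wi / total + gamma / K for wi in w]

    for t in range(T):
        p = probs()
        # Both arms of the duel are drawn independently from the same distribution.
        a, b = rng.choices(range(K), weights=p, k=2)
        psi = duel(t, a, b)
        # Importance-weighted relative reward: +psi credited to a, -psi to b.
        log_w[a] += (gamma / K) * psi / (2.0 * p[a])
        log_w[b] -= (gamma / K) * psi / (2.0 * p[b])
    return probs()


def make_bernoulli_duel(means: List[float],
                        rng: random.Random) -> Callable[[int, int, int], int]:
    """Hypothetical test environment: arm i has utility x_t(i) ~ Bernoulli(means[i])."""
    def duel(t: int, a: int, b: int) -> int:
        x = [1 if rng.random() < mu else 0 for mu in means]  # utilities at round t
        return x[a] - x[b]  # only the difference is revealed to the learner
    return duel


if __name__ == "__main__":
    rng = random.Random(0)
    p = rex3(K=3, T=5000, gamma=0.1,
             duel=make_bernoulli_duel([0.2, 0.5, 0.8], rng), rng=rng)
    print([round(pi, 3) for pi in p])
```

The Bernoulli environment is a fixed stochastic instance, which is one sequence an oblivious adversary could choose. The log-weight representation is a numerical-stability choice. The exploration floor p_i ≥ γ/K bounds each update by 1/2 in magnitude, but raw weights would still overflow over long horizons.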
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files
- https://hal.inria.fr/hal-01225614/document (open access)
- https://hal.inria.fr/hal-01225614/file/gajane15-supp.zip (open access)
- http://arxiv.org/pdf/1601.03855 (open access)