• English
    • français
  • Help
  •  | 
  • Contact
  •  | 
  • About
  •  | 
  • Login
  • HAL portal
  •  | 
  • Pages Pro
  • EN
  •  / 
  • FR
View Item 
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
  •   LillOA Home
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

A Relative Exponential Weighing Algorithm ...
  • BibTeX
  • CSV
  • Excel
  • RIS

Document type :
Communication dans un congrès avec actes
Title :
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Author(s) :
Gajane, Pratik [Auteur]
Orange Labs [Lannion]
Sequential Learning [SEQUEL]
Urvoy, Tanguy [Auteur]
Orange Labs [Lannion]
Clérot, Fabrice [Auteur]
Orange Labs [Lannion]
Conference title :
Proceedings of the 32nd International Conference on Machine Learning
City :
Lille
Country :
France
Start date of the conference :
2015-07-06
Publication date :
2015
English keyword(s) :
MAB
online learning
dueling bandits
HAL domain(s) :
Informatique [cs]/Autre [cs.OH]
English abstract : [en]
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new ...
Show more >
We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.Show less >
Language :
Anglais
Peer reviewed article :
Oui
Audience :
Internationale
Popular science :
Non
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
Thumbnail
  • https://hal.inria.fr/hal-01225614/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-01225614/file/gajane15-supp.zip
  • Open access
  • Access the document
Thumbnail
  • http://arxiv.org/pdf/1601.03855
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-01225614/document
  • Open access
  • Access the document
Thumbnail
  • https://hal.inria.fr/hal-01225614/document
  • Open access
  • Access the document
Thumbnail
  • document
  • Open access
  • Access the document
Thumbnail
  • rex3_icml.pdf
  • Open access
  • Access the document
Thumbnail
  • gajane15-supp.zip
  • Open access
  • Access the document
Thumbnail
  • 1601.03855
  • Open access
  • Access the document
Thumbnail
  • document
  • Open access
  • Access the document
Thumbnail
  • rex3_icml.pdf
  • Open access
  • Access the document
Thumbnail
  • gajane15-supp.zip
  • Open access
  • Access the document
Université de Lille

Mentions légales
Accessibilité : non conforme
Université de Lille © 2017