

Document type :
Conference paper (published in proceedings)
Title :
Optimal Thompson Sampling strategies for support-aware CVaR bandits
Author(s) :
Baudry, Dorian [Author]
Scool [Scool]

Gautron, Romain [Author]
Agroécologie et Intensification Durables des cultures annuelles [UPR AIDA]

Kaufmann, Emilie [Author]
Scool [Scool]

Maillard, Odalric-Ambrym [Author]
Scool [Scool]
Conference title :
38th International Conference on Machine Learning
City :
Virtual
Country :
United States of America
Start date of the conference :
2021-07-18
Journal title :
Proceedings of Machine Learning Research
HAL domain(s) :
Computer Science [cs]
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level α of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded in physical resources. Building on a recent work by Riou & Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that enables us to bound the CVaR regret of these strategies. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use case in agriculture and on various synthetic examples.
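The CVaR criterion and the two Thompson Sampling indices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and rests on stated assumptions: CVaR is taken in its lower-tail (risk-averse) form for rewards, i.e. the average of the worst α fraction of probability mass; the B-CVTS-style index is assumed to Dirichlet-reweight an arm's observations together with a known upper bound of the reward support; the M-CVTS-style index is assumed to draw from a Dirichlet posterior over a known finite support with a uniform prior. The function names (cvar, b_cvts_index, m_cvts_index, run_b_cvts), the support bound of 1.0 and the two-arm example are hypothetical.

```python
import numpy as np


def cvar(values, weights, alpha):
    """Lower-tail CVaR at level alpha of a discrete distribution.

    Averages the worst alpha fraction of probability mass (risk-averse
    convention for rewards); alpha = 1 gives the mean.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    x, w = values[order], weights[order]
    mass_before = np.cumsum(w) - w  # mass strictly below each atom
    # Portion of each atom's mass that falls inside the lower alpha-tail.
    taken = np.clip(alpha - mass_before, 0.0, w)
    return float(np.dot(taken, x) / alpha)


def b_cvts_index(observations, upper_bound, alpha, rng):
    """Assumed B-CVTS-style sample: Dirichlet-reweight the observations
    plus the known support upper bound, then take the CVaR."""
    if len(observations) == 0:
        # Only the upper-bound atom remains, so its CVaR is the bound itself.
        return float(upper_bound)
    points = np.append(np.asarray(observations, dtype=float), upper_bound)
    weights = rng.dirichlet(np.ones(len(points)))
    return cvar(points, weights, alpha)


def m_cvts_index(support, counts, alpha, rng):
    """Assumed M-CVTS-style sample for a known finite support, using a
    uniform Dirichlet prior over the multinomial probabilities."""
    probs = rng.dirichlet(np.asarray(counts, dtype=float) + 1.0)
    return cvar(support, probs, alpha)


def run_b_cvts(arm_samplers, horizon, alpha, upper_bound=1.0, seed=0):
    """Each round, pull the arm whose sampled CVaR index is largest.

    arm_samplers: list of callables mapping a numpy Generator to a reward.
    Returns the number of pulls of each arm.
    """
    rng = np.random.default_rng(seed)
    history = [[] for _ in arm_samplers]
    for _ in range(horizon):
        indices = [b_cvts_index(obs, upper_bound, alpha, rng) for obs in history]
        k = int(np.argmax(indices))
        history[k].append(arm_samplers[k](rng))
    return [len(obs) for obs in history]


if __name__ == "__main__":
    # Hypothetical two-arm example with rewards in [0, 1]:
    # arm 0 pays 0 or 1 with equal probability (CVaR at 0.5 is 0),
    # arm 1 always pays 0.6 (CVaR at 0.5 is 0.6).
    arms = [lambda r: float(r.random() < 0.5), lambda r: 0.6]
    print(run_b_cvts(arms, horizon=2000, alpha=0.5))
```

Adding the support upper bound as an extra atom keeps some optimistic mass on the best possible reward, so an arm whose empirical CVaR looks poor is still sampled occasionally; in the toy example, arm 1 (the higher-CVaR arm) should receive the large majority of the 2000 pulls.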
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
ANR Project :
Beyond sequential learning for better decision-making
Multi-armed bandits for non-stationary and structured signals
Comment :
Presented at the Thirty-eighth International Conference on Machine Learning (ICML 2021)
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files :
  • http://arxiv.org/pdf/2012.05754 (open access)
  • https://hal.archives-ouvertes.fr/hal-03447244/document (open access)