Document type:
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings
Permanent URL:
Title:
Optimal Thompson Sampling strategies for support-aware CVaR bandits
Author(s):
Baudry, Dorian [Author]
Scool [Scool]
Gautron, Romain [Author]
Scool [Scool]
Kaufmann, Emilie [Author]
Scool [Scool]
Maillard, Odalric-Ambrym [Author]
Scool [Scool]
Conference title:
ICML 2021 - International Conference on Machine Learning
City:
Virtual Conference
Country:
United States of America
Conference start date:
2021-07-18
HAL discipline(s):
Mathematics [math]/Statistics [math.ST]
Statistics [stat]/Other [stat.ML]
English abstract: [en]
In this paper we study a multi-armed bandit problem in which the quality of each arm is measured by the Conditional Value at Risk (CVaR) at some level α of the reward distribution. While existing works in this setting mainly focus on Upper Confidence Bound algorithms, we introduce a new Thompson Sampling approach for CVaR bandits on bounded rewards that is flexible enough to solve a variety of problems grounded on physical resources. Building on a recent work by Riou and Honda (2020), we introduce B-CVTS for continuous bounded rewards and M-CVTS for multinomial distributions. On the theoretical side, we provide a non-trivial extension of their analysis that allows us to bound the CVaR regret of these two strategies. Strikingly, our results show that these strategies are the first to provably achieve asymptotic optimality in CVaR bandits, matching the corresponding asymptotic lower bounds for this setting. Further, we illustrate empirically the benefit of Thompson Sampling approaches both in a realistic environment simulating a use case in agriculture and on various synthetic examples.
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
ANR project:
Collections:
Source:
Deposit date:
2021-12-11T02:00:32Z
Files
- https://hal.archives-ouvertes.fr/hal-03472593/document
- Open access
- Access the document