Risk-aware linear bandits with convex loss
Type de document :
Communication dans un congrès avec actes
Titre :
Risk-aware linear bandits with convex loss
Auteur(s) :
Saux, Patrick [Auteur]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Maillard, Odalric Ambrym [Auteur]
Scool [Scool]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Maillard, Odalric Ambrym [Auteur]

Scool [Scool]
Titre de la manifestation scientifique :
European Workshop on Reinforcement Learning
Ville :
Milan
Pays :
Italie
Date de début de la manifestation scientifique :
2022-09-19
Date de publication :
2022-09-19
Mot(s)-clé(s) en anglais :
Contextual bandit
UCB
risk-awareness
elicitable risk measures
online gradient descent
UCB
risk-awareness
elicitable risk measures
online gradient descent
Discipline(s) HAL :
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion ...
Lire la suite >In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-square problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only approximated solution obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.Lire moins >
Lire la suite >In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-square problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only approximated solution obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Collections :
Source :
Fichiers
- https://hal.archives-ouvertes.fr/hal-03776680/document
- Accès libre
- Accéder au document
- http://arxiv.org/pdf/2209.07154
- Accès libre
- Accéder au document
- https://hal.archives-ouvertes.fr/hal-03776680/document
- Accès libre
- Accéder au document
- https://hal.archives-ouvertes.fr/hal-03776680/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- main.pdf
- Accès libre
- Accéder au document
- 2209.07154
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- main.pdf
- Accès libre
- Accéder au document