Bandits Corrupted by Nature: Lower Bounds ...
Type de document :
Compte-rendu et recension critique d'ouvrage
Titre :
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms
Auteur(s) :
Mathieu, Timothée [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Basu, Debabrota [Auteur]
Centrale Lille
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Maillard, Odalric Ambrym [Auteur]
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Basu, Debabrota [Auteur]
Centrale Lille
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Maillard, Odalric Ambrym [Auteur]

Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Titre de la revue :
Transactions on Machine Learning Research Journal
Éditeur :
[Amherst Massachusetts]: OpenReview.net, 2022
Date de publication :
2024-01
ISSN :
2835-8856
Mot(s)-clé(s) en anglais :
Bandits
Robust statistics
Unbounded corruption
Huber's estimator
Lower bounds
Regret bounds
Heavy tail distributions
Robust statistics
Unbounded corruption
Huber's estimator
Lower bounds
Regret bounds
Heavy tail distributions
Discipline(s) HAL :
Statistiques [stat]/Machine Learning [stat.ML]
Informatique [cs]/Apprentissage [cs.LG]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Informatique [cs]/Apprentissage [cs.LG]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]
Résumé en anglais : [en]
We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary ...
Lire la suite >We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the Bandits with Stochastic Corruption problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for Bandits with Stochastic Corruption, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB} that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving Bandits with Stochastic Corruption for different reward distributions and different levels of corruptions.Lire moins >
Lire la suite >We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the Bandits with Stochastic Corruption problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for Bandits with Stochastic Corruption, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB} that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving Bandits with Stochastic Corruption for different reward distributions and different levels of corruptions.Lire moins >
Langue :
Anglais
Vulgarisation :
Non
Projet ANR :
Commentaire :
https://openreview.net/forum?id=oGIR0ic3jU
Collections :
Source :
Fichiers
- document
- Accès libre
- Accéder au document
- 530_bandits_corrupted_by_nature_lo.pdf
- Accès libre
- Accéder au document