Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Mathieu, Timothée; Basu, Debabrota; Maillard, Odalric Ambrym

Type de document :

Compte-rendu et recension critique d'ouvrage

Titre :

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms

Auteur(s) :

Mathieu, Timothée [Auteur]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Basu, Debabrota [Auteur]
Centrale Lille
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Maillard, Odalric Ambrym [Auteur]

Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]

Titre de la revue :

Transactions on Machine Learning Research Journal

Éditeur :

[Amherst Massachusetts]: OpenReview.net, 2022

Date de publication :

2024-01

ISSN :

2835-8856

Mot(s)-clé(s) en anglais :

Bandits
Robust statistics
Unbounded corruption
Huber's estimator
Lower bounds
Regret bounds
Heavy tail distributions

Discipline(s) HAL :

Statistiques [stat]/Machine Learning [stat.ML]
Informatique [cs]/Apprentissage [cs.LG]
Mathématiques [math]/Statistiques [math.ST]
Statistiques [stat]/Théorie [stat.TH]

Résumé en anglais : [en]

We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary ...
Lire la suite >We study the Bandits with Stochastic Corruption problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent stochastic adversary or Nature. To be specific, the reward obtained by playing an arm comes from corresponding heavy-tailed reward distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary corruption distribution of unbounded support with probability $\varepsilon \in [0,0.5)$. First, we provide \textit{a problem-dependent lower bound on the regret} of any corrupted bandit algorithm. The lower bounds indicate that the Bandits with Stochastic Corruption problem is harder than the classical stochastic bandit problem with sub-Gaussian or heavy-tail rewards. Following that, we propose a novel UCB-type algorithm for Bandits with Stochastic Corruption, namely \texttt{HubUCB}, that builds on Huber's estimator for robust mean estimation. Leveraging a novel concentration inequality of Huber's estimator, we prove that \texttt{HubUCB} achieves a near-optimal regret upper bound. Since computing Huber's estimator has quadratic complexity, we further introduce a sequential version of Huber's estimator that exhibits linear complexity. We leverage this sequential estimator to design \texttt{SeqHubUCB} that enjoys similar regret guarantees while reducing the computational burden. Finally, we experimentally illustrate the efficiency of \texttt{HubUCB} and \texttt{SeqHubUCB} in solving Bandits with Stochastic Corruption for different reward distributions and different levels of corruptions.Lire moins >

Langue :

Anglais

Vulgarisation :

Non

Projet ANR :

REPUBLIC: Vers l'IA responsable avec l'apprentissage par renforcement sous contraintes
Foundations of robustness and reliability in artificial intelligence

Commentaire :

https://openreview.net/forum?id=oGIR0ic3jU

Collections :