Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
Document type :
Preprint or working paper
Title :
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm
Author(s) :
Basu, Debabrota [Auteur]
Scool [Scool]
Maillard, Odalric Ambrym [Auteur]
Scool [Scool]
Mathieu, Timothée [Auteur]
Scool [Scool]
English keyword(s) :
Unbounded corruption
Heavy-tail distributions
Huber's estimator
Regret bounds
HAL domain(s) :
Mathematics [math]
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
In this paper, we study the stochastic bandits problem with k unknown heavy-tailed and corrupted reward distributions or arms with time-invariant corruption distributions. At each iteration, the player chooses an arm. Given the arm, the environment returns an uncorrupted reward with probability 1−ε and an arbitrarily corrupted reward with probability ε. In our setting, the uncorrupted reward might be heavy-tailed and the corrupted reward might be unbounded. We prove a lower bound on the regret indicating that the corrupted and heavy-tailed bandits are strictly harder than uncorrupted or light-tailed bandits. We observe that the environments can be categorised into hardness regimes depending on the suboptimality gap ∆, variance σ, and corruption proportion ε. Following this, we design a UCB-type algorithm, namely HuberUCB, that leverages Huber's estimator for robust mean estimation. HuberUCB leads to tight upper bounds on regret in the proposed corrupted and heavy-tailed setting. To derive the upper bound, we prove a novel concentration inequality for Huber's estimator, which might be of independent interest.
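The robust mean estimation mentioned in the abstract can be illustrated with a minimal sketch of Huber's M-estimator of location, computed by iteratively reweighted averaging. This is a generic textbook version for intuition only, not the paper's exact estimator or the HuberUCB algorithm; the function name `huber_mean` and the threshold parameter `delta` are illustrative choices.

```python
import numpy as np

def huber_mean(x, delta=1.0, tol=1e-8, max_iter=100):
    """Huber's M-estimator of location via iteratively reweighted averaging.

    Minimises sum_i rho_delta(x_i - mu), where the Huber loss rho_delta is
    quadratic on [-delta, delta] and linear outside. Samples far from the
    current estimate receive weight delta/|residual| < 1, which bounds the
    influence of corrupted or heavy-tailed observations.
    """
    mu = np.median(x)  # robust starting point
    for _ in range(max_iter):
        r = np.abs(x - mu)
        # weight 1 inside the quadratic zone, delta/|r| outside
        w = np.minimum(1.0, delta / np.maximum(r, 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(0)
clean = rng.normal(0.5, 1.0, size=1000)
# contaminate ~5% of the sample with unbounded (here: huge) corruptions
corrupted = np.concatenate([clean, np.full(50, 1e6)])
print(np.mean(corrupted))    # empirical mean is dragged far from 0.5
print(huber_mean(corrupted)) # Huber estimate stays near the true mean
```

The contrast between the two printed values shows why a UCB index built on Huber's estimator, rather than the empirical mean, can remain informative when a constant fraction ε of rewards is arbitrarily corrupted.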
Language :
English
Files
- https://hal.archives-ouvertes.fr/hal-03611816/document (open access)
- https://hal.archives-ouvertes.fr/hal-03611816/file/article.zip (open access)
- main.pdf (open access)