Document type :
Conference paper (with proceedings)
Title :
Best of both worlds: Stochastic & adversarial best-arm identification
Author(s) :
Abbasi-Yadkori, Yasin [Auteur]
Adobe Research
Bartlett, Peter [Auteur]
Lawrence Berkeley National Laboratory [Berkeley] [LBNL]
Gabillon, Victor [Auteur]
Queensland University of Technology [Brisbane] [QUT]
Malek, Alan [Auteur]
Massachusetts Institute of Technology [MIT]
Valko, Michal [Auteur]
Sequential Learning [SEQUEL]
Conference title :
Conference on Learning Theory
City :
Stockholm
Country :
Sweden
Start date of the conference :
2018
English keyword(s) :
multi-armed bandits
adversarial and stochastic rewards
best-arm identification
HAL domain(s) :
Statistics [stat]/Machine Learning [stat.ML]
English abstract : [en]
We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
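The "simple random uniform learner" mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm; the names `uniform_best_arm` and `pull` are placeholders. The learner pulls a uniformly random arm each round and, once the budget is spent, recommends the arm with the highest empirical mean reward.

```python
import random

def uniform_best_arm(pull, n_arms, budget, rng=random.Random(0)):
    """Uniform-sampling best-arm identification sketch: sample arms
    uniformly at random for `budget` rounds, then recommend the arm
    with the highest empirical mean reward. `pull(arm)` is assumed to
    return the (possibly adversarial) reward of the chosen arm."""
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(budget):
        arm = rng.randrange(n_arms)  # uniform random arm choice
        sums[arm] += pull(arm)
        counts[arm] += 1
    # Empirical mean per arm (never-pulled arms rank last)
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_arms), key=means.__getitem__)
```

For example, on a stochastic Bernoulli bandit with means (0.2, 0.8, 0.5) and a few thousand pulls, the recommended arm is the one with the highest mean with high probability; against adversarial rewards, uniform sampling keeps every arm's empirical mean an unbiased estimate of its average reward.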
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
ANR Project :
Collections :
Source :
Files :
- bob_best_arm_correction2023.pdf (Open access)