Best of both worlds: Stochastic & adversarial best-arm identification
Document type:
Conference paper with proceedings
Title:
Best of both worlds: Stochastic & adversarial best-arm identification
Author(s):
Abbasi-Yadkori, Yasin [Author]
Adobe Research
Bartlett, Peter [Author]
Lawrence Berkeley National Laboratory [Berkeley] [LBNL]
Gabillon, Victor [Author]
Queensland University of Technology [Brisbane] [QUT]
Malek, Alan [Author]
Massachusetts Institute of Technology [MIT]
Valko, Michal [Author]
Sequential Learning [SEQUEL]
Conference name:
Conference on Learning Theory
City:
Stockholm
Country:
Sweden
Conference start date:
2018
Keywords (English):
multi-armed bandits
adversarial and stochastic rewards
best-arm identification
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
Abstract (English):
We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
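The "simple random uniform learner" the abstract refers to can be illustrated with a minimal sketch: allocate the pull budget equally across arms, then recommend the arm with the highest empirical mean. This is an illustrative reconstruction, not the paper's parameter-free algorithm; the function name and interface are assumptions.

```python
def uniform_best_arm(pull, n_arms, budget):
    """Uniform-allocation baseline (illustrative sketch): pull every arm
    equally often via round-robin, then recommend the arm whose empirical
    mean reward is highest. `pull(arm)` returns one reward observation."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(budget):
        arm = t % n_arms  # round-robin pulls give a uniform allocation
        sums[arm] += pull(arm)
        counts[arm] += 1
    # Empirical mean per arm; arms never pulled (budget < n_arms) are excluded.
    means = [s / c if c else float("-inf") for s, c in zip(sums, counts)]
    return max(range(n_arms), key=means.__getitem__)
```

Because the allocation ignores the observed rewards entirely, an adversary gains nothing from adapting to the learner's choices, which is the intuition behind its optimal adversarial error rate; in a stochastic instance, however, it wastes pulls on clearly suboptimal arms.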
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Files:
- bob_best_arm_correction2023.pdf (open access)