Preference-based Pure Exploration
Document type:
Conference paper with proceedings
Title:
Preference-based Pure Exploration
Author(s):
Shukla, Apurv [Author]
Texas A&M University System
Basu, Debabrota [Author]
Centrale Lille
Université de Lille
Inria Lille - Nord Europe
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Conference name:
Advances in Neural Information Processing Systems (NeurIPS)
City:
Vancouver (CA)
Country:
Canada
Conference start date:
2024-12
English keyword(s):
Multi-armed Bandits
Pure exploration
Pareto front algorithm
Vector optimization
Multi-objective optimisation
Sample complexity
Lower bound
Preference Learning
Convex optimisation
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Artificial Intelligence [cs.AI]
Statistics [stat]/Statistics Theory [stat.TH]
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
We study the preference-based pure exploration problem for bandits with vector-valued rewards ordered using a preference cone $\mathcal{C}$, with the goal of identifying the most preferred policy over the set of arms. First, to quantify the impact of preferences, we derive a novel lower bound on the sample complexity of identifying the most preferred policy with confidence level $1-\delta$. Our lower bound elicits the role played by the geometry of the preference cone and highlights the difference in hardness compared to best-arm variants of the problem. We further explicate this geometry when the rewards follow a Gaussian distribution, and provide a convex reformulation of the lower bound. Then, we leverage this convex reformulation to design the Preference-based Track and Stop (PreTS) algorithm that identifies the most preferred policy. Finally, we derive a new concentration result for vector-valued rewards, and show that PreTS achieves a matching sample complexity upper bound.
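For intuition, the cone ordering mentioned in the abstract can be sketched numerically: given a finitely generated preference cone $\mathcal{C}$, one reward vector is weakly preferred to another exactly when their difference lies in $\mathcal{C}$. The sketch below (the function name, the generator-matrix representation, and the use of SciPy's nonnegative least squares are illustrative assumptions, not the paper's algorithm) tests cone membership and shows that with the nonnegative orthant as the cone, the ordering reduces to componentwise (Pareto) dominance.

```python
import numpy as np
from scipy.optimize import nnls

def cone_dominates(x, y, generators, tol=1e-9):
    """Return True if x is weakly preferred to y under the ordering
    induced by the finitely generated cone C = cone(generators):
    x >=_C y  iff  x - y lies in C.

    Membership of d = x - y in C is checked by a nonnegative
    least-squares fit: d is in C iff some w >= 0 gives G w = d,
    where the columns of G are the cone's generators.
    (Hypothetical helper for illustration only.)
    """
    G = np.asarray(generators, dtype=float).T  # columns = generators
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    _, residual = nnls(G, d)                   # min ||G w - d||, w >= 0
    return bool(residual <= tol)

# With the nonnegative orthant as the cone, >=_C is Pareto dominance.
orthant = np.eye(2)
print(cone_dominates([2.0, 3.0], [1.0, 1.0], orthant))  # True
print(cone_dominates([2.0, 0.0], [1.0, 1.0], orthant))  # False
```

Other cones change which policies are comparable: a narrower cone (fewer directions counted as "improvement") leaves more incomparable pairs, which is exactly the geometric effect the paper's lower bound quantifies.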
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popularization:
No
Files:
- 2412.02988v1.pdf (Open access)