Preference-based Pure Exploration
Document type :
Conference paper with proceedings
Title :
Preference-based Pure Exploration
Author(s) :
Shukla, Apurv [Author]
Texas A&M University System
Basu, Debabrota [Author]
Centrale Lille
Université de Lille
Inria Lille - Nord Europe
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]
Conference title :
Advances in Neural Information Processing Systems (NeurIPS)
City :
Vancouver (CA)
Country :
Canada
Start date of the conference :
2024-12
English keyword(s) :
Multi-armed Bandits
Pure exploration
Pareto front algorithm
Vector optimization
Multi-objective optimisation
Sample complexity
Lower bound
Preference Learning
Convex optimisation
HAL domain(s) :
Informatique [cs]/Apprentissage [cs.LG]
Informatique [cs]/Intelligence artificielle [cs.AI]
Statistiques [stat]/Théorie [stat.TH]
Statistiques [stat]/Machine Learning [stat.ML]
English abstract : [en]
We study the preference-based pure exploration problem for bandits with vector-valued rewards ordered using a preference cone $\mathcal{C}$, with the goal of identifying the most preferred policy over the set of arms. First, to quantify the impact of preferences, we derive a novel lower bound on the sample complexity for identifying the most preferred policy with confidence level $1-\delta$. Our lower bound elicits the role played by the geometry of the preference cone and punctuates the difference in hardness compared to best-arm variants of the problem. We further explicate this geometry when rewards follow Gaussian distributions, and provide a convex reformulation of the lower bound. Then, we leverage this convex reformulation to design the Preference-based Track and Stop (PreTS) algorithm that identifies the most preferred policy. Finally, we derive a new concentration result for vector-valued rewards, and show that PreTS achieves a matching sample complexity upper bound.
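The abstract orders vector-valued rewards through a preference cone $\mathcal{C}$. As a minimal illustration of that ordering (not the paper's PreTS algorithm), the sketch below checks cone dominance for a polyhedral cone $C = \{z : Az \ge 0\}$ and computes the resulting set of non-dominated arm means; the matrix `A` and the helper names are assumptions introduced for this example. Taking `A` as the identity makes $C$ the nonnegative orthant, recovering the usual componentwise Pareto order.

```python
import numpy as np

def cone_dominates(x, y, A):
    """True if x - y lies in the cone C = {z : A z >= 0},
    i.e. x is weakly preferred to y under the cone order."""
    return bool(np.all(A @ (np.asarray(x, float) - np.asarray(y, float)) >= 0))

def pareto_front(means, A):
    """Indices of arms whose mean reward vector is not dominated
    by any distinct arm under the cone order induced by A."""
    front = []
    for i, mi in enumerate(means):
        dominated = any(
            cone_dominates(mj, mi, A) and not np.allclose(mj, mi)
            for j, mj in enumerate(means) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# With the orthant cone, (0.2, 0.2) is dominated by (0.5, 0.5):
arms = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.2)]
print(pareto_front(arms, np.eye(2)))  # → [0, 1, 2]
```

Narrowing the cone (adding rows to `A`) encodes stronger preferences and shrinks the set of non-dominated arms, which is the geometric effect the paper's lower bound quantifies.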
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files :
- 2412.02988v1.pdf (Open access)