Interactive and Concentrated Differential Privacy for Bandits

Azize, Achraf; Basu, Debabrota

Type de document :

Communication dans un congrès avec actes

Titre :

Interactive and Concentrated Differential Privacy for Bandits

Auteur(s) :

Azize, Achraf [Auteur]
Scool [Scool]
Basu, Debabrota [Auteur]
Scool [Scool]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe

Titre de la manifestation scientifique :

EWRL 2023 – European Workshop on Reinforcement Learning

Ville :

Brussels (Belgium)

Pays :

Belgique

Date de début de la manifestation scientifique :

2023-09

Date de publication :

2023-09

Mot(s)-clé(s) en anglais :

Multi- armed bandits
Differential Privacy
Regret Minimization
Regret bounds
Lower bounds

Discipline(s) HAL :

Informatique [cs]/Apprentissage [cs.LG]
Informatique [cs]/Intelligence artificielle [cs.AI]
Informatique [cs]/Théorie de l'information [cs.IT]
Statistiques [stat]/Théorie [stat.TH]
Statistiques [stat]/Machine Learning [stat.ML]

Résumé en anglais : [en]

Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in ...
Lire la suite >Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure $\epsilon$-global DP have been well-studied, we contribute to the understanding of bandits under zero Concentrated DP (zCDP). We provide minimax and problem-dependent lower bounds on regret for finite-armed and linear bandits, which quantify the cost of $\rho$-global zCDP in these settings. These lower bounds reveal two hardness regimes based on the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. We propose two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, for finite-armed and linear bandits respectively. Both algorithms use a common recipe of Gaussian mechanism and adaptive episodes. We analyze the regret of these algorithms to show that AdaC-UCB achieves the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE achieves the minimax regret lower bound up to poly-logarithmic factors. Finally, we provide experimental validation of our theoretical results under different settings.Lire moins >

Langue :

Anglais

Comité de lecture :

Oui

Audience :

Internationale

Vulgarisation :

Non

Projet ANR :

REPUBLIC: Vers l'IA responsable avec l'apprentissage par renforcement sous contraintes
Programme de formation doctorale en IA à Lille

Collections :