Concentrated Differential Privacy for Bandits

Azize, Achraf; Basu, Debabrota

Document type:

Conference paper with proceedings

DOI:

10.1109/SaTML59370.2024.00013

Title:

Concentrated Differential Privacy for Bandits

Author(s):

Azize, Achraf [Author]
Scool [Scool]
Basu, Debabrota [Author]
Centrale Lille
Université de Lille
Inria Lille - Nord Europe
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Scool [Scool]

Title of the scientific event:

2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

Organizer(s) of the scientific event:

IEEE

City:

Toronto

Country:

Canada

Start date of the scientific event:

2024-04-09

Publisher:

IEEE

Keyword(s) in English:

Multi-armed bandits MAB
Bandit / imperfect feedback
Differential privacy
Concentrated differential privacy
Regret Minimization
Lower bounds

HAL discipline(s):

Computer Science [cs]/Machine Learning [cs.LG]
Computer Science [cs]/Cryptography and Security [cs.CR]
Computer Science [cs]/Information Theory [cs.IT]
Statistics [stat]/Statistics Theory [stat.TH]
Statistics [stat]/Machine Learning [stat.ML]

Abstract in English: [en]

Bandits serve as the theoretical foundation of sequential learning and an algorithmic foundation of modern recommender systems. However, recommender systems often rely on user-sensitive data, making privacy a critical concern. This paper contributes to the understanding of Differential Privacy (DP) in bandits with a trusted centralised decision-maker, and especially the implications of ensuring zero Concentrated Differential Privacy (zCDP). First, we formalise and compare different adaptations of DP to bandits, depending on the considered input and the interaction protocol. Then, we propose three private algorithms, namely AdaC-UCB, AdaC-GOPE and AdaC-OFUL, for three bandit settings, namely finite-armed bandits, linear bandits, and linear contextual bandits. The three algorithms share a generic algorithmic blueprint, i.e. the Gaussian mechanism and adaptive episodes, to ensure a good privacy-utility trade-off. We analyse and upper bound the regret of these three algorithms. Our analysis shows that in all of these settings, the prices of imposing zCDP are (asymptotically) negligible in comparison with the regrets incurred oblivious to privacy. Next, we complement our regret upper bounds with the first minimax lower bounds on the regret of bandits with zCDP. To prove the lower bounds, we elaborate a new proof technique based on couplings and optimal transport. We conclude by experimentally validating our theoretical results for the three different settings of bandits.
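
The abstract names the Gaussian mechanism as one half of the shared blueprint. As an illustration only, the sketch below shows how a Gaussian mechanism can be calibrated for rho-zCDP, using the standard fact that adding N(0, sigma^2) noise to a statistic with sensitivity Delta satisfies (Delta^2 / (2 sigma^2))-zCDP. It is not the authors' AdaC-UCB, AdaC-GOPE or AdaC-OFUL code: the function names, the assumption that rewards lie in [0, 1], and the choice to release one noisy mean per episode are all hypothetical simplifications made for this example.

```python
import math
import random


def zcdp_gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise scale for which the Gaussian mechanism satisfies rho-zCDP.

    Uses rho = sensitivity^2 / (2 * sigma^2), solved for sigma.
    """
    if sensitivity <= 0 or rho <= 0:
        raise ValueError("sensitivity and rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


def private_episode_mean(rewards, rho, rng):
    """Release a noisy mean of one episode's rewards (hypothetical helper).

    Assumption: every reward lies in [0, 1], so changing a single reward
    moves the mean of n rewards by at most 1 / n.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("episode must contain at least one reward")
    sigma = zcdp_gaussian_sigma(1.0 / n, rho)
    return sum(rewards) / n + rng.gauss(0.0, sigma)


rng = random.Random(0)  # fixed seed so the example is repeatable
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
noisy_mean = private_episode_mean(rewards, rho=0.5, rng=rng)
```

Because the sensitivity of the mean shrinks as 1/n, the noise scale shrinks as the episode gets longer, which gives some intuition for why longer episodes can help the privacy-utility trade-off mentioned in the abstract.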

Language:

English

Peer reviewed:

Yes

Audience:

International

Popular science:

No

ANR project:

REPUBLIC: Towards responsible AI with reinforcement learning under constraints
Foundations of robustness and reliability in artificial intelligence
Doctoral training programme in AI in Lille

Collections:

Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189

Source:

Harvested from HAL
