Spectral bandits
Document type:
Journal article: Original article
Title:
Spectral bandits
Author(s):
Kocák, Tomáš [Author]
Scool [Scool]
École normale supérieure de Lyon [ENS de Lyon]
Munos, Rémi [Author]
Scool [Scool]
DeepMind [Paris]
Kveton, Branislav [Author]
Google Research
Agrawal, Shipra [Author]
Columbia University [New York]
Valko, Michal [Author]
DeepMind [Paris]
Scool [Scool]
École normale supérieure de Lyon [ENS de Lyon]
Journal title:
Journal of Machine Learning Research
Publisher:
Microtome Publishing
Publication date:
2020
ISSN:
1532-4435
HAL discipline(s):
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to that of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a content-recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
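As a rough illustration of the setting described in the abstract, the sketch below builds a small cycle graph, uses the eigenbasis of its graph Laplacian as arm features so that a reward function with only low-frequency coefficients is smooth on the graph, and runs a LinUCB-style upper-confidence-bound loop in that spectral basis. The cycle graph, the eigenvalue-weighted regularizer, and the hand-picked confidence width beta are illustrative assumptions, not the paper's actual algorithms or experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy graph: a cycle on n nodes; the arms are the nodes.
n = 50
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A              # combinatorial graph Laplacian

# Eigenbasis of the Laplacian: arm i gets the feature vector Q[i, :].
eigvals, Q = np.linalg.eigh(L)              # eigenvalues sorted ascending

# A reward function that is smooth on the graph: its spectral coefficients
# are concentrated on the low-frequency (small-eigenvalue) eigenvectors.
alpha = np.zeros(n)
alpha[:3] = rng.normal(size=3)
true_means = Q @ alpha
noise_sd = 0.1

# LinUCB-style loop in the spectral basis (an illustrative sketch only).
lam, beta = 1.0, 1.0                        # regularization and confidence width (hand-picked)
V = np.diag(eigvals) + lam * np.eye(n)      # penalizes high-frequency coefficients more
b = np.zeros(n)
cumulative_regret = 0.0

for t in range(500):
    V_inv = np.linalg.inv(V)
    alpha_hat = V_inv @ b                   # regularized least-squares estimate of alpha
    # UCB for arm i with features x_i = Q[i]: estimate plus exploration bonus.
    ucb = Q @ alpha_hat + beta * np.sqrt(np.einsum("ij,jk,ik->i", Q, V_inv, Q))
    arm = int(np.argmax(ucb))
    reward = true_means[arm] + noise_sd * rng.normal()
    x = Q[arm]
    V += np.outer(x, x)
    b += reward * x
    cumulative_regret += true_means.max() - true_means[arm]

print(f"cumulative regret after 500 rounds: {cumulative_regret:.2f}")

Because the reward function only uses a few of the n spectral coefficients, the learner can identify good nodes after far fewer than n pulls; this is roughly the intuition behind the effective dimension mentioned in the abstract.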
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
ANR project:
Collections:
Source:
Files
- https://hal.inria.fr/hal-03084249/document
- Open access
- Access the document
- kocak2020spectral.pdf
- Open access
- Access the document