Exploiting structure of uncertainty for ...
Type de document :
Communication dans un congrès avec actes
Titre :
Exploiting structure of uncertainty for efficient matroid semi-bandits
Auteur(s) :
Perrault, Pierre [Auteur]
Sequential Learning [SEQUEL]
Perchet, Vianney [Auteur]
Centre de Recherche en Économie et Statistique [CREST]
Valko, Michal [Auteur]
DeepMind [Paris]
Sequential Learning [SEQUEL]
Sequential Learning [SEQUEL]
Perchet, Vianney [Auteur]
Centre de Recherche en Économie et Statistique [CREST]
Valko, Michal [Auteur]

DeepMind [Paris]
Sequential Learning [SEQUEL]
Titre de la manifestation scientifique :
International Conference on Machine Learning
Ville :
Long Beach
Pays :
Etats-Unis d'Amérique
Date de début de la manifestation scientifique :
2019
Discipline(s) HAL :
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
We improve the efficiency of algorithms for stochastic combinatorial semi-bandits. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as independence. However, ...
Lire la suite >We improve the efficiency of algorithms for stochastic combinatorial semi-bandits. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as independence. However, while being optimal in terms of asymptotic regret, these algorithms are inefficient. In our paper, we first reduce their implementation to a specific submod-ular maximization. Then, in case of matroid constraints , we design adapted approximation routines , thereby providing the first efficient algorithms that rely on reward structure to improve regret bound. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor √ m/ log m, where m is the maximum action size. Finally, we show how our improvement translates to more general budgeted combinato-rial semi-bandits.Lire moins >
Lire la suite >We improve the efficiency of algorithms for stochastic combinatorial semi-bandits. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as independence. However, while being optimal in terms of asymptotic regret, these algorithms are inefficient. In our paper, we first reduce their implementation to a specific submod-ular maximization. Then, in case of matroid constraints , we design adapted approximation routines , thereby providing the first efficient algorithms that rely on reward structure to improve regret bound. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor √ m/ log m, where m is the maximum action size. Finally, we show how our improvement translates to more general budgeted combinato-rial semi-bandits.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Collections :
Source :
Fichiers
- https://hal.inria.fr/hal-02387478/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-02387478/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-02387478/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- supplementary.pdf
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- supplementary.pdf
- Accès libre
- Accéder au document