Distributed adaptive sampling for kernel ...
Type de document :
Communication dans un congrès avec actes
Titre :
Distributed adaptive sampling for kernel matrix approximation
Auteur(s) :
Calandriello, Daniele [Auteur]
Sequential Learning [SEQUEL]
Lazaric, Alessandro [Auteur]
Sequential Learning [SEQUEL]
Valko, Michal [Auteur]
Sequential Learning [SEQUEL]
Sequential Learning [SEQUEL]
Lazaric, Alessandro [Auteur]
Sequential Learning [SEQUEL]
Valko, Michal [Auteur]
Sequential Learning [SEQUEL]
Titre de la manifestation scientifique :
International Conference on Artificial Intelligence and Statistics
Ville :
Fort Lauderdale
Pays :
Etats-Unis d'Amérique
Date de début de la manifestation scientifique :
2017
Discipline(s) HAL :
Statistiques [stat]/Machine Learning [stat.ML]
Résumé en anglais : [en]
Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires ...
Lire la suite >Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least $\mathcal{O}(n^2)$ time and space for $n$ samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $\mathbf{K}_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix $\mathbf{K}_n$, runs in linear time $\widetilde{\mathcal{O}}(nd_{eff}(\gamma)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK that linearly scales across multiple machines, achieving similar accuracy in as little as $\widetilde{\mathcal{O}}(\log(n)d_{eff}(\gamma)^3)$ time.Lire moins >
Lire la suite >Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least $\mathcal{O}(n^2)$ time and space for $n$ samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $\mathbf{K}_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK is the first RLS sampling algorithm that never constructs the whole matrix $\mathbf{K}_n$, runs in linear time $\widetilde{\mathcal{O}}(nd_{eff}(\gamma)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK that linearly scales across multiple machines, achieving similar accuracy in as little as $\widetilde{\mathcal{O}}(\log(n)d_{eff}(\gamma)^3)$ time.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Projet ANR :
Collections :
Source :
Fichiers
- https://hal.inria.fr/hal-01482760/document
- Accès libre
- Accéder au document
- http://arxiv.org/pdf/1803.10172
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-01482760/document
- Accès libre
- Accéder au document
- https://hal.inria.fr/hal-01482760/document
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- calandriello2017distributed.pdf
- Accès libre
- Accéder au document
- 1803.10172
- Accès libre
- Accéder au document
- document
- Accès libre
- Accéder au document
- calandriello2017distributed.pdf
- Accès libre
- Accéder au document