
Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning

Document type :
Conference paper with proceedings
Title :
Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning
Author(s) :
Vogel, Robin [Author]
Laboratoire Traitement et Communication de l'Information [LTCI]
Bellet, Aurélien [Author]
Machine Learning in Information Networks [MAGNET]
Clémençon, Stéphan [Author]
Laboratoire Traitement et Communication de l'Information [LTCI]
Jelassi, Ons [Author]
Laboratoire Traitement et Communication de l'Information [LTCI]
Papa, Guillaume [Author]
Laboratoire Traitement et Communication de l'Information [LTCI]
Conference title :
ECML PKDD 2019 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
City :
Würzburg
Country :
Germany
Start date of the conference :
2019-09-16
English keyword(s) :
Distributed Machine Learning
Distributed Data Processing
U-Statistics
AUC Optimization
HAL domain(s) :
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Machine Learning [stat.ML]
English abstract :
The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems whose objective function is nicely separable across individual data points, such as classification and regression. In contrast, statistical learning tasks involving pairs (or more generally tuples) of data points, such as metric learning, clustering or ranking, do not lend themselves as easily to data-parallelism and in-memory computing. In this paper, we investigate how to balance statistical performance and computational efficiency in such distributed tuplewise statistical problems. We first propose a simple strategy based on occasionally repartitioning data across workers between parallel computation stages, where the number of repartitioning steps rules the trade-off between accuracy and runtime. We then present some theoretical results highlighting the benefits brought by the proposed method in terms of variance reduction, and extend our results to design distributed stochastic gradient descent algorithms for tuplewise empirical risk minimization. Our results are supported by numerical experiments in pairwise statistical estimation and learning on synthetic and real-world datasets.
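The repartitioning idea summarized in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' implementation: it estimates the pairwise (two-sample) U-statistic underlying the AUC, first from a single partitioning of the data across workers, then averaged over several repartitioning steps. All names and parameters (within_worker_estimate, distributed_estimate, n_workers, n_reshuffles, the toy Gaussian data) are illustrative assumptions.

# Minimal sketch (illustrative, not the paper's code) of estimating a
# pairwise statistic under data partitioning, with optional repartitioning.
import numpy as np

rng = np.random.default_rng(0)

def within_worker_estimate(xs, ys):
    # Average of the kernel h(x, y) = 1{x < y} over the pairs held by one
    # worker: an "incomplete" version of the two-sample U-statistic.
    return (xs[:, None] < ys[None, :]).mean()

def distributed_estimate(x, y, n_workers, n_reshuffles):
    # Repartition both samples across workers n_reshuffles times; in each
    # round every worker averages its local pairs, and rounds are averaged.
    estimates = []
    for _ in range(n_reshuffles):
        xp, yp = rng.permutation(x), rng.permutation(y)  # redistribute data
        x_parts = np.array_split(xp, n_workers)
        y_parts = np.array_split(yp, n_workers)
        estimates.append(np.mean([within_worker_estimate(xs, ys)
                                  for xs, ys in zip(x_parts, y_parts)]))
    return float(np.mean(estimates))

# Toy data: y stochastically larger than x, so the true AUC exceeds 0.5.
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.5, 1.0, 2000)

complete = (x[:, None] < y[None, :]).mean()       # all n^2 pairs
one_pass = distributed_estimate(x, y, 10, 1)      # one partitioning
averaged = distributed_estimate(x, y, 10, 5)      # five repartitioning steps
print(f"complete={complete:.4f}  1 shuffle={one_pass:.4f}  5 shuffles={averaged:.4f}")

Each reshuffle yields an unbiased estimate built from within-worker pairs only; averaging over several reshuffles reduces its variance at the cost of extra communication, which is the accuracy/runtime trade-off the abstract describes.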
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Files
  • https://hal.inria.fr/hal-02166428/document
  • Open access
  • Access the document