MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
Document type:
Conference paper with published proceedings
Title:
MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
Author(s):
Beaumont, Olivier [Author]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Eyraud-Dubois, Lionel [Author]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Shilova, Alena [Author]
Scool [Scool]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Conference title:
ScaDL 2022 - Scalable Deep Learning over Parallel and Distributed Infrastructure - An IPDPS 2022 Workshop
City:
Lyon / Virtual
Country:
France
Conference start date:
2022-06-03
Journal title:
Proceedings of IPDPS W'22
Publication date:
2022
HAL discipline(s):
Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
English abstract:
The training phase in Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for parallelizing training is the so-called data parallel approach, based on training the different inputs (typically images) in parallel and aggregating the network weights with collective communications (AllReduce). The scalability of this approach is limited both by the memory available on each node and by the networking capacity for collective operations. Recently, a model parallel approach, in which the network weights are distributed and images are trained in a pipelined/streamed manner over the computational nodes, has been proposed (PipeDream, GPipe). In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe, which significantly improves the performance of the model parallel approach compared to the literature.
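To make the setting described in the abstract concrete, here is a minimal sketch (an illustrative assumption, not taken from the paper and not the MadPipe algorithm itself) of pipelined model parallelism in PyTorch: the layers of a network are placed on different GPUs and micro-batches flow through the resulting stages. This is the layer-placement setting whose optimization MadPipe addresses; the stage sizes, device names and micro-batch count below are arbitrary.

# Minimal illustrative sketch of pipelined model parallelism (assumed
# example, not the paper's MadPipe implementation). Requires PyTorch and
# two CUDA devices; stage sizes and micro-batch count are arbitrary.
import torch
import torch.nn as nn

# Place the two halves (stages) of the network on different GPUs:
# this is the "placement of DNN layers onto computation resources"
# that MadPipe optimizes.
stage0 = nn.Sequential(nn.Linear(1024, 512), nn.ReLU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(512, 10)).to("cuda:1")

def forward(x):
    # Activations are transferred between stages.
    h = stage0(x.to("cuda:0"))
    return stage1(h.to("cuda:1"))

# Feed micro-batches one after another; in a real pipeline (GPipe,
# PipeDream, MadPipe) stage0 starts on the next micro-batch while
# stage1 is still processing the previous one, overlapping the work.
micro_batches = torch.randn(4, 8, 1024).unbind(0)
outputs = [forward(mb) for mb in micro_batches]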
Language:
English
Peer reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.archives-ouvertes.fr/hal-03025305/document
- Open access
- MadPipeRR.pdf
- Open access