MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
Document type :
Conference paper with published proceedings
Title :
MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism
Author(s) :
Beaumont, Olivier [Auteur]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Eyraud-Dubois, Lionel [Auteur]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Shilova, Alena [Auteur]
Scool [Scool]
High-End Parallel Algorithms for Challenging Numerical Simulations [HiePACS]
Conference title :
ScaDL 2022 - Scalable Deep Learning over Parallel and Distributed Infrastructure - An IPDPS 2022 Workshop
City :
Lyon / Virtual
Country :
France
Start date of the conference :
2022-06-03
Journal title :
Proceedings of IPDPS W'22
Publication date :
2022
HAL domain(s) :
Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
English abstract : [en]
The training phase of Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for parallelizing training is the so-called data parallel approach, based on training different inputs (typically images) in parallel and aggregating the network weights with collective communications (AllReduce). The scalability of this approach is limited both by the memory available on each node and by the network capacity for collective operations. Recently, a model parallel approach, in which the network weights are distributed and inputs are processed in a pipelined/streamed manner across the compute nodes, has been proposed (PipeDream, GPipe). In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computational resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe, that significantly improves the performance of the pipelined model parallel approach compared to the literature.
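As a complement to the abstract, below is a minimal, self-contained Python sketch of the kind of dynamic program the abstract alludes to: partitioning a chain of DNN layers into contiguous pipeline stages, one per GPU, so as to minimize the bottleneck stage time under a per-GPU memory budget. This is an illustration only, not the MadPipe algorithm itself; the function name partition_layers and all layer timings and memory sizes are hypothetical.

# A minimal sketch (not the paper's MadPipe algorithm) of dynamic-programming
# layer placement for pipelined model parallelism: split a chain of L layers
# into K contiguous stages, one per GPU, minimizing the slowest stage
# (the pipeline bottleneck) subject to a per-GPU memory budget.
# All numbers below are illustrative assumptions, not taken from the paper.

import itertools

def partition_layers(compute, memory, num_stages, mem_budget):
    """Return (bottleneck_time, stage boundaries) or None if infeasible."""
    L = len(compute)
    # Prefix sums give O(1) time/memory for any contiguous layer interval.
    ct = list(itertools.accumulate(compute, initial=0))
    cm = list(itertools.accumulate(memory, initial=0))
    INF = float("inf")
    # best[k][i] = minimal bottleneck when layers 0..i-1 are split into k stages.
    best = [[INF] * (L + 1) for _ in range(num_stages + 1)]
    cut = [[-1] * (L + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, num_stages + 1):
        for i in range(1, L + 1):
            for j in range(i):  # stage k holds layers j..i-1
                if cm[i] - cm[j] > mem_budget:
                    continue  # stage would exceed the GPU memory budget
                cand = max(best[k - 1][j], ct[i] - ct[j])
                if cand < best[k][i]:
                    best[k][i], cut[k][i] = cand, j
    if best[num_stages][L] == INF:
        return None
    # Walk the cut table backwards to recover the stage boundaries.
    bounds, i = [], L
    for k in range(num_stages, 0, -1):
        bounds.append((cut[k][i], i))
        i = cut[k][i]
    return best[num_stages][L], bounds[::-1]

if __name__ == "__main__":
    # Hypothetical per-layer compute times (ms) and memory footprints (GB).
    compute = [4, 6, 5, 8, 3, 7, 2, 5]
    memory = [1, 2, 1, 3, 1, 2, 1, 2]
    print(partition_layers(compute, memory, num_stages=3, mem_budget=6))

This sketch only captures the contiguous-partition dynamic-programming skeleton; the paper's MadPipe heuristic additionally accounts for aspects such as activation storage during the backward phase and communication between stages.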
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Files :
- MadPipeRR.pdf (open access) : https://hal.archives-ouvertes.fr/hal-03025305/document