Document type :
Journal article
DOI :
Title :
Dual tree traversal on integrated GPUs for astrophysical N-body simulations
Author(s) :
Fortin, Pierre [Author]
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Performance et Qualité des Algorithmes Numériques [PEQUAN]
Touche, Maxime [Author]
Performance et Qualité des Algorithmes Numériques [PEQUAN]
Journal title :
International Journal of High Performance Computing Applications
Pages :
960-972
Publisher :
SAGE Publications
Publication date :
2019-09-01
ISSN :
1094-3420
English keyword(s) :
dual tree traversal
integrated GPU
hybrid CPU-GPU algorithm
fast multipole method
astrophysics
HAL domain(s) :
Cognitive science/Computer science
Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
English abstract : [en]
In astrophysical N-body simulations, O(N) fast multipole methods (FMMs) with dual tree traversal (DTT) on multi-core CPUs are faster than O(N log N) CPU tree-codes but can still be outperformed by GPU ones. In this paper, we aim to combine the best algorithm, namely FMM with DTT, with the most powerful hardware currently available, namely GPUs. In the astrophysical context requiring low accuracies and non-uniform particle distributions, we show that such a combination can be achieved thanks to a hybrid CPU-GPU algorithm on integrated GPUs: while the DTT is performed on the CPU cores, the far- and near-field computations are all performed on the GPU cores. We show how to efficiently expose the interactions resulting from the DTT to the GPU cores, how to deploy both the far- and near-field computations on the GPU, and how to overlap the parallel DTT on the CPU with GPU computations. Based on the falcON code and using OpenCL on AMD Accelerated Processing Units and on Intel integrated GPUs, this first heterogeneous deployment of DTT for FMM outperforms standard multi-core CPUs and matches GPU and high-end CPU performance, and is hence more cost- and power-efficient.
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
Source :
Files
- https://hal.sorbonne-universite.fr/hal-02073710/document
- Open access
- Access the document