Document type:
Conference paper with proceedings
Title:
GPU-Accelerated Tree-Search in Chapel versus CUDA and HIP
Author(s):
Helbecque, Guillaume [Author]
Université du Luxembourg = University of Luxembourg = Universität Luxemburg [uni.lu]
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Optimisation de grande taille et calcul large échelle [BONUS]
Krishnasamy, Ezhilmathi [Author]
Université du Luxembourg = University of Luxembourg = Universität Luxemburg [uni.lu]
Melab, Nouredine [Author]
Université de Lille
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Inria Lille - Nord Europe
Optimisation de grande taille et calcul large échelle [BONUS]
Bouvry, Pascal [Author]
Université du Luxembourg = University of Luxembourg = Universität Luxemburg [uni.lu]
Title of the scientific event:
14th IEEE Workshop Parallel / Distributed Combinatorics and Optimization (PDCO 2024)
City:
San Francisco
Country:
United States of America
Start date of the scientific event:
2024-05-31
English keyword(s):
Chapel
Tree-Search
GPU computing
CUDA
HIP
N-Queens
Nvidia
AMD
HAL discipline(s):
Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
English abstract: [en]
In the context of exascale programming, the PGAS-based Chapel is among the rare languages targeting the holistic handling of high-performance computing issues including the productivity-aware harnessing of Nvidia and AMD GPUs. In this paper, we propose a pioneering proof-of-concept dealing with this latter issue in the context of tree-based exact optimization. Actually, we revisit the design and implementation of a generic multi-pool GPU-accelerated tree-search algorithm using Chapel. This algorithm is instantiated on the backtracking method and experimented on the N-Queens problem. For performance evaluation, the Chapel-based approach is compared to Nvidia CUDA and AMD HIP low-level counterparts. The reported results show that in a single-GPU setting, the high GPU abstraction of Chapel results in a loss of only 8% (resp. 16%) compared to CUDA (resp. HIP). In a multi-GPU setting, up to 80% (resp. 71%) of the baseline speed-up is achieved for coarse-grained problem instances on Nvidia (resp. AMD) GPUs.
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Files
- PDCO2024_Helbecque_et_al_preprint.pdf (open access)