Many-core Branch-and-Bound for GPU ...
Document type :
Partie d'ouvrage
Title :
Many-core Branch-and-Bound for GPU accelerators and MIC coprocessors
Author(s) :
Melab, Nouredine [Auteur]
Optimisation de grande taille et calcul large échelle [BONUS]
Gmys, Jan [Auteur]
Optimisation de grande taille et calcul large échelle [BONUS]
Université de Mons / University of Mons [UMONS]
Mezmaz, Mohand [Auteur]
Université de Mons / University of Mons [UMONS]
Tuyttens, Daniel [Auteur]
Université de Mons / University of Mons [UMONS]
Optimisation de grande taille et calcul large échelle [BONUS]
Gmys, Jan [Auteur]
Optimisation de grande taille et calcul large échelle [BONUS]
Université de Mons / University of Mons [UMONS]
Mezmaz, Mohand [Auteur]
Université de Mons / University of Mons [UMONS]
Tuyttens, Daniel [Auteur]
Université de Mons / University of Mons [UMONS]
Scientific editor(s) :
T. Bartz-Beielstein
B. Filipic
P. Korosec
E-G. Talbi
B. Filipic
P. Korosec
E-G. Talbi
Book title :
High-Performance Simulation-Based Optimization
Publisher :
Springer
Publication date :
2019-06-02
ISBN :
ISBN 978-3-030-18763-7
HAL domain(s) :
Informatique [cs]/Calcul parallèle, distribué et partagé [cs.DC]
Informatique [cs]/Algorithme et structure de données [cs.DS]
Computer Science [cs]/Operations Research [math.OC]
Informatique [cs]/Algorithme et structure de données [cs.DS]
Computer Science [cs]/Operations Research [math.OC]
English abstract : [en]
Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound ...
Show more >Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two.Show less >
Show more >Coprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two.Show less >
Language :
Anglais
Audience :
Internationale
Popular science :
Non
Collections :
Source :
Files
- https://hal.inria.fr/hal-01924766/document
- Open access
- Access the document
- https://hal.inria.fr/hal-01924766/document
- Open access
- Access the document
- https://hal.inria.fr/hal-01924766/document
- Open access
- Access the document
- document
- Open access
- Access the document
- 22-Melab-et-al.pdf
- Open access
- Access the document