Slightly slowed applications, major energy savings
Document type :
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings
Title :
Playing with power at runtime
Slightly slowed applications, major energy savings
Author(s) :
Bleuse, Raphaël [Author]
Université Grenoble Alpes [UGA]
Control for Autonomic computing systems [CTRL-A]
Cerf, Sophie [Author]
Self-adaptation for distributed services and large software systems [SPIRALS]
Rutten, Éric [Author]
Control for Autonomic computing systems [CTRL-A]
Conference title :
JCAD 2022 - Journées calcul et données
City :
Dijon
Country :
France
Start date of the conference :
2022-10-10
English keyword(s) :
HPC
control theory
digital soberness
HAL domain(s) :
Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Computer Science [cs]/Automatic Control Engineering
English abstract : [en]
Soberness, in terms of electrical power, of data centers and high-performance computing (HPC) systems is becoming an important design issue, as the global energy consumption of information technologies (IT) is rising to considerable levels. This question is all the more complex as these systems are increasingly heterogeneous and variable in their behavior with respect to performance and power consumption. As applications struggle to make use of increasingly heterogeneous compute nodes, maintaining high efficiency (performance per watt) for the whole platform becomes a challenge. Additionally, applications tend to present phases (I/O, compute- or memory-intensive, checkpointing) that vary over time, and to be executed in an environment subject to external constraints (e.g., concurrency or energy envelope).
This increasing complexity makes HPC less predictable offline (prior to execution). Therefore, dealing with time variations and unpredictable disturbances demands runtime management. In this work, we achieve dynamic adaptation through feedback control grounded in control theory, which falls within the scope of autonomic computing. In particular, we address the control of the power allocated to processors, and hence of their energy consumption and performance. Feedback control makes it possible to reduce energy consumption by decreasing processor speed, with a limited and configurable performance loss, by exploiting periods where read/write operations slow down progress. The proposed controller is easy to configure: the user only has to supply an acceptable degradation level.
The system we consider, an HPC application, undergoes many variations in its behavior, depending on (i) the cluster, (ii) the node, (iii) the run, and even (iv) the time within a run.
We evaluate our approach on top of an existing resource management framework, the Argo Node Resource Manager, deployed on several clusters of Grid'5000, using a standard memory-bound HPC benchmark. Our results show the existence of a family of trade-offs to save energy, depending on the allowed degradation (from 0 to 20%). In particular, our control approach saves, on average, 22% energy at the cost of a 7% increase in execution time, and reaches up to 25% energy savings with the adaptation. Our solution has been shown to be robust to variations between machines (from one node to another) and between runs (from one execution of the application to another).
The experiments conducted in this work require instrumenting low-level software stacks. Conducting this work on top of Grid'5000 was key, as it allowed us to study various hardware setups (varying number of sockets, varying amount of memory) and their impact on the controller. The presence of clusters composed of homogeneous hardware allowed us to study the robustness of the devised control with respect to the variability in hardware performance despite identical specifications. Finally, our work relied on power measurements as provided by the integrated sensors: we could extend this work by exploiting the available power sensors.
Our future work will tackle three remaining challenges: (i) handling various types of phases and their chaining in an application, (ii) distributed execution (a different powercap enforced on each processor or core), and (iii) non-instrumented applications (for which instrumentation is not possible).
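As an illustration of the feedback principle described in the abstract, the Python sketch below lowers a powercap until measured progress settles at a user-chosen degradation level. It is not the authors' controller and does not use the Argo Node Resource Manager API. The progress model, its constants, the integral gain, and the powercap bounds are all assumptions made for this example, and the names simulated_progress and run_controller are hypothetical.

```python
import math
from typing import Tuple

# Assumed powercap range in watts (hypothetical values, not from the source).
P_MIN, P_MAX = 40.0, 120.0


def simulated_progress(powercap_w: float) -> float:
    """Hypothetical plant: application progress saturates as power grows."""
    max_progress = 25.0  # assumed asymptotic progress, arbitrary units
    return max_progress * (1.0 - math.exp(-0.04 * (powercap_w - 30.0)))


def run_controller(degradation: float, steps: int = 200) -> Tuple[float, float]:
    """Integral feedback loop: adjust the powercap to track a progress setpoint.

    `degradation` is the acceptable performance loss (e.g. 0.1 for 10%),
    the only parameter the user supplies.
    """
    # Setpoint: the progress reached at full power, reduced by the allowed loss.
    setpoint = (1.0 - degradation) * simulated_progress(P_MAX)
    gain = 1.0  # assumed integral gain, chosen so this toy loop is stable
    powercap = P_MAX
    for _ in range(steps):
        error = setpoint - simulated_progress(powercap)
        # Accumulate the error into the actuator and clamp to the valid range.
        powercap = min(P_MAX, max(P_MIN, powercap + gain * error))
    return powercap, simulated_progress(powercap)


if __name__ == "__main__":
    for level in (0.0, 0.1, 0.2):
        cap, progress = run_controller(level)
        print(f"degradation={level:.0%} powercap={cap:.1f} W progress={progress:.2f}")
```

Because progress saturates at high power in this toy model, a small allowed degradation translates into a comparatively large reduction of the powercap, which is the kind of trade-off the abstract reports.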
Language :
English
Peer reviewed article :
No
Audience :
National
Popular science :
No
Files
- document
- Open access
- Access the document
- 034_Presentation_longue_Orateur_Bleuse_Raphael_Playing_with_power_at_runtime.PDF
- Open access
- Access the document