Lightweight Task Offloading Exploiting MPI Wait Times for Parallel Adaptive Mesh Refinement

Samfass, Philipp; Weinzierl, Tobias; Charrier, Dominic E.; Bader, Michael


Authors

Philipp Samfass

Tobias Weinzierl

Dominic E. Charrier

Michael Bader



Abstract

Balancing the workload of sophisticated simulations is inherently difficult: we have to balance both computational workload and memory footprint over meshes that can change at any time or yield unpredictable cost per mesh entity, while modern supercomputers and their interconnects start to exhibit fluctuating performance. We propose a novel lightweight balancing technique for MPI+X to accompany traditional, prediction‐based load balancing. It is a reactive diffusion approach that uses online measurements of MPI idle time to migrate tasks temporarily from overloaded to underemployed ranks. Tasks are deployed to ranks that would otherwise wait, processed with high priority, and made available to the overloaded ranks again. This migration is nonpersistent. Our approach hijacks idle time to do meaningful work and is fully nonblocking, asynchronous and distributed without a global data view. Tests with a seismic simulation code developed in the ExaHyPE engine uncover the method's potential. We found speed‐ups of up to a factor of 2 to 3 for ill‐balanced scenarios without logical modifications of the code base and show that the strategy is capable of reacting quickly to temporarily changing workload or node performance.
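
The reactive diffusion idea in the abstract can be illustrated with a small toy model. The sketch below is not the authors' implementation and uses no MPI: it assumes unit-cost tasks, a time step that lasts as long as the most loaded rank needs, and a rule that lends exactly one task per step. All names (`effective_load`, `wait_times`, `react`, `simulate`) are hypothetical and chosen for illustration only. Ownership of tasks never changes; only a per-step lending table is adapted, which mirrors the nonpersistent migration described above.

```python
def effective_load(owned, lent):
    """Tasks each rank actually processes: own tasks minus lent plus borrowed."""
    n = len(owned)
    return [
        owned[r] - sum(lent[r]) + sum(lent[s][r] for s in range(n))
        for r in range(n)
    ]


def wait_times(load):
    """Assumed model: every rank idles until the slowest rank has finished."""
    slowest = max(load)
    return [slowest - l for l in load]


def react(owned, lent):
    """One reactive step: the rank nobody waits less than lends one more
    task to the rank that currently waits longest. Returns True if the
    lending table changed."""
    load = effective_load(owned, lent)
    waits = wait_times(load)
    ranks = range(len(owned))
    critical = min(ranks, key=lambda r: waits[r])  # waits least (overloaded)
    idle = max(ranks, key=lambda r: waits[r])      # waits most (underemployed)
    can_lend = owned[critical] - sum(lent[critical]) > 0
    # Moving one task only helps if the gap is at least two tasks.
    if waits[idle] >= 2 and can_lend:
        lent[critical][idle] += 1
        return True
    return False


def simulate(owned, max_steps=100):
    """Repeat reactive steps until the lending table stops changing."""
    n = len(owned)
    lent = [[0] * n for _ in range(n)]  # lent[src][dst] tasks per time step
    for _ in range(max_steps):
        if not react(owned, lent):
            break
    return effective_load(owned, lent), lent
```

In this toy model, `simulate([10, 2, 2])` ends with effective loads `[5, 5, 4]`: rank 0 still owns its ten tasks but lends three to rank 1 and two to rank 2 in each step. Each decision uses only measured wait times, so no cost prediction is required, which is the property the abstract emphasises.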

Citation

Samfass, P., Weinzierl, T., Charrier, D. E., & Bader, M. (2020). Lightweight Task Offloading Exploiting MPI Wait Times for Parallel Adaptive Mesh Refinement. Concurrency and Computation: Practice and Experience, 32(24), Article e5916. https://doi.org/10.1002/cpe.5916

Journal Article Type Article
Acceptance Date May 18, 2020
Online Publication Date Jul 9, 2020
Publication Date Dec 25, 2020
Deposit Date May 18, 2020
Publicly Available Date Jul 17, 2020
Journal Concurrency and Computation: Practice and Experience
Print ISSN 1532-0626
Publisher Wiley
Peer Reviewed Peer Reviewed
Volume 32
Issue 24
Article Number e5916
DOI https://doi.org/10.1002/cpe.5916
Related Public URLs https://arxiv.org/abs/1909.06096

Files

Published Journal Article (Advance online version) (2.3 Mb)
PDF

Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/

Copyright Statement
Advance online version © 2020 The Authors. Concurrency and Computation: Practice and Experience published by John Wiley & Sons, Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.





