Li, Baojiu and Schulz, Holger and Tuft, Adam and Weinzierl, Tobias and Zhang, Han (2023) 'Upscaling ExaHyPE – on each and every core.', Technical Report. ARCHER2.
We study an MPI+multithreaded solver for hyperbolic partial differential equations (PDEs). On each rank, every thread handles a subdomain of the computational domain, identified by a segment of a space-filling curve. The threads run in fork-join mode and spawn additional tasks, which are meant to compensate for load imbalance between the threads. Our studies show that this tasks-over-BSP paradigm is not properly supported in some OpenMP runtimes, leads to NUMA pollution, and is vulnerable to tiny tasks. It also suffers from many memory movements. After replacing user data with smart pointers, and hence avoiding unnecessary copying, we propose to add a NUMA-aware queuing system on top of OpenMP and to batch multiple tasks into meta tasks which can spread out over idle cores. Many of these techniques are workarounds for current OpenMP runtime implementations, and we expect them to become unnecessary as the OpenMP runtimes evolve. The insights thus have a pathfinding character.
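The batching idea in the abstract can be illustrated with a minimal C++ sketch. It is not taken from the report or from ExaHyPE: the `Task` alias and the helper `batchIntoMetaTasks` are hypothetical names, and the sketch only shows how many tiny tasks might be grouped into fewer, larger meta tasks whose members are held through a smart pointer, so that copying a meta task does not copy its members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical task type: any callable without arguments or return value.
using Task = std::function<void()>;

// Group consecutive tiny tasks into meta tasks of at most batchSize members.
// Each meta task runs its members back to back, so a scheduler sees fewer,
// larger work items. Illustrative helper only, not the authors' code.
std::vector<Task> batchIntoMetaTasks(std::vector<Task> tiny, std::size_t batchSize) {
  assert(batchSize > 0);
  std::vector<Task> meta;
  for (std::size_t begin = 0; begin < tiny.size(); begin += batchSize) {
    const std::size_t end = std::min(begin + batchSize, tiny.size());
    // Move the members into a shared chunk: copying the resulting meta task
    // then copies one smart pointer instead of every member task.
    auto chunk = std::make_shared<std::vector<Task>>(
        std::make_move_iterator(tiny.begin() + static_cast<std::ptrdiff_t>(begin)),
        std::make_move_iterator(tiny.begin() + static_cast<std::ptrdiff_t>(end)));
    meta.emplace_back([chunk] {
      for (auto& member : *chunk) member();
    });
  }
  return meta;
}
```

In an OpenMP code, each returned meta task could then be handed to the runtime, for example inside a `#pragma omp task` region, instead of spawning one task per tiny work item. The NUMA-aware queue mentioned in the abstract is not modelled here.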
|Item Type:||Monograph (Technical Report)|
|Full text:||(VoR) Version of Record|
Available under License - Creative Commons Attribution Non-commercial No Derivatives 4.0.
|Publisher Web site:||https://doi.org/10.5281/zenodo.7888492|
|Date accepted:||No date available|
|Date deposited:||05 May 2023|
|Date of first online publication:||02 May 2023|
|Date first made open access:||05 May 2023|