Charrier, Dominic E. and Weinzierl, Tobias (2018) 'An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.', in Parallel processing and applied mathematics : 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017 ; revised selected papers. Part I. Cham: Springer, pp. 3-13. Lecture notes in computer science. (10777).
With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that let the algorithm expose high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and treat their findings as craftsmanship not worth discussing. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning is important here because the grid and the task workload are multifaceted and change frequently at runtime. In this paper, we summarise the lessons we learned. We infer tweaks and idioms for general autotuning algorithms, and we clarify that such an approach does not completely free users from grain size awareness.
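The grain size trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' autotuning algorithm: the cost model, the function names `modelled_runtime` and `tune_grain`, and the constants `cost_per_item` and `task_overhead` are all assumptions made for illustration. The sketch only shows why very small grains lose to per-task overhead while very large grains leave workers idle.

```python
import math


def modelled_runtime(n_items, grain, workers, cost_per_item=1.0, task_overhead=50.0):
    """Toy cost model (an assumption, not taken from the paper).

    The work is split into ceil(n_items / grain) tasks. Tasks run in waves
    of `workers` tasks each, and every task is charged a fixed overhead
    plus the cost of a full grain of items.
    """
    tasks = math.ceil(n_items / grain)
    waves = math.ceil(tasks / workers)
    return waves * (task_overhead + grain * cost_per_item)


def tune_grain(n_items, workers, candidates):
    """Exhaustive search: return the candidate grain with the lowest modelled runtime."""
    return min(candidates, key=lambda g: modelled_runtime(n_items, g, workers))


# Hypothetical setup: 1024 work items, 4 workers, power-of-two grain sizes.
candidates = [2 ** k for k in range(0, 11)]  # 1, 2, 4, ..., 1024
best = tune_grain(n_items=1024, workers=4, candidates=candidates)
```

Under these assumed constants the search settles on a grain of 256, i.e. exactly one task per worker. A real autotuner would replace the model with measured runtimes and would have to re-tune whenever the grid or the workload changes, which is the situation the abstract describes.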
|Item Type:||Book chapter|
|Keywords:||Autotuning, Shared memory, Grain size, Machine learning.|
|Full text:||(AM) Accepted Manuscript|
|Publisher Web site:||https://doi.org/10.1007/978-3-319-78054-2_1|
|Publisher statement:||The final publication is available at Springer via https://doi.org/10.1007/978-3-319-78054-2_1|
|Date accepted:||21 June 2017|
|Date deposited:||22 June 2017|
|Date of first online publication:||23 March 2018|
|Date first made open access:||23 March 2019|