On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems
Authors
Matthew Forshaw
A. Stephen McGough
Nigel Thomas
Markus Helfert
Karl-Heinz Krempels
Brian Donnellan
Abstract
Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware and software failures and interruptions from resource owners. With increasing scrutiny of the energy consumption of IT infrastructures, it is important to understand the impact of checkpointing on the energy consumption of HTC environments. In this paper we demonstrate through trace-driven simulation on real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst reducing the makespan of tasks. Furthermore, we identify factors important in deciding whether to employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches.
Citation
Forshaw, M., McGough, A. S., Thomas, N., Helfert, M., Krempels, K., & Donnellan, B. (2014). On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems. In Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems (SMARTGREENS 2014), 3-4 April 2014, Barcelona, Spain (262-272). https://doi.org/10.5220/0004958302620267
Conference Name | Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems |
---|---|
Publication Date | Jan 1, 2014 |
Deposit Date | Jan 11, 2015 |
Publicly Available Date | Feb 3, 2015 |
Pages | 262-272 |
Book Title | Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems (SMARTGREENS 2014), 3-4 April 2014, Barcelona, Spain. |
DOI | https://doi.org/10.5220/0004958302620267 |
Keywords | Energy efficiency, Checkpointing, Migration, Fault tolerance, Desktop grids. |