Checkpointing of parallel applications in a grid environment

Kreeteeraj Sajadah, Gabor Terstyansky, Stephen C. Winter, Peter Kacsuk

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Jobs in Grid workflows are exposed to different types of failure, so fault-tolerant mechanisms are needed to ensure a good level of reliability during the execution of Grid jobs. Checkpointing is the most common method of achieving fault tolerance, but considerable work remains to improve its efficiency. This paper gives an overview of a checkpointing solution for parallel applications executed on multiple sites in the Grid environment. The mechanism is an improvement on the PGRADE checkpointing solution.

Original language: English
Title of host publication: Distributed and Parallel Systems
Subtitle of host publication: In Focus: Desktop Grid Computing
Publisher: Springer US
Pages: 179-187
Number of pages: 9
ISBN (Print): 9780387698571
DOI: https://doi.org/10.1007/978-0-387-79448-8_16
Publication status: Published - Dec 1 2007

Keywords

  • Checkpointing
  • Critical Region
  • First Order Approximation
  • Natural Synchronisation Points

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Sajadah, K., Terstyansky, G., Winter, S. C., & Kacsuk, P. (2007). Checkpointing of parallel applications in a grid environment. In Distributed and Parallel Systems: In Focus: Desktop Grid Computing (pp. 179-187). Springer US. https://doi.org/10.1007/978-0-387-79448-8_16