Application and middleware transparent checkpointing with TCKPT on clustergrid

J. Kovács, Rafal Mikolajczak, Radoslaw Januszewski, Gracjan Jankowski

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

This paper introduces a way to adapt existing parallel checkpointing techniques to software-heterogeneous ClusterGrid infrastructures. Whereas existing solutions provide application transparency by building special middleware, this paper targets application and middleware transparency at the same time by inserting checkpoint functionality into the application. The compatibility and integrity requirements are identified, and the corresponding conditions are established. Several available checkpointing systems are checked against these conditions to examine their conformity. Based on the conditions, a novel checkpointing method is defined, and the TotalCheckpoint tool is adapted for ClusterGrid.

Original language: English
Title of host publication: Distributed and Parallel Systems: From Cluster to Grid Computing
Publisher: Springer US
Pages: 179-189
Number of pages: 11
ISBN (Print): 0387698574, 9780387698571
DOI: https://doi.org/10.1007/978-0-387-69858-8_18
Publication status: Published - 2007

Fingerprint

Middleware
Transparency

Keywords

  • Checkpoint
  • Cluster
  • ClusterGrid
  • Grid
  • Migration
  • Parallel
  • PVM
  • Recovery

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Kovács, J., Mikolajczak, R., Januszewski, R., & Jankowski, G. (2007). Application and middleware transparent checkpointing with TCKPT on clustergrid. In Distributed and Parallel Systems: From Cluster to Grid Computing (pp. 179-189). Springer US. https://doi.org/10.1007/978-0-387-69858-8_18

Application and middleware transparent checkpointing with TCKPT on clustergrid. / Kovács, J.; Mikolajczak, Rafal; Januszewski, Radoslaw; Jankowski, Gracjan.

Distributed and Parallel Systems: From Cluster to Grid Computing. Springer US, 2007. p. 179-189.

Kovács, J, Mikolajczak, R, Januszewski, R & Jankowski, G 2007, Application and middleware transparent checkpointing with TCKPT on clustergrid. in Distributed and Parallel Systems: From Cluster to Grid Computing. Springer US, pp. 179-189. https://doi.org/10.1007/978-0-387-69858-8_18
Kovács J, Mikolajczak R, Januszewski R, Jankowski G. Application and middleware transparent checkpointing with TCKPT on clustergrid. In Distributed and Parallel Systems: From Cluster to Grid Computing. Springer US. 2007. p. 179-189. https://doi.org/10.1007/978-0-387-69858-8_18
Kovács, J. ; Mikolajczak, Rafal ; Januszewski, Radoslaw ; Jankowski, Gracjan. / Application and middleware transparent checkpointing with TCKPT on clustergrid. Distributed and Parallel Systems: From Cluster to Grid Computing. Springer US, 2007. pp. 179-189
@inbook{6a42de223d774fbc80f17ee5bb4e2fe9,
title = "Application and middleware transparent checkpointing with TCKPT on clustergrid",
abstract = "This paper introduces a way to transform the existing parallel checkpointing techniques to be applied for software-heterogeneous ClusterGrid infrastructures. While existing solutions are aiming at providing application transparency by building special middleware, this paper aims at targeting both application and middleware transparency at the same time by inserting checkpoint functionality into the application. The compatibility and integrity requirements are identified and corresponding conditions are established. Some of the available checkpointing systems are checked against the conditions in order to examine their conformity. Based on the conditions, a novel checkpointing method is defined and the TotalCheckpoint tool is adapted for ClusterGrid.",
keywords = "Checkpoint, Cluster, Clustergrid, Grid, Migration, Parallel, Pvm, Recovery",
author = "J. Kov{\'a}cs and Rafal Mikolajczak and Radoslaw Januszewski and Gracjan Jankowski",
year = "2007",
doi = "10.1007/978-0-387-69858-8_18",
language = "English",
isbn = "0387698574",
pages = "179--189",
booktitle = "Distributed and Parallel Systems: From Cluster to Grid Computing",
publisher = "Springer US",

}

TY  - CHAP
T1  - Application and middleware transparent checkpointing with TCKPT on clustergrid
AU  - Kovács, J.
AU  - Mikolajczak, Rafal
AU  - Januszewski, Radoslaw
AU  - Jankowski, Gracjan
PY  - 2007
Y1  - 2007
N2  - This paper introduces a way to transform the existing parallel checkpointing techniques to be applied for software-heterogeneous ClusterGrid infrastructures. While existing solutions are aiming at providing application transparency by building special middleware, this paper aims at targeting both application and middleware transparency at the same time by inserting checkpoint functionality into the application. The compatibility and integrity requirements are identified and corresponding conditions are established. Some of the available checkpointing systems are checked against the conditions in order to examine their conformity. Based on the conditions, a novel checkpointing method is defined and the TotalCheckpoint tool is adapted for ClusterGrid.
KW  - Checkpoint
KW  - Cluster
KW  - Clustergrid
KW  - Grid
KW  - Migration
KW  - Parallel
KW  - Pvm
KW  - Recovery
UR  - http://www.scopus.com/inward/record.url?scp=84892791181&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84892791181&partnerID=8YFLogxK
U2  - 10.1007/978-0-387-69858-8_18
DO  - 10.1007/978-0-387-69858-8_18
M3  - Chapter
AN  - SCOPUS:84892791181
SN  - 0387698574
SN  - 9780387698571
SP  - 179
EP  - 189
BT  - Distributed and Parallel Systems: From Cluster to Grid Computing
PB  - Springer US
ER  -