Achieving dynamic workflow management system by applying provenance based checkpointing method

E. Kail, P. Kacsuk, M. Kozlovszky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Scientific workflows are data- and compute-intensive and may therefore run for days or even weeks on parallel and distributed infrastructures such as HPC systems and clouds. In an HPC environment, the number of failures that can arise during scientific workflow enactment can be high, so the use of fault tolerance techniques is unavoidable. The most frequently used fault tolerance techniques are job replication and checkpointing. Job replication is based on the assumption that the probability of a single failure is much higher than that of simultaneous failures, whereas checkpointing saves certain states so that execution can later be restarted from the saved point. The effectiveness of checkpointing depends on the checkpointing interval, and a common technique is to adapt this interval dynamically. In this work we give a brief overview of the different checkpointing techniques and propose a new provenance-based dynamic checkpointing method.
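
To illustrate why the checkpointing interval matters, the sketch below uses Young's well-known first-order approximation for the interval, sqrt(2 × checkpoint cost × mean time between failures), and re-estimates it from failures observed so far. This is not the provenance-based method proposed in the paper, which the abstract does not detail; the function names, parameters, and the simple failure-count estimate of the mean time between failures are assumptions made for illustration only.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    checkpoint_cost_s: time needed to write one checkpoint (seconds).
    mtbf_s: mean time between failures (seconds).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def adapted_interval(
    checkpoint_cost_s: float,
    elapsed_s: float,
    failures_seen: int,
    prior_mtbf_s: float,
) -> float:
    """Hypothetical dynamic adaptation: re-estimate the MTBF at run time.

    If failures have been observed, the MTBF is estimated as
    elapsed time / number of failures; otherwise the prior is used.
    """
    if failures_seen > 0:
        mtbf_s = elapsed_s / failures_seen
    else:
        mtbf_s = prior_mtbf_s
    return young_interval(checkpoint_cost_s, mtbf_s)


if __name__ == "__main__":
    # Illustrative numbers only: 2 s checkpoint cost, 400 s of run time.
    print(adapted_interval(2.0, 400.0, 0, prior_mtbf_s=100.0))
    print(adapted_interval(2.0, 400.0, 8, prior_mtbf_s=100.0))
```

Under these assumptions, more observed failures lower the estimated mean time between failures and therefore shorten the interval, so checkpoints are taken more often when the infrastructure is less reliable.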

Original language: English
Title of host publication: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 250-253
Number of pages: 4
ISBN (Print): 9789532330854
DOIs: https://doi.org/10.1109/MIPRO.2015.7160274
Publication status: Published - Jul 15 2015
Event: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Opatija, Croatia
Duration: May 25 2015 to May 29 2015

Other

Other: 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015
Country: Croatia
City: Opatija
Period: 5/25/15 to 5/29/15

Fingerprint

Fault tolerance

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Kail, E., Kacsuk, P., & Kozlovszky, M. (2015). Achieving dynamic workflow management system by applying provenance based checkpointing method. In 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings (pp. 250-253). [7160274] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/MIPRO.2015.7160274

Achieving dynamic workflow management system by applying provenance based checkpointing method. / Kail, E.; Kacsuk, P.; Kozlovszky, M. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. p. 250-253 7160274.

Kail, E, Kacsuk, P & Kozlovszky, M 2015, Achieving dynamic workflow management system by applying provenance based checkpointing method. in 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings, 7160274, Institute of Electrical and Electronics Engineers Inc., pp. 250-253, 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015, Opatija, Croatia, 5/25/15. https://doi.org/10.1109/MIPRO.2015.7160274
Kail E, Kacsuk P, Kozlovszky M. Achieving dynamic workflow management system by applying provenance based checkpointing method. In 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2015. p. 250-253. 7160274 https://doi.org/10.1109/MIPRO.2015.7160274
Kail, E. ; Kacsuk, P. ; Kozlovszky, M. / Achieving dynamic workflow management system by applying provenance based checkpointing method. 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 250-253
@inproceedings{c3899ed001464f84942f32c639c0c14f,
title = "Achieving dynamic workflow management system by applying provenance based checkpointing method",
abstract = "Scientific workflows are data and compute intensive thus may run for days or even for weeks on parallel and distributed infrastructures such as HPC systems and cloud. In HPC environment the number of failures that can arise during scientific workflow enactment can be high so the use of fault tolerance techniques is unavoidable. The most frequently used fault tolerance techniques are job replication and checkpointing. While job replication is based on the assumption that the probability of single failures is much higher than of simultaneous failures, the checkpointing saves certain states and the execution can be restarted from that point later on. The effectiveness of the checkpointing method depends on the checkpointing interval. Common technique is to dynamically adapt the checkpointing interval. In this work we give a brief overview of the different checkpointing techniques and propose a new provenance based dynamic checkpointing method.",
author = "E. Kail and P. Kacsuk and M. Kozlovszky",
year = "2015",
month = "7",
day = "15",
doi = "10.1109/MIPRO.2015.7160274",
language = "English",
isbn = "9789532330854",
pages = "250--253",
booktitle = "2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Achieving dynamic workflow management system by applying provenance based checkpointing method

AU - Kail, E.

AU - Kacsuk, P.

AU - Kozlovszky, M.

PY - 2015/7/15

Y1 - 2015/7/15

AB - Scientific workflows are data and compute intensive thus may run for days or even for weeks on parallel and distributed infrastructures such as HPC systems and cloud. In HPC environment the number of failures that can arise during scientific workflow enactment can be high so the use of fault tolerance techniques is unavoidable. The most frequently used fault tolerance techniques are job replication and checkpointing. While job replication is based on the assumption that the probability of single failures is much higher than of simultaneous failures, the checkpointing saves certain states and the execution can be restarted from that point later on. The effectiveness of the checkpointing method depends on the checkpointing interval. Common technique is to dynamically adapt the checkpointing interval. In this work we give a brief overview of the different checkpointing techniques and propose a new provenance based dynamic checkpointing method.

UR - http://www.scopus.com/inward/record.url?scp=84946115662&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946115662&partnerID=8YFLogxK

U2 - 10.1109/MIPRO.2015.7160274

DO - 10.1109/MIPRO.2015.7160274

M3 - Conference contribution

AN - SCOPUS:84946115662

SN - 9789532330854

SP - 250

EP - 253

BT - 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2015 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -