Method for the construction and interpretation of high level models for distributed fault-tolerant systems

K. Tilly, I. Kiss, G. Roman, T. Dobrowiecki, A. Várkonyi-Kóczy

Research output: Conference contribution

Abstract

Traditional solutions for achieving fault-tolerance are intended for use at design time and they generally capture system information at a very low (hardware or machine instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for the implementation of large scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources organized into high level top-down hierarchical decomposition structures. Since redundancy is a natural property of any large scale system, using such models it is possible to achieve fault tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and to support the required services by available resources. The distributed system is extended with dedicated components, called diagnostic centres, which manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings, if some services fail to fulfil their associated requirements. In the following sections the elements and the structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.

Original language: English
Title of host publication: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Publisher: IEEE
Pages: 72-81
Number of pages: 10
Publication status: Published - 1995
Event: Proceedings of the 1994 IEEE 14th Symposium on Reliable Distributed Systems - Bad Neuenahr, Ger
Duration: Sept. 13 1995 to Sept. 15 1995

Other

Other: Proceedings of the 1994 IEEE 14th Symposium on Reliable Distributed Systems
City: Bad Neuenahr, Ger
Period: 9/13/95 to 9/15/95

Fingerprint

Information systems
Fault tolerance
Computer hardware
Redundancy
Large scale systems
Decomposition

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Tilly, K., Kiss, I., Roman, G., Dobrowiecki, T., & Várkonyi-Kóczy, A. (1995). Method for the construction and interpretation of high level models for distributed fault-tolerant systems. In Proceedings of the IEEE Symposium on Reliable Distributed Systems (pp. 72-81). IEEE.

Method for the construction and interpretation of high level models for distributed fault-tolerant systems. / Tilly, K.; Kiss, I.; Roman, G.; Dobrowiecki, T.; Várkonyi-Kóczy, A.

Proceedings of the IEEE Symposium on Reliable Distributed Systems. IEEE, 1995. p. 72-81.

Tilly, K, Kiss, I, Roman, G, Dobrowiecki, T & Várkonyi-Kóczy, A 1995, Method for the construction and interpretation of high level models for distributed fault-tolerant systems. in Proceedings of the IEEE Symposium on Reliable Distributed Systems. IEEE, pp. 72-81, Proceedings of the 1994 IEEE 14th Symposium on Reliable Distributed Systems, Bad Neuenahr, Ger, 9/13/95.
Tilly K, Kiss I, Roman G, Dobrowiecki T, Várkonyi-Kóczy A. Method for the construction and interpretation of high level models for distributed fault-tolerant systems. In Proceedings of the IEEE Symposium on Reliable Distributed Systems. IEEE. 1995. p. 72-81
Tilly, K. ; Kiss, I. ; Roman, G. ; Dobrowiecki, T. ; Várkonyi-Kóczy, A. / Method for the construction and interpretation of high level models for distributed fault-tolerant systems. Proceedings of the IEEE Symposium on Reliable Distributed Systems. IEEE, 1995. pp. 72-81
@inproceedings{5ff5cee0ba0f4f0d9e827880d4304413,
title = "Method for the construction and interpretation of high level models for distributed fault-tolerant systems",
abstract = "Traditional solutions for achieving fault-tolerance are intended for use at design time and they generally capture system information at a very low (hardware or machine instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for the implementation of large scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources organized into high level top-down hierarchical decomposition structures. Since redundancy is a natural property of any large scale system, using such models it is possible to achieve fault tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and to support the required services by available resources. The distributed system is extended with dedicated components, called diagnostic centres, which manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings, if some services fail to fulfil their associated requirements. In the following sections the elements and the structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.",
author = "K. Tilly and I. Kiss and G. Roman and T. Dobrowiecki and A. V{\'a}rkonyi-K{\'o}czy",
year = "1995",
language = "English",
pages = "72--81",
booktitle = "Proceedings of the IEEE Symposium on Reliable Distributed Systems",
publisher = "IEEE",
}

TY - GEN

T1 - Method for the construction and interpretation of high level models for distributed fault-tolerant systems

AU - Tilly, K.

AU - Kiss, I.

AU - Roman, G.

AU - Dobrowiecki, T.

AU - Várkonyi-Kóczy, A.

PY - 1995

Y1 - 1995

AB - Traditional solutions for achieving fault-tolerance are intended for use at design time and they generally capture system information at a very low (hardware or machine instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for the implementation of large scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources organized into high level top-down hierarchical decomposition structures. Since redundancy is a natural property of any large scale system, using such models it is possible to achieve fault tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and to support the required services by available resources. The distributed system is extended with dedicated components, called diagnostic centres, which manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings, if some services fail to fulfil their associated requirements. In the following sections the elements and the structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.

UR - http://www.scopus.com/inward/record.url?scp=0029203357&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029203357&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0029203357

SP - 72

EP - 81

BT - Proceedings of the IEEE Symposium on Reliable Distributed Systems

PB - IEEE

ER -