A fault-tolerant building block for transputer networks for real-time processing

UBC Theses and Dissertations

A fault-tolerant building block for transputer networks for real-time processing Fei, Yueying

Abstract

This thesis addresses the software implementation of multiprocessor fault tolerance for real-time processing. The research focuses on:

• Fault-tolerant cells as building blocks that can survive concurrent transient physical faults and permanent failures in large parallel processing systems with potential for real-time processing.
• Efficient group communication for redundant data exchange through the multiple communication links that connect the group peers.
• Transparent fault tolerance.
• On-line forward fault repair, with a bounded delay, using the live execution image from a non-faulty peer.

By systematically connecting the redundant processing modules, the architecture gains a regular, recursive structure, so the modules can serve as building blocks for constructing fault-tolerant parallel machines.

The communication service protocols take advantage of redundant linkages to ensure reliable and efficient message delivery among the fault-tolerant abstract transputer (FTAT) peer nodes through the concept of activity observation. The multiple redundant linkages also provide a means for parallel communication, which is essential for the redundant information exchange that fault tolerance requires. Activity observation further reduces the effort needed for reliable message delivery and simplifies the system design. As a result, messages are dynamically and optimally routed when a link or processor fails.

Through the group communication mechanism underlying the platform, replication, repair upon fault detection, and reintegration after repair are all transparent to the application processes on each FTAT peer node. Based on a dynamic Triple Modular Redundancy scheme, each application process can survive up to two concurrent faults, under the assumption that the probability of two faulty peer applications having the same fault is very small.

In a large interconnected network, fault tolerance can be very expensive in time and communication because of the cost of either synchronization or rollback recovery. Using redundant live execution images to repair the faulty module guarantees forward fault recovery.
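
The two ideas in the abstract that lend themselves to a small illustration are majority voting across three replicas and forward repair from a healthy peer's live state. The Python sketch below is only a toy model under stated assumptions, not the thesis's implementation: the `Replica` class, the `vote` and `forward_repair` functions, and the dictionary used as an "execution image" are all hypothetical names introduced here, and a plain strict-majority voter stands in for the dynamic Triple Modular Redundancy scheme, whose details the abstract does not give.

```python
import copy
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """Hypothetical stand-in for one peer's copy of an application process."""
    state: dict = field(default_factory=lambda: {"total": 0})

    def step(self, x: int) -> int:
        # Toy computation: accumulate the input and report the running total.
        self.state["total"] += x
        return self.state["total"]


def vote(outputs: list) -> Optional[int]:
    """Return the output shared by a strict majority, or None if there is none."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def forward_repair(replicas: list, outputs: list, agreed: int) -> list:
    """Overwrite each disagreeing replica's state with a copy of a healthy
    peer's current state, so it continues from the present instead of
    rolling back to a checkpoint. Returns the indices that were repaired."""
    healthy = next(r for r, out in zip(replicas, outputs) if out == agreed)
    repaired = []
    for i, (replica, out) in enumerate(zip(replicas, outputs)):
        if out != agreed:
            replica.state = copy.deepcopy(healthy.state)
            repaired.append(i)
    return repaired


replicas = [Replica(), Replica(), Replica()]
replicas[2].state["total"] = 99          # inject a transient fault into peer 2
outputs = [r.step(5) for r in replicas]  # peer 2 now disagrees with the others
agreed = vote(outputs)                   # majority value from the healthy peers
repaired = forward_repair(replicas, outputs, agreed)
next_outputs = [r.step(1) for r in replicas]  # all peers agree again
```

In this toy model, two replicas that fail with the same wrong value would outvote the correct one, which is presumably why the abstract's assumption about identical faults matters. A three-way disagreement makes `vote` return `None`; a plain voter cannot resolve that case, and how the thesis's dynamic scheme handles it is not stated in the abstract.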

Item Metadata

Title	A fault-tolerant building block for transputer networks for real-time processing
Creator	Fei, Yueying
Publisher	University of British Columbia
Date Issued	1993
Extent	3990953 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2008-08-12
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0065183
URI	http://hdl.handle.net/2429/1359
Degree (Theses)	Master of Applied Science - MASc
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	1993-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_1993_spring_fei_yueying.pdf -- 3.81MB

