Dynamic Fault-Tolerance Management in Failure-Prone and Battery-Powered Systems

Phillip Stanley-Marbell and Diana Marculescu.

In Proceedings of the International Workshop on Logic and Synthesis, IWLS '03, May 2003.



ABSTRACT
Emerging VLSI technologies, as well as emerging platforms, are giving rise to systems with inherently high potential for runtime failure. Such failures range from intermittent electrical and mechanical failures at the system level to device failures at the chip level. Techniques to provide reliable computation in the presence of failures must do so while maintaining high performance, with an eye toward energy efficiency and, when possible, maximizing battery lifetime in the face of battery discharge non-linearities. This work presents one approach for achieving reliable computation in the face of failure, along with a set of metrics for characterizing system behavior in terms of energy efficiency, reliability, computation performance and battery lifetime. The proposed technique, Dynamic Fault-Tolerance Management (DFTM), relies solely on local decisions to attain global reliable computation. The proposed combined metrics, referred to as ebformability measures (since they combine the effects of energy, battery lifetime, performance and reliability), are used to evaluate the efficacy of DFTM. For an example platform employed in a realistic evaluation scenario, it is shown that the system configurations with the best performance and lifetime are not necessarily those with the best combination of performance, reliability, battery lifetime and average power consumption.
