Dynamic Fault-Tolerance and Metrics for Battery Powered,
Failure-Prone Systems
Phillip Stanley-Marbell and Diana Marculescu.
In Proceedings of the International Conference on Computer Aided Design, ICCAD '03, November 2003.
ABSTRACT
Emerging VLSI technologies and platforms are giving rise to systems
with inherently high potential for runtime failure. Such failures
range from intermittent electrical and mechanical failures at the
system level, to device failures at the chip level. Techniques to
provide reliable computation in the presence of failures must do
so while maintaining high performance, with an eye toward energy
efficiency. When possible, they should maximize battery lifetime
in the face of battery discharge non-linearities. This paper
introduces the concept of adaptive fault-tolerance management for
failure-prone systems, and a classification of local algorithms for
achieving system-wide reliability.
To judge the efficacy of the proposed algorithms for dynamic
fault-tolerance management, a set of metrics is presented that
characterizes system behavior in terms of energy efficiency, reliability,
computation performance and battery lifetime. For an example
platform employed in a realistic evaluation scenario, it is shown
that system configurations with the best performance and lifetime
are not necessarily those with the best combination of performance,
reliability, battery lifetime and average power consumption.