Merrimac - Papers

Fault Tolerance Techniques for the Merrimac Streaming Supercomputer

Mattan Erez, Nuwan Jayasena, Timothy J. Knight, and William J. Dally

in Proceedings of the SC|05 Conference, November 12-18 2005, Seattle, Washington, USA

Abstract:

As device scales shrink, higher transistor counts are available while soft errors, even in logic, become a major concern. A new class of architectures, such as Merrimac and the IBM Cell, takes advantage of the higher transistor count by exposing control, communication, and a large number of functional units at the architectural level, thus achieving high performance and efficiency. This paper explores soft-error fault tolerance in the context of these compute-intensive architectures, which differ significantly from their control-intensive CPU counterparts. The main goal of the proposed schemes for Merrimac is to conserve the critical and costly off-chip bandwidth and on-chip storage resources, while maintaining high peak and sustained performance. We achieve this by allowing for reconfigurability and relying on programmer input. The processor is run either at full peak performance employing software fault-tolerance methods, or at reduced performance with hardware redundancy. We present several methods, their analysis, and detailed case studies.

Paper:

Adobe Acrobat PDF

BibTeX:

@conference{ref:sc05_faulttolerance,
   author = {Mattan Erez and Nuwan Jayasena and Timothy J. Knight and William J. Dally},
   title = {{Fault Tolerance Techniques for the Merrimac Streaming Supercomputer}},
   booktitle = {{SC}|05},
   year = {2005},
   address = {Seattle, Washington, USA},
   month = {November},
   day = {12--18}
}

(c) ACM, 2005. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in the Proceedings of SC|05, November 12-18, 2005, Seattle, Washington, USA.

Last modified: Mon Oct 10 13:00:58 PDT 2005