REFINE: Realistic Fault Injection via Compiler-based Instrumentation for Accuracy, Portability and Speed

Giorgis Georgakoudis, Ignacio Laguna, Dimitrios S. Nikolopoulos, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

28 Citations (Scopus)
846 Downloads (Pure)

Abstract

Compiler-based fault injection (FI) has become a popular technique for resilience studies to understand the impact of soft errors in supercomputing systems. Compiler-based FI frameworks inject faults at a high intermediate-representation level. However, they are less accurate than machine-code (binary-level) FI because they lack access to all dynamic instructions and thus fail to mimic certain fault manifestations. In this paper, we study the limitations of current practices in compiler-based FI and how they impact the interpretation of results in resilience studies. We propose REFINE, a novel framework that addresses these limitations by performing FI in a compiler backend. Our approach provides the portability and efficiency of compiler-based FI while keeping accuracy comparable to binary-level FI methods. We demonstrate our approach on 14 HPC programs and show that, due to our unique design, its runtime overhead is significantly smaller than that of state-of-the-art compiler-based FI frameworks, reducing the time for large FI experiments.
Original language: English
Title of host publication: Proceedings of SC17, Denver, CO, USA, November 12–17, 2017
Publisher: Association for Computing Machinery
Number of pages: 14
ISBN (Print): 978-1-4503-5114-0
Publication status: Published - 17 Nov 2017
Event: Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis - Denver, United States
Duration: 11 Nov 2017 to 17 Nov 2017

Conference

Conference: Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis
Abbreviated title: SC17
Country/Territory: United States
City: Denver
Period: 11/11/2017 to 17/11/2017
