Abstract
Compiler-based fault injection (FI) has become a popular technique
for resilience studies to understand the impact of soft errors in
supercomputing systems. Compiler-based FI frameworks inject
faults at a high intermediate-representation level. However, they
are less accurate than machine-code, binary-level FI because they
lack access to all dynamic instructions and thus fail to mimic
certain fault manifestations. In this paper, we study the limitations
of current practices in compiler-based FI and how they impact the
interpretation of results in resilience studies.
We propose REFINE, a novel framework that addresses these limitations,
performing FI in a compiler backend. Our approach provides
the portability and efficiency of compiler-based FI, while keeping
accuracy comparable to binary-level FI methods. We demonstrate
our approach on 14 HPC programs and show that, due to our unique
design, its runtime overhead is significantly smaller than that of
state-of-the-art compiler-based FI frameworks, reducing the time for large
FI experiments.
Original language | English |
---|---|
Title of host publication | Proceedings of SC17, Denver, CO, USA, November 12–17, 2017 |
Publisher | Association for Computing Machinery |
Number of pages | 14 |
ISBN (Print) | 978-1-4503-5114-0 |
Publication status | Published - 17 Nov 2017 |
Event | Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis - Denver, United States; Duration: 11 Nov 2017 → 17 Nov 2017 |
Conference
Conference | Supercomputing'17 (SC17): International Conference on High Performance Computing, Networking, Storage and Analysis |
---|---|
Abbreviated title | SC17 |
Country/Territory | United States |
City | Denver |
Period | 11/11/2017 → 17/11/2017 |