Abstract
Aggressive technology scaling and increased static and dynamic variability caused by process, temperature, voltage, and aging effects make nanometer circuits prone to timing errors which threaten system functionality. Accurately evaluating the impact of those circuit-level errors on the resilience of a CPU and the executed applications remains a first-class design issue. However, existing error assessment frameworks fail to accurately model the effects of timing errors because they neglect microarchitecture- and workload-dependent parameters
that critically affect the error manifestation and propagation.
This paper provides a novel, cross-layer framework that addresses the lack of a holistic methodology for the understanding of the full system impact of hardware timing errors as they propagate from the circuit-level through the microarchitecture up to the application software. The proposed microarchitectureaware tool is able to realistically inject timing errors considering
circuit and workload features, accurately assessing timing error effects on any application binary. We estimate the location (bit position and instruction) and the time (cycle) of the injected errors via a workload-aware error model which relies on post place-and-route dynamic timing analysis. We also leverage
microarchitectural error injection to access the timing error reliability of a widely deployed pipelined processor under several workloads and voltage reduction levels. To evaluate the proposed tool, our fully automated toolflow is also configured to support timing error injection based on existing workload-agnostic error models. Evaluation results for various workloads and voltage reduction levels, show that our circuit- and workload-aware error injection model improves the accuracy of the error injection ratio by ~250X on average compared to workload-agnostic models. Finally, we quantify the degree to which various applications are prone to timing errors using an application vulnerability metric
that can be used early in the design cycle to guide the adoption of energy-efficient error mitigation strategies.
that critically affect the error manifestation and propagation.
This paper provides a novel, cross-layer framework that addresses the lack of a holistic methodology for the understanding of the full system impact of hardware timing errors as they propagate from the circuit-level through the microarchitecture up to the application software. The proposed microarchitectureaware tool is able to realistically inject timing errors considering
circuit and workload features, accurately assessing timing error effects on any application binary. We estimate the location (bit position and instruction) and the time (cycle) of the injected errors via a workload-aware error model which relies on post place-and-route dynamic timing analysis. We also leverage
microarchitectural error injection to access the timing error reliability of a widely deployed pipelined processor under several workloads and voltage reduction levels. To evaluate the proposed tool, our fully automated toolflow is also configured to support timing error injection based on existing workload-agnostic error models. Evaluation results for various workloads and voltage reduction levels, show that our circuit- and workload-aware error injection model improves the accuracy of the error injection ratio by ~250X on average compared to workload-agnostic models. Finally, we quantify the degree to which various applications are prone to timing errors using an application vulnerability metric
that can be used early in the design cycle to guide the adoption of energy-efficient error mitigation strategies.
Original language | English |
---|---|
Publication status | Accepted - 01 Nov 2021 |
Event | IEEE International Symposium on Workload Characterization - virtually, Storrs, United States Duration: 07 Nov 2021 → 09 Nov 2021 |
Conference
Conference | IEEE International Symposium on Workload Characterization |
---|---|
Abbreviated title | IISWC |
Country/Territory | United States |
City | Storrs |
Period | 07/11/2021 → 09/11/2021 |