DEFCON: Generating and Detecting Failure-prone Instruction Sequences via Stochastic Search

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The increased variability and adopted low supply voltages render nanometer devices prone to timing failures, which threaten the functionality of digital circuits. Recent schemes have focused on developing instruction-aware failure prediction models and adapting voltage/frequency to avoid errors while saving energy. However, such schemes may be inaccurate when applied to pipelined cores, since they consider only the currently executed instruction and the preceding one, thereby neglecting the impact of all the concurrently executing instructions on failure occurrence. In this paper, we first demonstrate that the order and type of instructions in sequences with a length equal to the pipeline depth significantly affect the failure rate. Because evaluating the impact of all possible sequences on failures is practically impossible, we present DEFCON, a fully automated framework that stochastically searches for the most failure-prone instruction sequences (ISQs). DEFCON generates such sequences by integrating a properly formulated genetic algorithm with accurate post-layout dynamic timing analysis, considering data-dependent path sensitization and instruction execution history. The generated micro-architecture-aware ISQs are then used by DEFCON to estimate the failure vulnerability of any application. To evaluate the efficacy of the proposed framework, we implement a pipelined floating-point unit and perform dynamic timing analysis based on input data that we extract from a variety of applications consisting of up to 43.5M ISQs. Our results show that DEFCON quickly reveals ISQs that maximize the output quality loss and correctly detects 99.7% of the actual faulty ISQs in different applications under various levels of variation-induced delay increase. Finally, DEFCON enables us to identify failure-prone ISQs early in the design cycle and to save 26.8% of energy on average when combined with a clock stretching mechanism.
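As a rough illustration of the stochastic search described in the abstract, the following minimal Python sketch evolves fixed-length instruction sequences with a simple genetic algorithm. It is not DEFCON's implementation: the opcode list, the pipeline depth of 6, the delay table, and the function `toy_failure_score` are all hypothetical stand-ins chosen for this example. In particular, the real framework scores sequences with post-layout dynamic timing analysis, which a toy score cannot reproduce.

```python
import random

# Hypothetical opcode set for a floating-point unit; not taken from the paper.
OPCODES = ["fadd", "fsub", "fmul", "fdiv", "fsqrt", "fcmp"]
PIPELINE_DEPTH = 6  # assumed depth; each sequence is as long as the pipeline

# Made-up relative delays standing in for real timing data.
TOY_DELAY = {"fadd": 1.0, "fsub": 1.0, "fmul": 2.0,
             "fdiv": 3.0, "fsqrt": 3.0, "fcmp": 0.5}


def toy_failure_score(seq):
    """Toy fitness: sum of delays plus a penalty for adjacent slow operations.

    This placeholder only mimics the idea that instruction order and type
    both matter; it is not a timing model.
    """
    score = sum(TOY_DELAY[op] for op in seq)
    for prev, cur in zip(seq, seq[1:]):
        if TOY_DELAY[prev] >= 2.0 and TOY_DELAY[cur] >= 2.0:
            score += 1.0
    return score


def random_sequence(rng):
    """Draw one random instruction sequence of pipeline length."""
    return [rng.choice(OPCODES) for _ in range(PIPELINE_DEPTH)]


def crossover(a, b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, PIPELINE_DEPTH)
    return a[:cut] + b[cut:]


def mutate(seq, rng, rate=0.1):
    """Replace each instruction with a random opcode with probability `rate`."""
    return [rng.choice(OPCODES) if rng.random() < rate else op for op in seq]


def search(generations=50, pop_size=30, elite=5, seed=0):
    """Elitist genetic search for the highest-scoring sequence."""
    rng = random.Random(seed)
    population = [random_sequence(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best sequences unchanged and breed the rest from them.
        population.sort(key=toy_failure_score, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=toy_failure_score)


best = search()
print(best, toy_failure_score(best))
```

Because the best sequences are carried over unchanged in every generation, the best score never decreases during the search. Swapping `toy_failure_score` for a call into a real timing-analysis flow would be the natural extension point, although how DEFCON formulates its fitness function and genetic operators is not specified in the abstract.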
Original language: English
Title of host publication: Design, Automation and Test in Europe Conference 2020: Proceedings
Publisher: IEEE
DOIs
Publication status: Published - 15 Jun 2020
Event: Design, Automation and Test in Europe Conference
Duration: 09 Mar 2020 → …
https://www.date-conference.com/

Conference

Conference: Design, Automation and Test in Europe Conference
Abbreviated title: DATE
Period: 09/03/2020 → …
Internet address

Bibliographical note

I would like to note that since one of the contributors (i.e. Dr Giorgis Georgakoudis) is affiliated with LLNL, LLNL requests the copyright statement required by the publisher (IEEE/ACM etc.).
In particular, according to LLNL policy, a designated copyright releasing agent (from LLNL) needs to approve and sign the publisher's copyright statement. LLNL then releases the copyright of the paper to the publisher as usual. Please note that since I am the lead author, I will retain the right to put manuscript versions in public repositories or on personal pages, as usually designated by the publisher and/or PURE.

Is this OK with you? Please let me know as soon as possible in order to submit the final, camera-ready version of the manuscript.

Thank you in advance for your time and attention.

Kind Regards,
Ioannis.
