DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


Abstract

Today’s rapid generation of data and the growing demand for higher memory capacity have prompted many studies on aggressively scaling the DRAM refresh period, which is currently set according to rare worst-case conditions. These studies analysed the data-dependent circuit-level factors in detail and indicated the need for online DRAM characterization because of variable cell retention time. They did so by executing a few test data patterns on FPGAs at controlled temperatures using thermal testbeds, which, however, are not available in the field. Moreover, the existing studies could not reveal system-level effects, which may be triggered when workloads execute on real systems and may directly or indirectly affect DRAM reliability. In this paper, we develop a first-of-its-kind experimental framework based on a state-of-the-art 64-bit ARM-based server running Linux, in which we enable DRAM characterization under a relaxed refresh period by executing conventional test data patterns as well as popular HPC and Cloud workloads. This setup allows us, for the first time, to evaluate the impact of system-level factors on DRAM behaviour and the efficacy of conventional test patterns under typical conditions, without controlling the DRAM temperature. Our results indicate that common test patterns are ineffective at identifying error-prone locations at low DRAM temperatures. Furthermore, the analysis of various measured performance counters and manifested error rates reveals a strong correlation between system utilization and DRAM reliability. Exploiting these findings, we developed a stress benchmark that indirectly raises the DRAM temperature and can thus be used for characterization in the field without any complicated thermal equipment. Results show that the stress benchmark can raise the DRAM temperature above 43 °C and cover up to 60% of the erroneous memory locations in the 144 tested DRAM chips.
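The idea of running test data patterns and measuring how many error-prone locations they expose can be illustrated with a minimal Python sketch. It is not the authors' framework: memory is modelled as a plain list of 64-bit words, the patterns (all-zeros, all-ones, checkerboard) are common textbook choices assumed here rather than taken from the paper, and the helper names `pattern_words`, `erroneous_locations` and `coverage` are hypothetical. Coverage is assumed to mean the fraction of all known erroneous locations that a given test detects.

```python
"""Hypothetical sketch: classic DRAM test data patterns and a coverage metric.

Assumptions (not from the paper): memory is a list of 64-bit words, and an
erroneous location is a word whose read-back value differs from what was written.
"""
from __future__ import annotations

WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1


def pattern_words(name: str, n_words: int) -> list[int]:
    """Return n_words 64-bit words filled with one of a few classic patterns."""
    if name == "all0":
        return [0] * n_words
    if name == "all1":
        return [MASK] * n_words
    if name == "checkerboard":
        # Alternate 0101... and 1010... between consecutive words.
        a = 0x5555_5555_5555_5555
        return [a if i % 2 == 0 else (~a & MASK) for i in range(n_words)]
    raise ValueError(f"unknown pattern: {name}")


def erroneous_locations(written: list[int], read_back: list[int]) -> set[int]:
    """Indices of words in which at least one bit flipped."""
    return {i for i, (w, r) in enumerate(zip(written, read_back)) if w != r}


def coverage(detected: set[int], all_erroneous: set[int]) -> float:
    """Fraction of the known erroneous locations that a given test detected."""
    if not all_erroneous:
        return 1.0
    return len(detected & all_erroneous) / len(all_erroneous)


if __name__ == "__main__":
    written = pattern_words("checkerboard", 8)
    read_back = list(written)
    read_back[2] ^= 1 << 3  # simulated retention failure
    read_back[5] ^= 1       # simulated retention failure
    found = erroneous_locations(written, read_back)
    print(sorted(found))                       # [2, 5]
    print(coverage(found, {2, 5, 7, 9, 11}))   # 0.4
```

In this toy run, a test that finds 2 of 5 known weak locations has 40% coverage; on real hardware the read-back mismatches would instead be reported through the system's error-correction logging.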
Finally, our study shows for the first time that the refresh period can be relaxed by a factor of 35 on such a commodity system, with all errors corrected by the available error-correcting codes (ECC), resulting in 11.5% power savings on average.
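To put the 35-fold relaxation in absolute terms, the following sketch computes refresh timing for a stretched refresh window. It assumes the JEDEC DDR3/DDR4 nominal refresh window of 64 ms covered by 8192 REF commands; the abstract does not state the server's nominal settings, and `refresh_timing` and `RefreshTiming` are hypothetical names.

```python
"""Hypothetical sketch: refresh timing when the refresh window is relaxed.

Assumption (not stated in the abstract): nominal window of 64 ms, as in the
JEDEC DDR3/DDR4 default, covered by 8192 REF commands.
"""
from dataclasses import dataclass

NOMINAL_TREFW_MS = 64   # assumed nominal refresh window in milliseconds
REF_PER_WINDOW = 8192   # assumed REF commands issued per refresh window


@dataclass(frozen=True)
class RefreshTiming:
    trefw_ms: float   # time within which every row is refreshed once
    trefi_us: float   # average interval between REF commands
    ref_per_s: float  # REF commands issued per second


def refresh_timing(relax_factor: float) -> RefreshTiming:
    """Timing for a refresh window stretched by relax_factor (1 = nominal)."""
    if relax_factor <= 0:
        raise ValueError("relax_factor must be positive")
    trefw_ms = NOMINAL_TREFW_MS * relax_factor
    return RefreshTiming(
        trefw_ms=trefw_ms,
        trefi_us=trefw_ms * 1000 / REF_PER_WINDOW,
        ref_per_s=REF_PER_WINDOW * 1000 / trefw_ms,
    )


if __name__ == "__main__":
    nominal = refresh_timing(1)
    relaxed = refresh_timing(35)
    print(relaxed.trefw_ms)   # 2240
    print(relaxed.trefi_us)   # 273.4375
    print(round(nominal.ref_per_s / relaxed.ref_per_s, 6))  # 35.0
```

Under these assumptions, the window grows to 2.24 s and the average REF interval from about 7.8 µs to about 273 µs, so the controller issues 1/35 as many refresh commands. The 11.5% power saving reported above is a measured result and is not modelled by this arithmetic.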
Original language: English
Title of host publication: 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018
Publisher: IEEE
Pages: 236-239
ISBN (Electronic): 978-1-5386-5992-2
ISBN (Print): 9781538659922
Publication status: Published - 01 Oct 2018
Event: International Symposium on On-Line Testing and Robust System Design, IOLTS - Costa Brava, Spain
Duration: 02 Jul 2018 - 04 Jul 2018
http://tima.univ-grenoble-alpes.fr/conferences/iolts/iolts18/

Publication series

Name: 2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018

Conference

Conference: International Symposium on On-Line Testing and Robust System Design, IOLTS
Country: Spain
City: Costa Brava
Period: 02/07/2018 - 04/07/2018

