Today’s rapid generation of data and the increased need for higher memory capacity have triggered many studies on aggressive scaling of the DRAM refresh period, which is currently set according to rare worst-case conditions. Such studies analysed in detail the data-dependent circuit-level factors and indicated the need for online DRAM characterization due to the variable cell retention time. They did so by executing a few test data patterns on FPGAs under controlled temperatures using thermal testbeds, which, however, are not available in the field. Moreover, the existing studies were not able to reveal the system-level effects that may be excited by the execution of workloads on real systems and that directly or indirectly affect DRAM reliability. In this paper, we develop a first-of-its-kind experimental framework based on a state-of-the-art 64-bit ARM-based server running Linux, in which we enable DRAM characterization under a relaxed refresh period by executing conventional test data patterns as well as popular HPC and Cloud workloads. Such a setup allows us, for the first time, to evaluate the impact of system-level factors on DRAM behaviour and the efficacy of conventional test patterns in typical conditions without controlling the DRAM temperature. Our results indicate that common test patterns are ineffective in identifying error-prone locations at low DRAM temperatures. Furthermore, the analysis of various measured performance counters and manifested error rates reveals a strong correlation between system utilization and DRAM reliability. By exploiting these findings, we developed a benchmark that can indirectly raise the DRAM temperature and can thus be used for characterization in the field without any complicated thermal equipment. Results show that the stress benchmark can increase the DRAM temperature above 43 °C and cover up to 60% of erroneous memory locations in the 144 tested DRAM chips. Finally, our study shows for the first time that the refresh period can be relaxed by 35 times on such a commodity system with all errors being corrected by the available error-correcting codes, resulting in 11.5% power savings on average.
|Title of host publication||2018 IEEE 24th International Symposium on On-Line Testing and Robust System Design, IOLTS 2018|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Pages||236 - 239|
|Publication status||Published - 01 Oct 2018|
|Event||International Symposium on On-Line Testing and Robust System Design, IOLTS - Costa Brava, Spain|
Duration: 02 Jul 2018 → 04 Jul 2018
|Title||DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server|
R6529CSC: A Universal Micro-Server Ecosystem by Exceeding the Energy and Performance Scaling Boundaries
Karakonstantis, G., Nikolopoulos, D., O'Neill, M. & Vandierendonck, H.
17/12/2015 → …