Realizing Accelerated Cost-Effective Distributed RAID

Aleksandr Khasymski, M. Mustafa Rafique, Ali R. Butt, Sudharshan S. Vazhkudai, Dimitrios S. Nikolopoulos

Research output: Chapter in Book/Report/Conference proceeding > Chapter (peer-reviewed) > peer-review

1 Citation (Scopus)

Abstract

The exponential growth in user and application data calls for new means of providing fault tolerance and protection against data loss. High Performance Computing (HPC) storage systems, which are at the forefront of handling the data deluge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design a flexible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs, on both the client and server sides, to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length, and location of a RAID array on a per-file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely used Lustre PFS, and show that our approach is feasible and imposes acceptable overhead.
Original language: English
Title of host publication: Handbook on Data Centers
Editors: Samee Ullah Khan, Albert Y. Zomaya
Publisher: Springer
Pages: 729-753
ISBN (Electronic): 978-1-4939-2092-1
ISBN (Print): 978-1-4939-2091-4
Publication status: Published - 2015
