Abstract
The exponential growth in user and application data entails new means for providing fault tolerance and protection against data loss. High Performance Com- puting (HPC) storage systems, which are at the forefront of handling the data del- uge, typically employ hardware RAID at the backend. However, such solutions are costly, do not ensure end-to-end data integrity, and can become a bottleneck during data reconstruction. In this paper, we design an innovative solution to achieve a flex- ible, fault-tolerant, and high-performance RAID-6 solution for a parallel file system (PFS). Our system utilizes low-cost, strategically placed GPUs — both on the client and server sides — to accelerate parity computation. In contrast to hardware-based approaches, we provide full control over the size, length and location of a RAID array on a per file basis, end-to-end data integrity checking, and parallelization of RAID array reconstruction. We have deployed our system in conjunction with the widely-used Lustre PFS, and show that our approach is feasible and imposes ac- ceptable overhead.
Original language | English |
---|---|
Title of host publication | Handbook on Data Centers |
Editors | Samee Ullah Khan, Albert Y. Zomaya |
Publisher | Springer |
Pages | 729-753 |
ISBN (Electronic) | 978-1-4939-2092-1 |
ISBN (Print) | 978-1-4939-2091-4 |
Publication status | Published - 2015 |