Although GPUs are being widely adopted in order to noticeably reduce the execution time of many applications, their use presents several side effects such as an increased acquisition cost of the cluster nodes or an increased overall energy consumption. To address these concerns, GPU virtualization frameworks could be used. These frameworks allow accelerated applications to transparently use GPUs located in cluster nodes other than the one executing the program. Furthermore, these frameworks aim to offer the same API as the NVIDIA CUDA Runtime API does, although different frameworks provide different degree of support. In general, and because of the complexity of implementing an efficient mechanism, none of the existing frameworks provides support for memory copies between remote GPUs located in different nodes. In this paper we introduce an efficient mechanism devised for addressing the support for this kind of memory copies among GPUs located in different cluster nodes. Several options are explored and analyzed, such as the use of the GPUDirect RDMA mechanism. We focus our discussion on the rCUDA remote GPU virtualization framework. Results show that is possible to implement this kind of memory copies in such an efficient way that performance is even improved with respect to the original performance attained by CUDA when GPUs located in the same cluster node are leveraged.
- GPUDirect RDMA