On the support of inter-node P2P GPU memory copies in rCUDA

Carlos Reaño, Federico Silla

Research output: Contribution to journal › Article › peer-review



Although GPUs are being widely adopted to noticeably reduce the execution time of many applications, their use has several side effects, such as a higher acquisition cost for cluster nodes or higher overall energy consumption. To address these concerns, GPU virtualization frameworks can be used. These frameworks allow accelerated applications to transparently use GPUs located in cluster nodes other than the one executing the program. Furthermore, these frameworks aim to offer the same API as the NVIDIA CUDA Runtime API, although different frameworks provide different degrees of support. In general, and because of the complexity of implementing an efficient mechanism, none of the existing frameworks provides support for memory copies between remote GPUs located in different nodes. In this paper we introduce an efficient mechanism for supporting this kind of memory copy among GPUs located in different cluster nodes. Several options are explored and analyzed, such as the use of the GPUDirect RDMA mechanism. We focus our discussion on the rCUDA remote GPU virtualization framework. Results show that it is possible to implement this kind of memory copy so efficiently that performance even improves on that attained by CUDA when GPUs located in the same cluster node are used.
Original language: English
Pages (from-to): 28-43
Journal: Journal of Parallel and Distributed Computing
Early online date: 18 Jan 2019
Publication status: Published - 01 May 2019


  • CUDA
  • GPUDirect RDMA
  • Virtualization


