SLURM Support for Remote GPU Virtualization: Implementation and Performance Study

Sergio Iserte, Adrian Castello, Rafael Mayo, Enrique S. Quintana Orti, Federico Silla, Jose Duato, Carlos Reaño, Javier Prades

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs running in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Concretely, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job that is running on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim to provide user-transparent access to all GPUs in the cluster, regardless of where the node running the application is located with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", in order to gain access from any application node to any GPU node in the cluster using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, as SLURM schedules the tasks taking into account all the graphics accelerators available in the complete cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler.

Original language: English
Title of host publication: 26th IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD): Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 318-325
ISBN (Electronic): 978-1-4799-6905-0
Publication status: Published - 04 Dec 2014