With growing Field Programmable Gate Array (FPGA) device sizes and their integration in environments enabling sharing of computing resources such as cloud and edge computing, there is a requirement to share the FPGA area between multiple tasks. The resource sharing typically involves partitioning the FPGA space into fix-sized slots. This results in suboptimal resource utilisation and relatively poor performance, particularly as the number of tasks increase. Using OpenCL’s exploration capabilities, we employ clever clustering and custom, task-specific partitioning and mapping to create a novel, area sharing methodology where task resource requirements are more effectively managed. Using models with varying resource/throughput profiles, we select the most appropriate distribution based on the runtime, workload needs to enhance temporal compute density. The approach is enabled in the system stack by a corresponding task-based virtualisation model. Using 11 high performance tasks from graph analysis, linear algebra and media streaming, we demonstrate an average 2.8× higher system throughput at 2.3× better energy efficiency over existing approaches.
|Journal||IEEE Transactions on Parallel and Distributed Systems|
|Early online date||30 Jul 2021|
|Publication status||Early online date - 30 Jul 2021|