TY - JOUR
T1 - Efficient, Dynamic Multi-task Execution on FPGA-based Computing Systems
AU - Minhas, Umar Ibrahim
AU - Woods, Roger
AU - Nikolopoulos, Dimitrios S.
AU - Karakonstantis, Georgios
PY - 2022/3/1
Y1 - 2022/3/1
N2 - With growing Field Programmable Gate Array (FPGA) device sizes and their integration in environments enabling sharing of computing resources such as cloud and edge computing, there is a requirement to share the FPGA area between multiple tasks. The resource sharing typically involves partitioning the FPGA space into fix-sized slots. This results in suboptimal resource utilisation and relatively poor performance, particularly as the number of tasks increase. Using OpenCL’s exploration capabilities, we employ clever clustering and custom, task-specific partitioning and mapping to create a novel, area sharing methodology where task resource requirements are more effectively managed. Using models with varying resource/throughput profiles, we select the most appropriate distribution based on the runtime, workload needs to enhance temporal compute density. The approach is enabled in the system stack by a corresponding task-based virtualisation model. Using 11 high performance tasks from graph analysis, linear algebra and media streaming, we demonstrate an average 2.8× higher system throughput at 2.3× better energy efficiency over existing approaches.
AB - With growing Field Programmable Gate Array (FPGA) device sizes and their integration in environments enabling sharing of computing resources such as cloud and edge computing, there is a requirement to share the FPGA area between multiple tasks. The resource sharing typically involves partitioning the FPGA space into fix-sized slots. This results in suboptimal resource utilisation and relatively poor performance, particularly as the number of tasks increase. Using OpenCL’s exploration capabilities, we employ clever clustering and custom, task-specific partitioning and mapping to create a novel, area sharing methodology where task resource requirements are more effectively managed. Using models with varying resource/throughput profiles, we select the most appropriate distribution based on the runtime, workload needs to enhance temporal compute density. The approach is enabled in the system stack by a corresponding task-based virtualisation model. Using 11 high performance tasks from graph analysis, linear algebra and media streaming, we demonstrate an average 2.8× higher system throughput at 2.3× better energy efficiency over existing approaches.
U2 - 10.1109/TPDS.2021.3101153
DO - 10.1109/TPDS.2021.3101153
M3 - Article
SN - 1045-9219
VL - 33
SP - 710
EP - 722
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 3
ER -