TY - JOUR
T1 - Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs
AU - Minhas, Umar Ibrahim
AU - Woods, Roger
AU - Karakonstantis, Georgios
PY - 2021/5/23
Y1 - 2021/5/23
N2 - Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this challenge by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, respectively, but 0.2× lower throughput for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
AB - Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks on shared resources at runtime. This work addresses this challenge by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, respectively, but 0.2× lower throughput for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
U2 - 10.1007/s11265-020-01633-z
DO - 10.1007/s11265-020-01633-z
M3 - Article
SN - 0922-5773
VL - 93
SP - 587
EP - 602
JO - Journal of VLSI Signal Processing
JF - Journal of VLSI Signal Processing
ER -