Abstract
As processor core counts increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This paper improves the scalability of parallel loop schedulers by specialising them for fine-grain loops.
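A common way to keep per-loop scheduling cost low for fine-grain loops is static block partitioning. Each thread derives its own slice of the iteration space from its thread id, the trip count and the thread count, so it never touches a shared counter. The sketch below shows only this partitioning arithmetic. It is an illustration and not the paper's work distribution mechanism. The function name `static_range` is hypothetical.

```python
def static_range(n, nthreads, tid):
    """Return the half-open range [begin, end) of iterations owned by thread `tid`.

    Iterations 0..n-1 are split into `nthreads` contiguous blocks whose sizes
    differ by at most one. Each thread computes its block from (n, nthreads, tid)
    alone, so no shared counter and no atomic operation is involved.
    """
    if nthreads <= 0 or not 0 <= tid < nthreads:
        raise ValueError("need nthreads > 0 and 0 <= tid < nthreads")
    base, extra = divmod(n, nthreads)
    # The first `extra` threads each take one additional iteration.
    begin = tid * base + min(tid, extra)
    end = begin + base + (1 if tid < extra else 0)
    return begin, end


if __name__ == "__main__":
    # 10 iterations over 4 threads gives block sizes 3, 3, 2, 2.
    for tid in range(4):
        print(tid, static_range(10, 4, tid))
```

Compare this with dynamic (self-scheduling) loops. There, threads claim chunks by atomically incrementing a shared index. When each iteration does very little work, that per-chunk synchronisation can become a noticeable share of the loop's run time.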
We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilk Plus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine, with scheduling overhead 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. The gains become more pronounced as loops become finer-grain and thread counts increase. With 48 threads we consistently observe speedups of 16–30%, peaking at 2.8x.
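Reductions interact with static scheduling in a simple way. Each thread folds its own block into a private partial result, and the partials are combined once after the loop. The sketch below shows this general pattern. It is not the paper's compiler support for Cilk reducers, and `parallel_reduce` is a hypothetical name. Combining partials in thread-id order over contiguous blocks preserves serial order, so any associative operator works, even a non-commutative one. `identity` must be the operator's identity element because threads with empty blocks contribute it unchanged. In CPython the global interpreter lock prevents real speedup here, so the threads only illustrate the structure.

```python
import threading


def static_range(n, nthreads, tid):
    # Contiguous block of [0, n) for thread `tid`; block sizes differ by at most one.
    base, extra = divmod(n, nthreads)
    begin = tid * base + min(tid, extra)
    return begin, begin + base + (1 if tid < extra else 0)


def parallel_reduce(values, nthreads, combine, identity):
    """Reduce `values` with `nthreads` threads using per-thread partial results.

    Each thread accumulates its static block into its own slot of `partials`,
    so no shared accumulator is updated inside the loop. Partials are combined
    in thread-id order after all threads have joined.
    """
    partials = [identity] * nthreads

    def worker(tid):
        begin, end = static_range(len(values), nthreads, tid)
        acc = identity
        for i in range(begin, end):
            acc = combine(acc, values[i])
        partials[tid] = acc  # only thread `tid` writes this slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Final combine in thread-id order keeps the serial left-to-right order.
    result = identity
    for p in partials:
        result = combine(result, p)
    return result


if __name__ == "__main__":
    data = list(range(1, 101))
    print(parallel_reduce(data, 4, lambda a, b: a + b, 0))  # 5050
```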
Original language | English |
---|---|
Number of pages | 29 |
Journal | Concurrency and Computation: Practice and Experience |
Early online date | 05 Apr 2021 |
DOIs | |
Publication status | Early online date - 05 Apr 2021 |
Related projects
Research projects connected to 'Reducing the Burden of Parallel Loop Schedulers for Many-Core Processors':
-
R6394CSC: Software management of hybrid DRAM/NVRAM memory systems
Nikolopoulos, D. (PI) & Vandierendonck, H. (CoI)
01/08/2012 → …
Project: Research
-
R1785CSC: Modelling Stencil Codes After Graph Analytics
Vandierendonck, H. (PI)
28/06/2017 → 30/09/2017
Project: Research
-
R6551CSC: Open TransPREcision COMPuting
Woods, R. (PI), Karakonstantis, G. (CoI) & Vandierendonck, H. (CoI)
03/11/2016 → 31/12/2020
Project: Research