Abstract
Processor architectures have taken a turn towards many-core processors, which integrate multiple processing cores on a single chip to increase overall performance, and there are no signs that this trend will stop in the near future. Many-core processors are harder to program than multi-core and single-core processors because of the need to write parallel or concurrent programs with high degrees of parallelism. Moreover, many-core processors have to operate in a strong-scaling regime because of memory bandwidth constraints: under strong scaling, increasingly fine-grained parallelism must be extracted in order to keep all processing cores busy.
Task dataflow programming models have a high potential to simplify parallel programming because they relieve the programmer of the need to identify precisely all inter-task dependences when writing programs. Instead, the task dataflow runtime system detects and enforces inter-task dependences during execution based on a description of the memory that each task accesses. The runtime constructs a task dataflow graph that captures all tasks and their dependences. Tasks are scheduled to execute in parallel while respecting the dependences specified in the task graph.
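The sketch below illustrates the general idea behind this kind of runtime dependence detection. It is not the Swan API or the implementation evaluated in the paper; the `TaskGraph`, `Access` and `spawn` names are hypothetical, and the scheme shown (tracking the last writer and the readers of each memory object to derive read-after-write, write-after-read and write-after-write dependences from in/out annotations) is only a minimal, plausible model of the mechanism described above.

```cpp
// Illustrative sketch only: a minimal model of how a task dataflow runtime
// could derive inter-task dependences from in/out annotations. All names
// are hypothetical; this is not the Swan implementation.
#include <cstddef>
#include <cstdio>
#include <initializer_list>
#include <unordered_map>
#include <utility>
#include <vector>

enum class Access { In, Out, InOut };

struct Task {
    const char* name;
    std::vector<int> predecessors;   // indices of tasks this task depends on
};

class TaskGraph {
    std::vector<Task> tasks_;
    // Per memory object: the last writer and the readers since that write.
    // New tasks derive RAW/WAR/WAW dependences from this state.
    struct ObjState { int last_writer = -1; std::vector<int> readers; };
    std::unordered_map<const void*, ObjState> state_;

public:
    int spawn(const char* name,
              std::initializer_list<std::pair<const void*, Access>> args) {
        int id = static_cast<int>(tasks_.size());
        tasks_.push_back({name, {}});
        for (auto [obj, acc] : args) {
            ObjState& s = state_[obj];
            if (acc == Access::In) {
                if (s.last_writer >= 0)                      // RAW dependence
                    tasks_[id].predecessors.push_back(s.last_writer);
                s.readers.push_back(id);
            } else {                                         // Out or InOut
                if (s.last_writer >= 0)                      // WAW dependence
                    tasks_[id].predecessors.push_back(s.last_writer);
                for (int r : s.readers)                      // WAR dependences
                    tasks_[id].predecessors.push_back(r);
                s.last_writer = id;
                s.readers.clear();
            }
        }
        return id;
    }

    void dump() const {
        for (std::size_t i = 0; i < tasks_.size(); ++i) {
            std::printf("task %zu (%s) depends on:", i, tasks_[i].name);
            for (int p : tasks_[i].predecessors) std::printf(" %d", p);
            std::printf("\n");
        }
    }
};

int main() {
    int a = 0, b = 0;
    TaskGraph g;
    g.spawn("produce_a", {{&a, Access::Out}});
    g.spawn("read_a",    {{&a, Access::In}});
    g.spawn("update_ab", {{&a, Access::InOut}, {&b, Access::Out}});
    g.dump();   // read_a depends on produce_a; update_ab depends on both
}
```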
Several papers report significant overheads for task dataflow systems, which severely limit the scalability and usability of such systems. In this paper we study efficient schemes to manage task graphs and analyze their scalability. We assume a programming model that supports input, output and in/out annotations on task arguments, as well as commutative in/out annotations and reductions. We analyze the structure of task graphs and identify versions and generations as key concepts for the efficient management of task graphs. We then present three schemes to manage task graphs, building on graph representations, hypergraphs and lists. We also consider a fourth, edge-less scheme that synchronizes tasks using integers. Analysis using micro-benchmarks shows that the graph representation is not always scalable and that the edge-less scheme introduces the least overhead in nearly all situations.
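As a rough illustration of what an edge-less, integer-based synchronization scheme could look like, the sketch below serializes conflicting accesses to an object with two counters (tickets issued and tickets retired) instead of storing explicit dependence edges. This is a hypothetical, simplified model, not the algorithm evaluated in the paper; the `Versioned` type and its methods are invented for illustration.

```cpp
// Illustrative sketch only: synchronizing tasks with plain integers
// (tickets) instead of explicit task graph edges. Hypothetical names.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Versioned {
    std::atomic<unsigned> issued{0};   // tickets handed out to writers so far
    std::atomic<unsigned> retired{0};  // tickets of writers that completed

    unsigned acquire_write() { return issued.fetch_add(1); }  // my ticket
    void wait_until(unsigned ticket) {             // all earlier writers done?
        while (retired.load(std::memory_order_acquire) < ticket)
            std::this_thread::yield();
    }
    void release_write() { retired.fetch_add(1, std::memory_order_release); }
};

int main() {
    Versioned obj;
    int value = 0;

    // Three tasks that each read-modify-write 'value'; the tickets serialize
    // them in registration order without storing any dependence edges.
    std::vector<std::thread> pool;
    for (int t = 0; t < 3; ++t) {
        unsigned ticket = obj.acquire_write();  // registered in program order
        pool.emplace_back([&, t, ticket] {
            obj.wait_until(ticket);             // wait for earlier accesses
            value += t + 1;                     // the task body
            std::printf("task %d ran, value=%d\n", t, value);
            obj.release_write();                // let the next ticket proceed
        });
    }
    for (auto& th : pool) th.join();
    std::printf("final value=%d\n", value);     // always 6: 1+2+3
}
```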
| Original language | English |
| --- | --- |
| Article number | 61 |
| Pages (from-to) | 1-24 |
| Number of pages | 24 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| Volume | 10 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - Dec 2013 |
| Event | Conference on High-Performance Embedded Architecture and Compilation, Vienna, Austria. Duration: 20 Jan 2014 → 22 Jan 2014 |
Keywords
- parallel programming
- task dataflow
- runtime dependence tracking
- Swan
Fingerprint
Dive into the research topics of 'Analysis of Dependence Tracking Algorithms for Task Dataflow Execution'. Together they form a unique fingerprint.
Projects per year
-
R6394CSC: Software management of hybrid DRAM/NVRAM memory systems
Nikolopoulos, D. (PI) & Vandierendonck, H. (CoI)
01/08/2012 → …
Project: Research
-
R6396CSC: SCORPIO: Significance-Based Computing for Reliability and Power Optimization
Nikolopoulos, D. (PI) & Karakonstantis, G. (CoI)
01/08/2012 → 31/05/2016
Project: Research
Activities
-
Invited Talk and membership in panel at the "European Initiative on Runtime Systems and Architecture Co-Design"
Vandierendonck, H. (Speaker)
07 May 2015
Activity: Talk or presentation types › Invited talk
-
High-Performance Embedded Architecture and Compilation (HiPEAC'14, EU NoE conference)
Vandierendonck, H. (Speaker)
20 Jan 2014 → 22 Jan 2014
Activity: Participating in or organising an event types › Participation in conference