This paper presents a novel map-reduce runtime system that is designed for scalability and for composition with other parallel software. We use a modified programming interface that expresses reduction operations over data containers as opposed to key-value pairs. This design choice admits higher efficiency, as the programmer can select appropriate data structures. Our runtime targets shared-memory systems, which are increasingly capable of performing data analytics on terabyte-sized data sets stored in memory. Our map-reduce runtime is built over the Cilk programming language and outperforms Phoenix++ by 1.5x–4x on 5 out of 7 map-reduce benchmarks on 48 threads. These results arise from a combination of factors: (i) the reduction of framework overheads, including the elimination of repeated (de-)serialization of key-value pairs; and (ii) the use of more appropriate intermediate data structures, which reductions over containers support.
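To make the contrast with key-value pairs concrete, here is a minimal C++ sketch of a word count written as a reduction over containers. It is an illustration under stated assumptions, not the paper's interface: `map_reduce`, `word_count`, and `Counts` are hypothetical names, and the chunks are processed sequentially here, whereas a parallel runtime could compute the independent partial containers concurrently before merging them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not the paper's API): split the input into chunks,
// fold each chunk into its own partial container, then merge the partials.
// The partial containers are independent of each other, so a parallel
// runtime could build them concurrently; this sketch does it sequentially.
template <typename Container, typename Item, typename MapFn, typename MergeFn>
Container map_reduce(const std::vector<Item>& input, std::size_t num_chunks,
                     MapFn map_item, MergeFn merge) {
    assert(num_chunks > 0);
    // Ceiling division so that every item belongs to exactly one chunk.
    const std::size_t chunk = (input.size() + num_chunks - 1) / num_chunks;
    Container total{};
    for (std::size_t begin = 0; begin < input.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, input.size());
        Container partial{};
        for (std::size_t i = begin; i < end; ++i) {
            map_item(partial, input[i]);  // update the container in place
        }
        merge(total, partial);            // container-level reduction
    }
    return total;
}

// The intermediate result is a whole hash map chosen by the programmer,
// not a stream of (key, value) pairs produced by the framework.
using Counts = std::unordered_map<std::string, std::size_t>;

Counts word_count(const std::vector<std::string>& words,
                  std::size_t num_chunks) {
    return map_reduce<Counts>(
        words, num_chunks,
        // Map step: count one word directly in the partial container.
        [](Counts& acc, const std::string& word) { ++acc[word]; },
        // Reduce step: add the counts of one container into another.
        [](Counts& into, const Counts& from) {
            for (const auto& entry : from) {
                into[entry.first] += entry.second;
            }
        });
}
```

Because the merge step is associative and commutative for counts, the result does not depend on how the input is chunked, which is what allows the partial containers to be combined in any order. Swapping `Counts` for a different container (for example a dense array when keys are small integers) changes only the type and the two lambdas.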
|Title of host publication||Proceedings of 2016 IEEE International Conference on Big Data (Big Data)|
|Number of pages||10|
|Publication status||Published - 06 Feb 2017|
|Event||3rd Workshop on Advances in Software and Hardware for Big Data to Knowledge Discovery (ASH) - Washington, United States, 05 Dec 2016 → 08 Dec 2016|