Accelerating key-value data structures using AVX-512 SIMD extensions

  • Mohammad Reza Hoseinyfarahabady*
  • Javid Taheri
  • Albert Y. Zomaya*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

Advanced Vector Extensions 512 (AVX-512), a modern SIMD instruction set for x86 architectures, enables data-level parallelism through 512-bit wide ZMM registers capable of processing multiple data elements concurrently within a single instruction cycle. In this study, we present a high-throughput, lock-free, in-memory architecture for key-value data stores that exploits AVX-512 vector operations to accelerate fundamental operations such as insertion and lookup. Our design introduces an optimized memory layout that partitions the key space into two disjoint regions (primary and secondary) and employs three independent hash functions to identify candidate slots. This asymmetric layout improves key distribution, reduces collision probability, and enhances overall lookup efficiency. Experimental evaluation shows that this strategy yields the lowest insertion failure rate among the tested memory partitioning schemes. By leveraging AVX-512 instructions in combination with the most optimized memory layout, our implementation achieves insertion throughput within 6% of Intel TBB's highly optimized multithreaded hash map, despite avoiding explicit synchronization or thread-level parallelism. Under workloads with 550 million entries and a 90% miss rate, our approach delivers a 4.0-5.1x speedup over standard STL, Boost, Robin-Hood, and Abseil hash maps, and up to a 2.5x improvement relative to TBB and Abseil. These gains are consistently observed for both 32-bit and 64-bit floating-point key types. The results confirm the viability of AVX-512-centric designs as a cost-effective alternative to thread-level parallelism, particularly in environments where minimizing synchronization overhead and ensuring deterministic execution are critical. Our findings suggest a paradigm shift in CPU and system architecture, emphasizing wider vector units and improved memory bandwidth utilization as primary levers for scalable high-performance computing.
These findings suggest that future extensions of AVX-512 capabilities, such as non-blocking memory loads, expanded vector registers, and asynchronous prefetching, could enhance the efficiency of data-intensive workloads.

Original language: English
Title of host publication: Proceedings of the 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 12
ISBN (Electronic): 9798331530198
DOIs
Publication status: Published - 07 Oct 2025
Event: 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025 - Edinburgh, United Kingdom
Duration: 03 Sept 2025 - 05 Sept 2025

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Publisher: IEEE
ISSN (Print): 1552-5244
ISSN (Electronic): 2168-9253

Conference

Conference: 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
Country/Territory: United Kingdom
City: Edinburgh
Period: 03/09/2025 - 05/09/2025

Publications and Copyright Policy

This work is licensed under Queen’s Research Publications and Copyright Policy.

Keywords

  • AVX-512 Intrinsics
  • CPU-based Key-Value Data Structures
  • Hash Table Acceleration
  • High Performance Computing
  • Low-Latency Data Access
  • Memory Layout Design
  • Single Instruction, Multiple Data (SIMD) Parallelism
  • Vectorized Hashing

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Hardware and Architecture
