TY - GEN
T1 - Microarchitecture design and benchmarking of custom SHA-3 instruction for RISC-V
AU - Bolat, Alperen
AU - Sezer, Sakir
AU - McLaughlin, Kieran
AU - Hui, Henry
PY - 2025/5/23
Y1 - 2025/5/23
N2 - Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex, multistage operations. Hardware-assisted cryptographic instructions, such as Intel’s AES-NI and ARM’s custom instructions for encryption workloads, have demonstrated substantial performance improvements. However, efficient SHA-3 acceleration remains an open problem because of its distinct permutation-based structure and memory access patterns. Existing solutions rely primarily on standalone coprocessors or software optimizations, often avoiding the complexities of direct microarchitectural integration. This study investigates the architectural challenges of embedding the SHA-3 permutation operation as a custom instruction within a general-purpose processor, focusing on pipelined simultaneous execution, storage utilization, and hardware cost. We investigate and prototype a SHA-3 custom instruction for the RISC-V CPU architecture. Using cycle-accurate gem5 simulations and FPGA prototyping, we demonstrate performance improvements of up to 8.02× for RISC-V-optimized SHA-3 software workloads and up to 46.31× for Keccak-specific software workloads, at the cost of only a 15.09% increase in registers and an 11.51% increase in LUT utilization. These findings provide insight into the feasibility and impact of SHA-3 acceleration at the microarchitectural level and highlight practical design considerations for future cryptographic instruction set extensions.
AB - Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex, multistage operations. Hardware-assisted cryptographic instructions, such as Intel’s AES-NI and ARM’s custom instructions for encryption workloads, have demonstrated substantial performance improvements. However, efficient SHA-3 acceleration remains an open problem because of its distinct permutation-based structure and memory access patterns. Existing solutions rely primarily on standalone coprocessors or software optimizations, often avoiding the complexities of direct microarchitectural integration. This study investigates the architectural challenges of embedding the SHA-3 permutation operation as a custom instruction within a general-purpose processor, focusing on pipelined simultaneous execution, storage utilization, and hardware cost. We investigate and prototype a SHA-3 custom instruction for the RISC-V CPU architecture. Using cycle-accurate gem5 simulations and FPGA prototyping, we demonstrate performance improvements of up to 8.02× for RISC-V-optimized SHA-3 software workloads and up to 46.31× for Keccak-specific software workloads, at the cost of only a 15.09% increase in registers and an 11.51% increase in LUT utilization. These findings provide insight into the feasibility and impact of SHA-3 acceleration at the microarchitectural level and highlight practical design considerations for future cryptographic instruction set extensions.
M3 - Conference contribution
T3 - IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Proceedings
BT - 2025 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Computer Society Annual Symposium on VLSI: ISVLSI 2025
Y2 - 6 July 2025 through 9 July 2025
ER -