TY - GEN
T1 - New speed records for Salsa20 stream cipher using an autotuning framework on GPUs
AU - Khalid, Ayesha
AU - Paul, Goutam
AU - Chattopadhyay, Anupam
PY - 2013/10/7
Y1 - 2013/10/7
N2 - Since the introduction of the CUDA programming model, GPUs have been considered a viable platform for accelerating non-graphical applications. Many cryptographic algorithms have been reported to achieve remarkable performance speedups, especially block ciphers. For stream ciphers, however, the lack of reported GPU acceleration endeavors is due to their inherent iterative structures that prohibit parallelization. In this paper, we propose an efficient implementation methodology for data-parallel cryptographic functions in a batch processing fashion on modern GPUs in general and optimizations for Salsa20 in particular. We present an autotuning framework to reach the most optimized set of device and application parameters for Salsa20 kernel variants with throughput maximization as a figure of merit. The peak performance achieved by our implementation for Salsa20/12 is 2.7 GBps and 43.44 GBps with and without memory transfers, respectively, on NVIDIA GeForce GTX 590. These figures beat the fastest reported GPU implementation of any stream cipher in the eSTREAM portfolio, including Salsa20/12, as well as the block cipher AES optimized by hand-tuning, and thus, to the best of our knowledge, set a new speed record.
KW - CUDA
KW - eSTREAM
KW - GPU
KW - Salsa20
KW - Salsa20/r
KW - stream cipher
UR - http://www.scopus.com/inward/record.url?scp=84884833852&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38553-7_11
DO - 10.1007/978-3-642-38553-7_11
M3 - Conference contribution
AN - SCOPUS:84884833852
SN - 9783642385520
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 189
EP - 207
BT - Progress in Cryptology, AFRICACRYPT 2013 - 6th International Conference on Cryptology in Africa, Proceedings
T2 - 6th International Conference on the Theory and Application of Cryptographic Techniques in Africa, AFRICACRYPT 2013
Y2 - 22 June 2013 through 24 June 2013
ER -