International Association for Cryptologic Research

International Association
for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2024

Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm


Yutian Chen
School of Cyber Science and Engineering, Wuhan University, Wuhan, China

Cong Peng
School of Cyber Science and Engineering, Wuhan University, Wuhan, China

Yu Dai
School of Cyber Science and Engineering, Wuhan University, Wuhan, China

Min Luo
School of Cyber Science and Engineering, Wuhan University, Wuhan, China

Debiao He
School of Cyber Science and Engineering, Wuhan University, Wuhan, China


Keywords: Multi-scalar Multiplication, Zero-knowledge Proof, Parallel Implementation


Abstract

Multi-scalar multiplication (MSM) is an important building block in most of elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation to further improve the performance of MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, in the case that memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates in the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means the parallel speedup ratio is almost linear growth as the number of device threads increases. Finally, we also propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity remains at the logarithmic level of the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of 221, 222, and 223, respectively.

Publication

Transactions of Cryptographic Hardware and Embedded Systems, Volume 2024, Issue 2

Paper

Artifact

Artifact number
tches/2024/a9

Artifact published
May 31, 2024

Badge
IACR CHES Artifacts Functional

README

ZIP (1195228 Bytes)  

View on Github

License


BibTeX How to cite

Chen, Y., Peng, C., Dai, Y., Luo, M., & He, D. (2024). Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2024(2), 522–544. https://doi.org/10.46586/tches.v2024.i2.522-544 Artifact available at https://artifacts.iacr.org/tches/2024/a9