Transactions on Cryptographic Hardware and Embedded Systems 2025
SimdMSM:
SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs
# SimdMSM
This repository contains an efficient implementation of MSM and zkSNARKs using AVX-512IFMA. It is the artifact of the paper "SimdMSM: SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs", accepted to TCHES 2025.
## Overview
There are three subfolders included in this repository:
- `AVX-MSM`: the MSM implementation instantiated with the AVX512-IFMA engine and built on the RELIC library. The implementation code is in the `AVX-MSM/demo/381/` directory. The AVX512-IFMA engine is based on Cheng et al.'s work.
- `AVX-ZK`: integrates the AVX-MSM implementation into the libsnark library. The `r1cs_gg_ppzksnark` part, commonly known as the Groth16 protocol, is changed to use the new AVX-MSM.
- `jsnark`: a tool for evaluating the performance of AVX-ZK under different real-world workloads.
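
For readers new to the term, a multi-scalar multiplication computes the sum of `k_i * P_i` over many scalar/point pairs, and the window size tuned later in this README is the digit width of the Pippenger bucket method. The following is a minimal textbook sketch of that method, not the repository's AVX-512 code: it assumes a toy group (integers modulo a prime under addition) in place of BLS12-381 curve points, and the names `msm_naive`, `msm_pippenger`, and `window_bits` are illustrative only.

```python
import random

P = 2**61 - 1  # toy group: integers modulo a prime under addition (stand-in for curve points)


def add(a, b):
    """Group operation of the toy group."""
    return (a + b) % P


def msm_naive(scalars, points):
    """Reference result: sum of k_i * P_i computed directly."""
    acc = 0
    for k, pt in zip(scalars, points):
        acc = add(acc, (k * pt) % P)
    return acc


def msm_pippenger(scalars, points, window_bits, scalar_bits):
    """Bucket method: process the scalars window_bits bits at a time."""
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    result = 0
    for w in reversed(range(num_windows)):
        # Shift the accumulated result by one window (window_bits doublings).
        for _ in range(window_bits):
            result = add(result, result)
        # buckets[d] collects every point whose digit in this window equals d.
        buckets = [0] * (mask + 1)
        for k, pt in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], pt)
        # Compute sum of d * buckets[d] with running sums, using additions only.
        running = 0
        window_sum = 0
        for d in range(mask, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    ks = [rng.randrange(1 << 32) for _ in range(1000)]
    pts = [rng.randrange(P) for _ in range(1000)]
    print(msm_pippenger(ks, pts, 8, 32) == msm_naive(ks, pts))
```

A larger window means fewer windows but more buckets per window, which is why the best window size depends on the number of points and is worth benchmarking.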
## Requirement
### For AVX-MSM
- Ubuntu 22.04.4
- gcc version 11.4.0
- cmake 3.22.1
- a CPU that supports the AVX512-IFMA instruction set
- pandas (for the Python script)
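
A quick way to check the AVX512-IFMA requirement on Linux is to look for the `avx512ifma` flag in `/proc/cpuinfo`. The sketch below is one possible check, not part of the repository; the helper names `cpu_flags` and `supports_ifma` are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def supports_ifma(cpuinfo_text):
    """True if the flags line lists avx512ifma (the Linux name for AVX512-IFMA)."""
    return "avx512ifma" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX512-IFMA supported:", supports_ifma(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo")
```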
### For AVX-ZK
```shell
$ sudo apt-get install build-essential cmake git libgmp3-dev libprocps3-dev python-markdown libboost-all-dev libssl-dev
```
### For jsnark
You can consult instructions in [jsnark](https://github.com/akosba/jsnark).
- JDK 8 (Higher versions are also expected to work)
- Junit 4
- BouncyCastle library
## Build instructions
## AVX-MSM
### Building
Build the `SimdMSM` library.
```shell
$ cd AVX-MSM
$ cd demo/381/
$ make lib
```
If you encounter the error `../../../preset/x64-pbc-bls12-381.sh: not found`, try the following two commands:
```shell
$ chmod +x ../../preset/x64-pbc-bls12-381.sh
$ sed -i 's/\r$//' ../../preset/x64-pbc-bls12-381.sh
```
### Using
Run AVX-MSM. The benchmark's data size `WNUM` and window size `WMBITS` can be modified in the file `/test/test_pip_ifma.c`.
```shell
$ mkdir build
$ make ifma
$ ./build/test_pip_ifma
```
Run AVX-pair-MSM. The benchmark's data size `WNUM` and window size `WMBITS` can be modified in the file `/test/test_pair_ifma.c`.
```shell
$ make pair_ifma
$ ./build/test_pair_ifma
```
Run AVX-MSM (multi-threaded). The benchmark's data size `WNUM` and window size `WMBITS` can be modified in the file `/test/test_pip_threads.c`.
```shell
$ make thread
$ ./build/test_pip_threads
```
Run AVX-pair-MSM (multi-threaded). The benchmark's data size `WNUM` and window size `WMBITS` can be modified in the file `/test/test_pair_threads.c`.
```shell
$ make pair_thread
$ ./build/test_pair_threads
```
You can also use the Python script to run the benchmarks in batch.
```shell
$ mkdir build
$ python bench.py
```
### Output example
The outputs of AVX-MSM, AVX-pair-MSM, AVX-MSM (multi-threaded), and AVX-pair-MSM (multi-threaded) have a similar structure, so AVX-MSM is used here as the example.
The three macros `WNUM`, `WMBITS`, and `NBENCHS` in the test file represent the multi-scalar multiplication scale, window size, and number of benchmark iterations, respectively.
```
Pippenger_old=0.790256 // the execution time of the original Pippenger (in seconds)
Pippenger_ifma=0.325606 // the execution time of our AVX-MSM (in seconds)
YES // the computation result is correct
```
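
The two timings can be turned into a speedup figure with a few lines of Python. This is a small sketch that assumes the benchmark prints `name=value` lines exactly as in the example above; `parse_msm_output` is an illustrative helper, not something shipped with the repository.

```python
def parse_msm_output(text):
    """Collect name=value timing lines, ignoring trailing // comments."""
    times = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()
        if "=" in line:
            name, value = line.split("=", 1)
            times[name.strip()] = float(value)
    return times


sample = """Pippenger_old=0.790256
Pippenger_ifma=0.325606
YES"""

times = parse_msm_output(sample)
speedup = times["Pippenger_old"] / times["Pippenger_ifma"]
print(f"speedup: {speedup:.2f}x")
```

For the sample values this gives a speedup of about 2.43x.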
The output of `bench.py` is as follows: the first column represents the multi-scalar multiplication scale, followed by the window size, the execution time of the original Pippenger, the execution time of our AVX-MSM, and the speedup between the two.
```
[15, 6, 0.057, 0.019, 3.0]
[15, 7, 0.059, 0.019, 3.1052631578947367]
[15, 8, 0.058, 0.018, 3.2222222222222228]
[15, 9, 0.034, 0.014, 2.428571428571429]
[15, 10, 0.037, 0.017, 2.176470588235294]
[15, 11, 0.038, 0.02, 1.9]
[15, 12, 0.056, 0.027, 2.074074074074074]
[15, 13, 0.05, 0.034, 1.4705882352941175]
Best: [15, 9, 0.034, 0.014, 2.428571428571429] // this is the best window size
```
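
Note that the `Best` row is not the one with the highest speedup (window size 8) but the one with the lowest AVX-MSM time (window size 9). The sketch below reproduces that selection from the sample rows. The actual logic of `bench.py` is not shown here, so choosing by minimum AVX-MSM time is an assumption inferred from the sample output.

```python
# (scale, window size, original Pippenger time, AVX-MSM time), copied from the sample output
rows = [
    (15, 6, 0.057, 0.019),
    (15, 7, 0.059, 0.019),
    (15, 8, 0.058, 0.018),
    (15, 9, 0.034, 0.014),
    (15, 10, 0.037, 0.017),
    (15, 11, 0.038, 0.020),
    (15, 12, 0.056, 0.027),
    (15, 13, 0.050, 0.034),
]


def with_speedup(row):
    """Append the speedup (original time divided by AVX-MSM time) to a row."""
    scale, window, t_old, t_new = row
    return [scale, window, t_old, t_new, t_old / t_new]


table = [with_speedup(r) for r in rows]

# Assumption: the best window is the one with the smallest AVX-MSM time.
best = min(table, key=lambda r: r[3])
print("Best window size:", best[1])
```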
## AVX-ZK
### Building
Generate the static library `libmsm.a`.
```shell
$ cd AVX-MSM/demo/381
$ make msm
```
Run CMake to create the Makefile:
```shell
$ cd AVX-ZK
$ mkdir build && cd build && cmake ..
```
Copy `libmsm.a` and `librelic_s.a` to `AVX-ZK/build/depends/libff/libff`.
```shell
$ cp ../../AVX-MSM/demo/381/build/libmsm.a ../../AVX-MSM/demo/381/target/lib/librelic_s.a ./depends/libff/libff
```
Then, to compile the library, run this within the `build` directory:
```shell
$ make
```
### Using
Run the profiling of AVX-ZK.
```shell
$ make profile_r1cs_gg_ppzksnark
$ ./libsnark/profile_r1cs_gg_ppzksnark 65536 8192 bytes
```
You can also use the Python script to run the benchmarks in batch.
```shell
$ cd AVX-ZK
$ python bench.py
```
### Output example
The output format of AVX-ZK follows the format of the `libsnark` library. Below is an example of the output from the Python script:
```
// 15 means size of 2^15; True means result is correct
// 1.2709s is the execution time of our AVX-ZK
[15, True, '[1.2709s x0.97]\t(19.6462s x1.00 from start)']
```
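
To post-process such rows, the execution time can be pulled out of the timing string with a regular expression. The sketch below assumes every row has the three-field shape shown in the example; `parse_row` and `TIME_RE` are illustrative names, not part of `bench.py`.

```python
import re

# Matches the leading "[<seconds>s x<ratio>]" part of a libsnark-style timing string.
TIME_RE = re.compile(r"\[([0-9.]+)s x([0-9.]+)\]")


def parse_row(row):
    """Turn [log2 size, correctness flag, timing string] into a small dict."""
    size_log2, ok, timing = row
    match = TIME_RE.search(timing)
    if match is None:
        raise ValueError(f"unexpected timing format: {timing!r}")
    return {
        "size": 2 ** size_log2,
        "correct": ok,
        "seconds": float(match.group(1)),
    }


row = [15, True, '[1.2709s x0.97]\t(19.6462s x1.00 from start)']
parsed = parse_row(row)
print(parsed)
```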
### Switching between single and multi-core
In the file `SimdMSM/AVX-ZK/libsnark/zk_proof_systems/ppzksnark/r1cs_gg_ppzksnark/r1cs_gg_ppzksnark.tcc`, the proof generation function is `r1cs_gg_ppzksnark_prover`. Functions containing `multi_exp` are responsible for multi-scalar multiplication. You can modify their template parameters to enable or disable multi-threading.
```c++
// single-core
multi_exp_method_pip_avx
multi_exp_method_pair_avx
// multi-core
multi_exp_method_pip_avx_threads
multi_exp_method_pair_avx_threads
```
Specifically, in the proof generation function, replace `multi_exp_method_pip_avx` with `multi_exp_method_pip_avx_threads` in the computation of `evaluation_At`, `evaluation_Ht`, and `evaluation_Lt`. For the computation of `evaluation_Bt`, replace `multi_exp_method_pair_avx` with `multi_exp_method_pair_avx_threads`. After modifying the code, repeat the Building and Using steps of the AVX-ZK part.
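
If you prefer to script this edit, the substitution can be expressed as a single regular expression. The sketch below is an illustration only: the input string is a made-up fragment (the identifiers `f`, `g`, `x`, and `y` are hypothetical), and it rewrites every single-core occurrence it finds, so review the resulting diff and keep a backup before applying it to the real `.tcc` file.

```python
import re

# The trailing \b keeps names that already end in "_threads" from matching again.
SINGLE_CORE = re.compile(r"\bmulti_exp_method_(pip|pair)_avx\b")


def enable_threads(source):
    """Rewrite single-core method names to their multi-threaded variants."""
    return SINGLE_CORE.sub(r"multi_exp_method_\1_avx_threads", source)


# Hypothetical fragment, used only to demonstrate the substitution.
before = "f<multi_exp_method_pip_avx>(x); g<multi_exp_method_pair_avx>(y);"
after = enable_threads(before)
print(after)
```

Because already-converted names do not match, running the function twice leaves the text unchanged.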
## Running and Testing AVX-ZK by JsnarkCircuitBuilder
### Building
Return to the main directory `SimdMSM/`. The first part is similar to AVX-ZK.
```shell
$ cd jsnark/libsnark
$ mkdir build && cd build && cmake ..
```
Copy `libmsm.a` and `librelic_s.a` to `libsnark/build/depends/libff/libff`, then build.
```shell
$ cp ../../../AVX-MSM/demo/381/build/libmsm.a ../../../AVX-MSM/demo/381/target/lib/librelic_s.a ./depends/libff/libff
$ make
```
To compile the JsnarkCircuitBuilder project via command line, from the `SimdMSM/jsnark` directory:
```shell
$ cd jsnark
$ cd JsnarkCircuitBuilder
$ mkdir -p bin
$ javac -d bin -cp /usr/share/java/junit4.jar:bcprov-jdk15on-159.jar $(find ./src/* | grep ".java$")
```
### Using
Run AES.
```shell
$ java -cp bin examples.generators.blockciphers.AES128CipherCircuitGenerator
```
Run SHA-256.
```shell
$ java -cp bin examples.generators.hash.SHA2CircuitGenerator
```
Run RSAEnc.
```shell
$ java -cp bin examples.generators.rsa.RSAEncryptionCircuitGenerator
```
Run Merkle-Tree.
```shell
$ java -cp bin examples.generators.hash.MerkleTreeMembershipCircuitGenerator
```
Run RSASigVer.
```shell
$ java -cp bin examples.generators.rsa.RSASigVerCircuitGenerator
```
Run Auction.
```shell
$ java -cp bin examples.generators.augmenter.AugmentedAuctionCircuitGenerator
```