International Association for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems 2025

SimdMSM: SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs


README

SimdMSM

This source code is an efficient implementation of multi-scalar multiplication (MSM) and zkSNARK using AVX-512IFMA. It is the artifact of the paper "SimdMSM: SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs", accepted to TCHES 2025.

Overview

There are three subfolders included in this repository:
- AVX-MSM : the MSM implementation instantiated with the AVX512-IFMA engine and built on the RELIC library. The implementation code is in the AVX-MSM/demo/381/ directory. The AVX512-IFMA engine is based on Cheng et al.'s work.
- AVX-ZK : the integration of the AVX-MSM implementation into the libsnark library. The r1cs_gg_ppzksnark component, commonly known as the Groth16 protocol, is changed to use the new AVX-MSM.
- jsnark : a tool for evaluating the performance of AVX-ZK under different real-world workloads.

Requirement

For AVX-MSM
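
Because the implementation uses AVX-512IFMA instructions, the host CPU has to support them. The following is a minimal sketch of one way to check this on Linux, assuming the kernel reports the feature as the `avx512ifma` flag in `/proc/cpuinfo`. The helper name `has_avx512_ifma` is made up for illustration and is not part of this repository.

```python
def has_avx512_ifma(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx512ifma."""
    for line in cpuinfo_text.splitlines():
        # Flag lines look like: "flags\t\t: fpu vme ... avx512ifma ..."
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "avx512ifma" in flags:
                return True
    return False


# Small self-contained demonstration on a hand-written sample (not real CPU data).
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 avx512f avx512ifma\n"
print(has_avx512_ifma(sample))
```

On a real machine, pass the contents of `/proc/cpuinfo` to the function instead of the sample string.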

For AVX-ZK

```shell
$ sudo apt-get install build-essential cmake git libgmp3-dev libprocps3-dev python-markdown libboost-all-dev libssl-dev
```
### For jsnark

See the instructions in the [jsnark](https://github.com/akosba/jsnark) repository. The dependencies are:
- JDK 8 (Higher versions are also expected to work)
- Junit 4
- BouncyCastle library

## Build instructions

## AVX-MSM

### Building

Build the `SimdMSM` library.

```shell
$ cd AVX-MSM
$ cd demo/381/
$ make lib
```

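If you encounter the error ../../../preset/x64-pbc-bls12-381.sh: not found, the preset script is probably not executable or has Windows (CRLF) line endings. The following two commands make it executable and strip the carriage returns: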

$ chmod +x ../../preset/x64-pbc-bls12-381.sh
$ sed -i 's/\r$//' ../../preset/x64-pbc-bls12-381.sh

Using

Run AVX-MSM. The benchmark's data size WNUM and window size WMBITS can be modified in the file /test/test_pip_ifma.c.

$ mkdir build
$ make ifma
$ ./build/test_pip_ifma

Run AVX-pair-MSM. The benchmark's data size WNUM and window size WMBITS can be modified in the file /test/test_pair_ifma.c.

$ make pair_ifma
$ ./build/test_pair_ifma

Run AVX-MSM (multi-threads). The benchmark's data size WNUM and window size WMBITS can be modified in the file /test/test_pip_threads.c.

$ make thread
$ ./build/test_pip_threads

Run AVX-pair-MSM (multi-threads). The benchmark's data size WNUM and window size WMBITS can be modified in the file /test/test_pair_threads.c.

$ make pair_thread
$ ./build/test_pair_threads

You can also use the Python script to run the benchmarks in batch.

$ mkdir build
$ python bench.py

Output example

The outputs of AVX-MSM, AVX-pair-MSM, AVX-MSM (multi-threads), and AVX-pair-MSM (multi-threads) share the same structure, so AVX-MSM is used as the example here.

The three macros WNUM, WMBITS, and NBENCHS in the test file represent the multi-scalar multiplication scale, window size, and number of benchmark iterations, respectively.

Pippenger_old=0.790256  // the execution time of the original Pippenger (in seconds)
Pippenger_ifma=0.325606 // the execution time of our AVX-MSM (in seconds)
YES  // the computation result is correct
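
To make the role of the window size WMBITS concrete, here is a toy sketch of the bucket (Pippenger) method that the benchmark names refer to. It works on plain integers under addition instead of elliptic-curve points, so it only illustrates the windowing and bucket accumulation. It is not the repository's code, and the function name `pippenger_msm` and its parameters are made up for illustration.

```python
def pippenger_msm(scalars, points, window_bits, scalar_bits):
    """Toy bucket-method MSM: returns sum(k * p) using only additions on `points`."""
    num_windows = -(-scalar_bits // window_bits)  # ceiling division
    mask = (1 << window_bits) - 1
    result = 0  # group identity (0 for integer addition)
    # Process windows from the most significant digit to the least significant.
    for w in reversed(range(num_windows)):
        # Shift the accumulated result by one window: window_bits doublings.
        for _ in range(window_bits):
            result = result + result
        # Bucket j collects every point whose digit in this window equals j.
        buckets = [0] * (mask + 1)
        for k, p in zip(scalars, points):
            digit = (k >> (w * window_bits)) & mask
            if digit:
                buckets[digit] += p
        # Running-sum trick: yields sum(j * buckets[j]) with about 2 * 2^window_bits additions.
        running = 0
        window_sum = 0
        for j in range(mask, 0, -1):
            running += buckets[j]
            window_sum += running
        result += window_sum
    return result


scalars = [5, 12, 255, 1000]
points = [3, 7, 11, 13]
direct = sum(k * p for k, p in zip(scalars, points))
print(pippenger_msm(scalars, points, window_bits=4, scalar_bits=10) == direct)
```

A larger window means fewer windows but more buckets per window, which is why the execution time has a sweet spot that depends on the MSM scale.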

Each row of the bench.py output lists, in order: the multi-scalar multiplication scale, the window size, the execution time of the original Pippenger, the execution time of our AVX-MSM, and the speedup of AVX-MSM over the original.

[15, 6, 0.057, 0.019, 3.0]
[15, 7, 0.059, 0.019, 3.1052631578947367]
[15, 8, 0.058, 0.018, 3.2222222222222228]
[15, 9, 0.034, 0.014, 2.428571428571429]
[15, 10, 0.037, 0.017, 2.176470588235294]
[15, 11, 0.038, 0.02, 1.9]
[15, 12, 0.056, 0.027, 2.074074074074074]
[15, 13, 0.05, 0.034, 1.4705882352941175]
Best: [15, 9, 0.034, 0.014, 2.428571428571429] // this is the best window size
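
The rows above can be reproduced from the raw timings. The sketch below is an assumption about how the last column and the "Best" row are derived, not the actual bench.py: it computes the speedup as original time divided by AVX-MSM time and picks the row with the lowest AVX-MSM time, which is consistent with the sample output. The function name `summarize` is hypothetical.

```python
# (scale, window size, original Pippenger time, AVX-MSM time), copied from the sample output.
timings = [
    (15, 6, 0.057, 0.019),
    (15, 7, 0.059, 0.019),
    (15, 8, 0.058, 0.018),
    (15, 9, 0.034, 0.014),
    (15, 10, 0.037, 0.017),
    (15, 11, 0.038, 0.02),
    (15, 12, 0.056, 0.027),
    (15, 13, 0.05, 0.034),
]


def summarize(rows):
    """Append the speedup column and return (table, best row by AVX-MSM time)."""
    table = [[n, w, old, new, old / new] for n, w, old, new in rows]
    best = min(table, key=lambda row: row[3])
    return table, best


table, best = summarize(timings)
for row in table:
    print(row)
print("Best:", best)
```

Note that the fastest window size (9) is not the one with the largest speedup (8), because the original Pippenger also runs faster at window size 9.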

AVX-ZK

Building

Generate the static library libmsm.a.

$ cd AVX-MSM/demo/381
$ make msm

Run CMake to create the Makefile:

$ cd AVX-ZK
$ mkdir build && cd build && cmake ..

Copy libmsm.a and librelic_s.a to AVX-ZK/build/depends/libff/libff.

$ cp ../../AVX-MSM/demo/381/build/libmsm.a ../../AVX-MSM/demo/381/target/lib/librelic_s.a ./depends/libff/libff

Then, to compile the library, run this within the build directory:

$ make

Using

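Run the AVX-ZK profiler. In upstream libsnark, the three arguments are the number of constraints, the input size, and the input type (Fr or bytes).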

$ make profile_r1cs_gg_ppzksnark
$ ./libsnark/profile_r1cs_gg_ppzksnark 65536 8192 bytes

You can also use the Python script to run the benchmarks in batch.

$ cd AVX-ZK
$ python bench.py

Output example

The output of AVX-ZK follows the format of the libsnark library. Below is an example of the output from the Python script:

// 15 means size of 2^15; True means result is correct
// 1.2709s is the execution time of our AVX-ZK
[15, True, '[1.2709s x0.97]\t(19.6462s x1.00 from start)']
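
For post-processing, the timing string in such a row can be split into numbers. The following is a minimal sketch that assumes every row has exactly the shape shown above; the function name `parse_zk_row` and the dictionary keys are made up for illustration, and the `x0.97` / `x1.00` factors are extracted without being interpreted.

```python
import re

# Matches strings such as "[1.2709s x0.97]\t(19.6462s x1.00 from start)".
_TIMING = re.compile(
    r"\[(\d+\.\d+)s x(\d+\.\d+)\]\s*\((\d+\.\d+)s x(\d+\.\d+) from start\)"
)


def parse_zk_row(row):
    """Turn [log2_size, correct, timing_string] into a dictionary of numbers."""
    log2_size, correct, timing = row
    match = _TIMING.search(timing)
    if match is None:
        raise ValueError("unexpected timing format: %r" % timing)
    step_s, step_factor, total_s, total_factor = (float(g) for g in match.groups())
    return {
        "size": 1 << log2_size,        # 15 means 2^15
        "correct": correct,
        "step_seconds": step_s,        # execution time of AVX-ZK
        "step_factor": step_factor,
        "total_seconds": total_s,      # time since the start of the run
        "total_factor": total_factor,
    }


row = [15, True, "[1.2709s x0.97]\t(19.6462s x1.00 from start)"]
print(parse_zk_row(row))
```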

Switching between single and multi-core

In the file SimdMSM/AVX-ZK/libsnark/zk_proof_systems/ppzksnark/r1cs_gg_ppzksnark/r1cs_gg_ppzksnark.tcc, the proof generation function is r1cs_gg_ppzksnark_prover. Within it, the calls containing multi_exp perform the multi-scalar multiplications. Change their template parameters to enable or disable multi-threading:
```c++
// single-core
multi_exp_method_pip_avx
multi_exp_method_pair_avx
// multi-core
multi_exp_method_pip_avx_threads
multi_exp_method_pair_avx_threads
```

Specifically, in the proof generation function, replace `multi_exp_method_pip_avx` with `multi_exp_method_pip_avx_threads` in the computation of evaluation_At, evaluation_Ht, and evaluation_Lt. For the computation of evaluation_Bt, replace `multi_exp_method_pair_avx` with `multi_exp_method_pair_avx_threads`. After modifying the code, repeat the above Building and Using steps in the AVX-ZK part.
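
If you prefer to script this change rather than edit by hand, the renaming can be done with a small text substitution. The sketch below is only an illustration: the function name `enable_threads` is hypothetical, and the sample string is a made-up snippet, not a line copied from r1cs_gg_ppzksnark.tcc. The word boundary at the end of the pattern keeps names that already end in `_threads` unchanged, so running it twice is harmless.

```python
import re

# Single-core method names; \b after "avx" does not match before "_threads".
_SINGLE_CORE = re.compile(r"\bmulti_exp_method_(pip|pair)_avx\b")


def enable_threads(source):
    """Rename the single-core AVX method names to their multi-threaded variants."""
    return _SINGLE_CORE.sub(r"multi_exp_method_\1_avx_threads", source)


# Hypothetical snippet for demonstration only.
snippet = "f<multi_exp_method_pip_avx>(a); g<multi_exp_method_pair_avx>(b);"
print(enable_threads(snippet))
```

Apply it only to the text of the prover function and review the resulting diff before rebuilding, since a whole-file substitution would also touch any other occurrence of these names.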

## Running and Testing AVX-ZK by JsnarkCircuitBuilder

### Building
Return to the main directory `SimdMSM/`. The first part is similar to AVX-ZK.
```shell
$ cd jsnark/libsnark
$ mkdir build && cd build && cmake ..
```

Copy libmsm.a and librelic_s.a to libsnark/build/depends/libff/libff. Then build.

$ cp ../../../AVX-MSM/demo/381/build/libmsm.a ../../../AVX-MSM/demo/381/target/lib/librelic_s.a ./depends/libff/libff
$ make

To compile the JsnarkCircuitBuilder project via the command line, run the following from the SimdMSM/ main directory:

$ cd jsnark
$ cd JsnarkCircuitBuilder
$ mkdir -p bin
$ javac -d bin -cp /usr/share/java/junit4.jar:bcprov-jdk15on-159.jar  $(find ./src/* | grep ".java$")

Using

Run AES.

$ java -cp bin examples.generators.blockciphers.AES128CipherCircuitGenerator

Run SHA-256.

$ java -cp bin examples.generators.hash.SHA2CircuitGenerator

Run RSAEnc.

$ java -cp bin examples.generators.rsa.RSAEncryptionCircuitGenerator

Run Merkle-Tree.

$ java -cp bin examples.generators.hash.MerkleTreeMembershipCircuitGenerator

Run RSASigVer.

$ java -cp bin examples.generators.rsa.RSASigVerCircuitGenerator

Run Auction.

$ java -cp bin examples.generators.augmenter.AugmentedAuctionCircuitGenerator