International Association for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2023

Oil and Vinegar: Modern Parameters and Implementations


README

This repository contains the code accompanying the paper Oil and Vinegar: Modern Parameters and Implementations, which is available here.

This repository contains OV implementations targeting x86 (with AVX2), Armv8 (with Neon), Arm Cortex-M4, and FPGA.

Authors:

Warning: This is the version of the code accompanying the paper. This is not the NIST submission! Parameters and implementations may still change for the NIST submission. Official reference code will be posted separately.

Parameters

All sizes are in bytes.

Parameter         signature size   pk size     sk size     pkc size   compressed-sk size
GF(16),160,64     96               412,160     348,704     66,576     48
GF(256),112,44    128              278,432     237,896     43,576     48
GF(256),184,72    200              1,225,440   1,044,320   189,232    48
GF(256),244,96    260              2,869,440   2,436,704   446,992    48
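
The signature, pk, and pkc columns of the table above can be reproduced from the parameters with a few lines of Python. This is a minimal sketch under stated assumptions, not code from this repository: it reads each parameter triple as (field, n variables, m equations), and the formulas, the 16-byte salt, and the 16-byte public seed are inferred from the table values rather than stated in this README. The class name OVParams and the constant names are hypothetical. The sk sizes are not modeled.

```python
from dataclasses import dataclass

# Assumptions inferred from the table, not stated in this README.
SALT_BYTES = 16      # appended to every signature
PK_SEED_BYTES = 16   # stored in the compressed public key (pkc)


@dataclass(frozen=True)
class OVParams:
    q: int  # field size: 16 or 256
    n: int  # number of variables
    m: int  # number of equations

    @property
    def bits(self) -> int:
        # Bits per field element: 4 for GF(16), 8 for GF(256).
        return (self.q - 1).bit_length()

    def signature_bytes(self) -> int:
        # n field elements plus the salt.
        return self.n * self.bits // 8 + SALT_BYTES

    def pk_bytes(self) -> int:
        # m quadratic forms, each with n(n+1)/2 coefficients.
        return self.m * (self.n * (self.n + 1) // 2) * self.bits // 8

    def pkc_bytes(self) -> int:
        # Seed plus m forms restricted to m variables, m(m+1)/2 coefficients each.
        return PK_SEED_BYTES + self.m * (self.m * (self.m + 1) // 2) * self.bits // 8


for p in (OVParams(16, 160, 64), OVParams(256, 112, 44),
          OVParams(256, 184, 72), OVParams(256, 244, 96)):
    print(p, p.signature_bytes(), p.pk_bytes(), p.pkc_bytes())
```

The printed values can be compared against the corresponding rows of the table.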

Cortex-M4

This directory contains the implementation targeting the Arm Cortex-M4.
In particular, we target the NUCLEO-L4R5ZI board featuring 2MB of Flash and 640KB of RAM.

Our benchmarking setup is an adapted version of the pqm4 framework.

Cloning

Clone this repository recursively, including its submodules:

# for a fresh clone
git clone --recurse-submodules https://github.com/pqov/pqov-paper

# for an existing clone
git submodule update --init --recursive

Getting started (pqm4)

Follow the steps on https://github.com/mupq/pqm4 for setting up all required software and making sure that communication with the board works correctly.
Make sure that the following command works in pqm4 (not in this repository):

./test.py -p nucleo-l4r5zi -u /dev/ttyACM0 kyber512

It should output

DEBUG:platform interface:Found start pattern: b'=========================\n'
INFO:BoardTestCase:Success

multiple times.

Structure of this repository

The core arithmetic is in the assembly files in m4asm, which are symlinked into the implementation directories.
The implementation directories are located in crypto_sign.

The main differences from the pqm4 framework are:
- Added support for writing to flash memory in hal-flash.h and hal-flash.c. This is used for all parameter sets for which the key does not fit in RAM. Implementations are then called m4f-flash or m4f-flash-speed.
- Added round-reduced AES for sampling the OV public key. The implementation is based on the implementation by Stoffelen and Schwabe. Our adapted implementation is in aes4-publicinputs.h, aes4-publicinputs.c, and aes4-publicinputs.S.

For an overview of the main differences between the reference implementation and the optimized implementation, have a look at blas_matrix_m4f.c and ov_publicmap_m4f.c.

Using this repository

This repository works similarly to pqm4, via the test.py, testvectors.py, and benchmarks.py scripts.

To run functional tests, you can use the test.py script:

    # to test all implementations of all parameter sets
    ./test.py -p nucleo-l4r5zi -u /dev/ttyACM0

    # for testing only a subset of parameter sets, you can pass the parameter set names
    # parameter sets are ov-Ip  ov-Ip-pkc  ov-Ip-pkc-aes4  ov-Ip-pkc-skc  ov-Ip-pkc-skc-aes4  ov-Is  ov-Is-pkc  ov-Is-pkc-aes4  ov-Is-pkc-skc  ov-Is-pkc-skc-aes4
    ./test.py -p nucleo-l4r5zi -u /dev/ttyACM0 ov-Ip ov-Is

To check that the test vectors of the reference implementation and the optimized implementation match, you can use the testvectors.py script:

    # to check all implementations of all parameter sets
    ./testvectors.py -p nucleo-l4r5zi -u /dev/ttyACM0

    # for testing only a subset of parameter sets, you can pass the parameter set names
    # parameter sets are ov-Ip  ov-Ip-pkc  ov-Ip-pkc-aes4  ov-Ip-pkc-skc  ov-Ip-pkc-skc-aes4  ov-Is  ov-Is-pkc  ov-Is-pkc-aes4  ov-Is-pkc-skc  ov-Is-pkc-skc-aes4
    ./testvectors.py -p nucleo-l4r5zi -u /dev/ttyACM0 ov-Ip ov-Is

For benchmarking, there is the benchmarks.py script.
It comes with additional arguments:
* -i : number of iterations for signing and verification (key generation is only run once as it is slower and does not have much runtime variation)
* --nostack: skip the stack benchmarks
* --nohashing: skip the hashing benchmarks
* --nosize: skip the code size benchmarks
* --nospeed: skip the speed benchmarks

    # to benchmark all implementations of all parameter sets
    ./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 -i 10000

    # for benchmarking only a subset of parameter sets, you can pass the parameter set names
    # parameter sets are ov-Ip  ov-Ip-pkc  ov-Ip-pkc-aes4  ov-Ip-pkc-skc  ov-Ip-pkc-skc-aes4  ov-Is  ov-Is-pkc  ov-Is-pkc-aes4  ov-Is-pkc-skc  ov-Is-pkc-skc-aes4
    ./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 -i 10000 ov-Ip ov-Is

To convert the benchmarking results (stored in plaintext in benchmarks/), you can use the convert_benchmarks.py script:

    # markdown format
    ./convert_benchmarks.py md

    # csv format
    ./convert_benchmarks.py csv

Experiments for the paper have been done with the following commands:

#!/bin/bash

./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 10
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 990
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 -i 1000

FPGA

Prerequisites

Hardware environment

Software

Folder structure

├── Makefile              // Used for simulation
├── gen_processor.py      // Python script to configure and generate the Verilog code
├── onboard               // Tcl files to generate the system
├── scripts               // Scripts to simulate or synthesize
├── simulator             // Python simulator providing behavioral simulation
└── test_files            // Configuration files

Usage

// Simulate a single configuration
python gen_processor.py test_files/one-round-aes/gauss-10round-aes/SL1-p/classic.cfg  // configuration for 1p classic with 1-round AES
./scripts/csh_run_sim                                                                 // csh script that runs vcs or ncverilog; adjust the paths in it for your machine

// Synthesize a single configuration
python gen_processor.py test_files/one-round-aes/gauss-10round-aes/SL1-p/classic.cfg
./scripts/run_synthesis
// This generates the folder "test_files_one-round-aes_gauss-10round-aes_SL1-p_classic_cfg"

// Get the hardware result
vivado -mode tcl -source ./scripts/report.tcl -tclargs "test_files_one-round-aes_gauss-10round-aes_SL1-p_classic_cfg"

// Run all configurations in a directory
python scripts/run_all_simulation.py test_files/one-round-aes/                     // Run simulation
python scripts/run_all_synthesis.py test_files/one-round-aes/gauss-10round-aes/    // Run synthesis
./scripts/hardware_result                                                          // Report all synthesized results

Others

Python simulator

The configuration file

x86 (AVX2) and Armv8 (Neon)

Contents

Instructions for testing/benchmarks

Run

make

to generate three executables:
1. sign_api-test: tests the API functions (crypto_keygen(), crypto_sign(), and crypto_verify()).
2. sign_api-benchmark: reports performance numbers for the signature API functions.
3. rec-sign-benchmark: reports more detailed performance numbers for the signature API functions. Number format: "average /number of tests (1st quartile, median, 3rd quartile)"
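
To make the number format of rec-sign-benchmark concrete, the following Python sketch computes the same four statistics from a list of measurements. It is an illustration only: the function name summarize and the sample cycle counts are hypothetical, and the quartile method (inclusive interpolation) is an assumption, since this README does not say how the benchmark computes its quartiles or rounds its output.

```python
import statistics


def summarize(cycles):
    """Format measurements as 'average /count (1st quartile, median, 3rd quartile)'."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(cycles, n=4, method="inclusive")
    average = statistics.fmean(cycles)
    return f"{average:.0f} /{len(cycles)} ({q1:.0f}, {median:.0f}, {q3:.0f})"


# Hypothetical cycle counts from five signing runs.
print(summarize([100, 110, 120, 130, 140]))
```

Reporting quartiles alongside the average makes outliers visible: a few slow runs shift the average but leave the median nearly unchanged.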

Valgrind test for constant-time and memory leakage

To check for timing leakage using Valgrind, run:

make VALGRIND=1 valgrind

This first marks the secret data as undefined values and then runs Valgrind on the sign_api-test executable to check whether it accesses undefined values.
We have removed some false-positive errors. Please use

grep -r "_VALGRIND"

to see all the code involved in the experiments.

The Valgrind experiment works with other makefile parameters as well, for example:

make VALGRIND=1 PROJ=avx2 PARAM=4 valgrind

Options for Parameters:

For compiling different parameters, we use the macros ( _OV256_112_44 / _OV256_184_72 / _OV256_244_96 / _OV16_160_64 ) to control the C source code.
The default setting is _OV256_112_44 defined in src/params.h.

The other option is to use our makefile:
1. _OV16_160_64:

make PARAM=1

2. _OV256_112_44:

make

or

make PARAM=3

3. _OV256_184_72:

make PARAM=4

4. _OV256_244_96:

make PARAM=5

Options for Variants:

For compiling different variants, we use the macros ( _OV_CLASSIC / _OV_PKC / _OV_PKC_SKC ) to control the C source code.
The default setting is _OV_CLASSIC defined in src/params.h.

The other option is to use our makefile:
1. _OV_CLASSIC:

make

or

make VARIANT=1

2. _OV_PKC:

make VARIANT=2

3. _OV_PKC_SKC:

make VARIANT=3

4. _OV256_244_96 and _OV_PKC:

make VARIANT=2 PARAM=5

Optimizations for Architectures:

Reference Version:

The reference version uses (1) the source code in the directories src/ and src/ref/, and
(2) the directory utils/ for the AES, SHAKE, and randombytes() utilities.
The default implementations of AES and SHAKE come from the OpenSSL library, controlled by the macro UTILS_OPENSSL defined in src/config.h.

Or, use our makefile:
1. Reference version (_OV256_112_44 and _OV_CLASSIC):

make

2. Reference version, _OV256_244_96, and _OV_PKC:

make VARIANT=2 PARAM=5

To enable 4-round AES, one needs to turn on the macro _4ROUND_AES_ defined in src/params.h.

AVX2 Version:

The AVX2 version uses (1) the source code in the directories src/, src/amd64, src/ssse3, and src/avx2, and
(2) the directories utils/ and utils/x86aesni for the AES, SHAKE, and randombytes() utilities.
(3) One still needs to turn on the macros _BLAS_AVX2_, _MUL_WITH_MULTAB_, and _UTILS_AESNI_ defined in src/config.h to enable the AVX2 optimizations.

Or, use our makefile:
1. AVX2 version (_OV256_112_44 and _OV_CLASSIC):

make PROJ=avx2

2. AVX2 version, _OV256_184_72, and _OV_PKC:

make PROJ=avx2 PARAM=4 VARIANT=2

NEON Version:

The NEON version uses (1) the source code in the directories src/, src/amd64, and src/neon, and
(2) the directories for the AES, SHAKE, and randombytes() utilities: utils/ and either utils/neon_aesinst (Armv8 AES instructions) or utils/neon_aes (NEON bitsliced AES implementation).
(3) One still needs to turn on the macros _BLAS_NEON_ and _UTILS_NEONAES_ defined in src/config.h to enable the NEON optimizations.
(4) Depending on the CPU and the parameter set, one can choose to define the macro _MUL_WITH_MULTAB_ to perform GF multiplication with multiplication tables. We suggest turning it on for the _OV16_160_64 parameter set.

Or, use our makefile:
1. NEON version (_OV256_112_44 and _OV_CLASSIC):

make PROJ=neon

2. Another example: NEON version, _OV16_160_64, and _OV_PKC_SKC:

make PROJ=neon PARAM=1 VARIANT=3
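
The PARAM, VARIANT, and PROJ values listed above can be combined freely. As a convenience, the following Python sketch maps the macro names to the corresponding make invocation. It is an illustration only: the helper make_command and the dictionaries are hypothetical and not part of this repository, and the sketch only covers the values documented in this README (the reference version is selected by omitting PROJ).

```python
# Makefile values as documented in this README.
PARAMS = {"_OV16_160_64": 1, "_OV256_112_44": 3, "_OV256_184_72": 4, "_OV256_244_96": 5}
VARIANTS = {"_OV_CLASSIC": 1, "_OV_PKC": 2, "_OV_PKC_SKC": 3}
PROJECTS = ("avx2", "neon")  # omit PROJ for the reference version


def make_command(param="_OV256_112_44", variant="_OV_CLASSIC", proj=None):
    """Return the make invocation (as an argument list) for a configuration."""
    if param not in PARAMS:
        raise ValueError(f"unknown parameter set: {param}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    if proj is not None and proj not in PROJECTS:
        raise ValueError(f"unknown project: {proj}")
    cmd = ["make"]
    if proj is not None:
        cmd.append(f"PROJ={proj}")
    cmd += [f"PARAM={PARAMS[param]}", f"VARIANT={VARIANTS[variant]}"]
    return cmd


# NEON version, _OV16_160_64, and _OV_PKC_SKC.
print(" ".join(make_command("_OV16_160_64", "_OV_PKC_SKC", "neon")))
```

The returned list can be passed to subprocess.run from the directory containing the makefile.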

Notes for Apple Mac M1:
1. We use

uname -s

to detect whether the build is running on macOS.
If uname returns a string containing Darwin,
the makefile defines the _MAC_OS_ macro to enable some optimization settings in the source code.
2. The program needs sudo to benchmark correctly on macOS.

Options for the Algorithm Solving Linear Equations while Signing:

  1. Default setting: Gaussian elimination and backward substitution.
  2. To instead compute the inverse matrix with block matrix computation:
    (a) Define the _LDU_DECOMPOSE_ macro in src/params.h.
    (b) Remove the _BACK_SUBSTITUTION_ macro in src/ov.c.
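
To illustrate the default setting (Gaussian elimination followed by backward substitution), the following Python sketch solves a small linear system over GF(256). It is a toy model, not the implementation in src/: the reduction polynomial x^8 + x^4 + x^3 + x + 1 is an assumption, all function names are hypothetical, and the code is not constant-time (it branches on matrix entries), which a real signing routine has to avoid.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements; poly is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf256_inv(a):
    """Invert a nonzero element as a^254, since a^255 = 1 for all nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(256)")
    result, base, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        exponent >>= 1
    return result


def mat_vec(A, x):
    """Compute A * x over GF(256); addition is XOR."""
    out = []
    for row in A:
        acc = 0
        for a, v in zip(row, x):
            acc ^= gf256_mul(a, v)
        out.append(acc)
    return out


def solve(A, b):
    """Solve A x = b over GF(256); return None if A is singular."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    # Forward elimination to an upper-triangular matrix with unit diagonal.
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = gf256_inv(M[col][col])
        M[col] = [gf256_mul(inv, v) for v in M[col]]
        for r in range(col + 1, n):
            factor = M[r][col]
            if factor:
                M[r] = [v ^ gf256_mul(factor, p) for v, p in zip(M[r], M[col])]
    # Backward substitution, from the last unknown to the first.
    x = [0] * n
    for i in reversed(range(n)):
        acc = M[i][n]
        for j in range(i + 1, n):
            acc ^= gf256_mul(M[i][j], x[j])
        x[i] = acc
    return x


A = [[0, 1, 0], [1, 1, 0], [5, 7, 2]]
x_true = [9, 200, 33]
print(solve(A, mat_vec(A, x_true)) == x_true)
```

Because the forward pass only produces a triangular system, the backward substitution replaces the second half of a full Gauss-Jordan reduction.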

License

Our implementations of OV are released under the conditions of CC0.
Third-party code may have other licenses, which are stated at the top of each file or in the respective LICENSE files.