Transactions on Cryptographic Hardware and Embedded Systems, Volume 2023
Oil and Vinegar: Modern Parameters and Implementations
This repository contains the code accompanying the paper "Oil and Vinegar: Modern Parameters and Implementations", which is available here.
This repository contains OV implementations targeting x86 (with AVX2), Armv8 (with Neon), Arm Cortex-M4, and FPGA.
Authors:
- Ward Beullens
- Ming-Shing Chen
- Shih-Hao Hung
- Matthias J. Kannwischer
- Bo-Yuan Peng
- Cheng-Jhih Shih
- Bo-Yin Yang
Warning: This is the version of the code accompanying the paper. This is not the NIST submission! Parameters and implementations may still change for the NIST submission. Official reference code will be posted separately.
Parameters
Parameters (field, n, m) | Signature (bytes) | pk (bytes) | sk (bytes) | Compressed pk, pkc (bytes) | Compressed sk, pkc-skc (bytes) |
---|---|---|---|---|---|
GF(16),160,64 | 96 | 412,160 | 348,704 | 66,576 | 48 |
GF(256),112,44 | 128 | 278,432 | 237,896 | 43,576 | 48 |
GF(256),184,72 | 200 | 1,225,440 | 1,044,320 | 189,232 | 48 |
GF(256),244,96 | 260 | 2,869,440 | 2,436,704 | 446,992 | 48 |
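The sizes in the table follow directly from the parameters. The following Python sketch reproduces every entry. It relies on several assumptions that were inferred by matching the table, not taken from the source code:
- o = m oil variables and v = n - m vinegar variables;
- a 16-byte salt in each signature;
- a 16-byte public seed and a 32-byte secret seed;
- a secret-key layout of the seed, O, and 2m matrices;
- densely packed field elements.

```python
# Sketch: recompute the table's byte sizes from (q, n, m).
# Assumptions (checked against the table, not taken from the source code):
# o = m oil variables, v = n - m vinegar variables, a 16-byte salt per
# signature, a 16-byte public seed, a 32-byte secret seed, and field
# elements packed densely (two GF(16) elements per byte).

SALT, PK_SEED, SK_SEED = 16, 16, 32


def nbytes(num_elems: int, q: int) -> int:
    """Bytes for num_elems densely packed elements of GF(q), q in {16, 256}."""
    bits = num_elems * (4 if q == 16 else 8)
    return (bits + 7) // 8


def tri(k: int) -> int:
    """Number of entries in an upper-triangular k x k matrix."""
    return k * (k + 1) // 2


def ov_sizes(q: int, n: int, m: int) -> dict:
    o, v = m, n - m
    return {
        # signature: n field elements plus the salt
        "sig": nbytes(n, q) + SALT,
        # classic pk: m upper-triangular n x n matrices
        "pk": nbytes(m * tri(n), q),
        # classic sk (assumed layout): secret seed, the v x o matrix O,
        # m upper-triangular v x v matrices, and m further v x o matrices
        "sk": SK_SEED + nbytes(v * o, q) + nbytes(m * tri(v), q) + nbytes(m * v * o, q),
        # pkc: public seed plus m upper-triangular o x o matrices
        "pkc": PK_SEED + nbytes(m * tri(o), q),
        # compressed sk: only the two seeds
        "csk": SK_SEED + PK_SEED,
    }


for params in [(16, 160, 64), (256, 112, 44), (256, 184, 72), (256, 244, 96)]:
    print(params, ov_sizes(*params))
```

The dominant term is m·n(n+1)/2 field elements for the classic public key versus m·o(o+1)/2 for pkc. This is why the compressed public keys are several times smaller.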
Cortex-M4
This directory contains the implementation targeting the Arm Cortex-M4.
In particular, we target the NUCLEO-L4R5ZI board featuring 2MB of Flash and 640KB of RAM.
Our benchmarking setup is an adapted version of the pqm4 framework.
Cloning
Clone this repository recursively, including its submodules:
# for a fresh clone
git clone --recurse-submodules https://github.com/pqov/pqov-paper
# for an existing clone
git submodule update --init --recursive
Getting started (pqm4)
Follow the steps on https://github.com/mupq/pqm4 for setting up all required software and making sure that communication with the board works correctly.
Make sure that the following command works in pqm4 (not in this repository):
./test.py -p nucleo-l4r5zi -u /dev/ttyACM0 kyber512
It should output the following lines multiple times:
DEBUG:platform interface:Found start pattern: b'=========================\n'
INFO:BoardTestCase:Success
Structure of this repository
The core arithmetic is implemented in the assembly files in m4asm, which are symlinked into the implementation directories.
The implementation directories are in crypto_sign.
The core differences from the pqm4 framework are:
- Added support for writing to flash memory in hal-flash.h and hal-flash.c. This is used for all parameter sets whose keys do not fit in RAM. The corresponding implementations are called m4f-flash or m4f-flash-speed.
- Added round-reduced AES for sampling the OV public key. The implementation is based on the implementation by Stoffelen and Schwabe. Our adapted implementation is in aes4-publicinputs.h, aes4-publicinputs.c, and aes4-publicinputs.S.
For an overview of the core differences between the reference implementation and the optimized implementation, have a look at blas_matrix_m4f.c and ov_publicmap_m4f.c.
Using this repository
This repository works in a similar way to pqm4, via the test.py, testvectors.py, and benchmarks.py scripts.
To run functional tests, you can use the test.py script:
# to test all implementations of all parameter sets
./test.py -p nucleo-l4r5zi -u /dev/ttyACM0
# for testing only a subset of parameter sets, you can pass the parameter set names
# parameter sets are ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4
./test.py -p nucleo-l4r5zi -u /dev/ttyACM0 ov-Ip ov-Is
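The parameter-set names passed to these scripts follow a regular pattern: a base set, an optional pkc or pkc-skc variant, and an optional aes4 suffix. A hypothetical Python helper for splitting such a name is sketched below. The mapping of Is and Ip to concrete (field, n, m) triples is an assumption based on the OV paper's naming. The scripts themselves only treat the names as strings.

```python
# Hypothetical helper: decompose a parameter-set name such as
# "ov-Ip-pkc-skc-aes4" into its parts. The mapping Is -> GF(16),160,64 and
# Ip -> GF(256),112,44 is an assumption (naming from the OV paper); the
# scripts themselves only take the names as opaque strings.
BASE = {"Is": ("GF(16)", 160, 64), "Ip": ("GF(256)", 112, 44)}
VARIANTS = {(): "classic", ("pkc",): "pkc", ("pkc", "skc"): "pkc-skc"}


def parse_set(name: str) -> dict:
    parts = name.split("-")
    if len(parts) < 2 or parts[0] != "ov" or parts[1] not in BASE:
        raise ValueError(f"unknown parameter set: {name}")
    rest = parts[2:]
    aes4 = bool(rest) and rest[-1] == "aes4"  # round-reduced AES for the public key
    if aes4:
        rest = rest[:-1]
    variant = VARIANTS.get(tuple(rest))
    # in the list of parameter sets, aes4 only appears together with pkc
    if variant is None or (aes4 and variant == "classic"):
        raise ValueError(f"unknown parameter set: {name}")
    return {"params": BASE[parts[1]], "variant": variant, "aes4": aes4}


ALL = ("ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 "
       "ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4").split()
for name in ALL:
    print(name, parse_set(name))
```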
To check that the test vectors of the reference implementation and the optimized implementation match, you can use the testvectors.py script:
# to check all implementations of all parameter sets
./testvectors.py -p nucleo-l4r5zi -u /dev/ttyACM0
# for testing only a subset of parameter sets, you can pass the parameter set names
# parameter sets are ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4
./testvectors.py -p nucleo-l4r5zi -u /dev/ttyACM0 ov-Ip ov-Is
For benchmarking, there is the benchmarks.py script.
It comes with additional arguments:
* -i: number of iterations for signing and verification (key generation is only run once, as it is slower and does not have much runtime variation)
* --nostack: skip the stack benchmarks
* --nohashing: skip the hashing benchmarks
* --nosize: skip the code size benchmarks
* --nospeed: skip the speed benchmarks
# to benchmark all implementations of all parameter sets
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 -i 10000
# for benchmarking only a subset of parameter sets, you can pass the parameter set names
# parameter sets are ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4 ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 -i 10000 ov-Ip ov-Is
To convert the benchmarking results (stored as plain text in benchmarks/), you can use the convert_benchmarks.py script:
# markdown format
./convert_benchmarks.py md
# csv format
./convert_benchmarks.py csv
The experiments for the paper were run with the following commands:
#!/bin/bash
# all ten parameter-set variants benchmarked in the paper
SETS="ov-Is ov-Is-pkc ov-Is-pkc-aes4 ov-Is-pkc-skc ov-Is-pkc-skc-aes4 ov-Ip ov-Ip-pkc ov-Ip-pkc-aes4 ov-Ip-pkc-skc ov-Ip-pkc-skc-aes4"
# stack and speed benchmarks, 10 iterations
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nohashing --nosize $SETS -i 10
# speed-only benchmarks: 990 iterations, then 9 runs of 1000 iterations
./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize $SETS -i 990
for run in $(seq 9); do
  ./benchmarks.py -p nucleo-l4r5zi -u /dev/ttyACM0 --nostack --nohashing --nosize $SETS -i 1000
done
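The experiment commands above split 10,000 signing and verification iterations into runs of 10, 990 and 9 × 1000. The following Python sketch shows one way to merge per-run averages into a single average, by weighting each run with its iteration count. How the paper's numbers were actually aggregated is not stated here, so the combined_mean helper is hypothetical.

```python
# Sketch (assumption about post-processing, not part of benchmarks.py):
# the experiment script splits the 10,000 signing/verification iterations
# into runs of 10, 990 and 9 x 1000. Per-run means can be merged into one
# mean by weighting each run with its iteration count.
RUNS = [10, 990] + [1000] * 9


def combined_mean(runs):
    """runs: list of (iterations, mean_cycles) pairs from separate benchmark runs."""
    total = sum(n for n, _ in runs)
    return sum(n * mean for n, mean in runs) / total


print(sum(RUNS))  # total iterations across all runs
```

Medians and quartiles cannot be merged this way. Combining them requires the raw per-iteration samples.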
FPGA
Prerequisites
Hardware environment
- We test SL1-p and SL1-s on an AVNET ZedBoard Zynq-7000 Development Board (XC7Z020).
- We test SL3 and SL5 on a Digilent Nexys Video Artix-7 FPGA (XC7A200T).
Software
- Python 3.10 with the packages numba, pycryptodome, pyfinite, and libconf
- Vivado v2022.1 on Ubuntu 18.04
- RTL simulation tools: ncverilog, vcs, or iverilog
Folder structure
├── Makefile          // Used for simulation
├── gen_processor.py  // Configures the design and generates the Verilog code
├── onboard           // Tcl files to generate the system
├── scripts           // Scripts to simulate or synthesize
├── simulator         // Python simulator for behavioral simulation
└── test_files        // Configuration files
Usage
- RTL simulation (may take a while)
python gen_processor.py test_files/one-round-aes/gauss-10round-aes/SL1-p/classic.cfg  # config for SL1-p classic with 1-round AES
./scripts/csh_run_sim  # uses a csh file to run vcs or ncverilog; adjust its paths for your machine
- Vivado synthesis
python gen_processor.py test_files/one-round-aes/gauss-10round-aes/SL1-p/classic.cfg
./scripts/run_synthesis
# This generates the folder "test_files_one-round-aes_gauss-10round-aes_SL1-p_classic_cfg"
# Get the hardware result
vivado -mode tcl -source ./scripts/report.tcl -tclargs "test_files_one-round-aes_gauss-10round-aes_SL1-p_classic_cfg"
- Run simulation and synthesis for all configuration files in a path
python scripts/run_all_simulation.py test_files/one-round-aes/  # run simulation
python scripts/run_all_synthesis.py test_files/one-round-aes/gauss-10round-aes/  # run synthesis
./scripts/hardware_result  # report all synthesized results
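In the single Vivado example above, the synthesis output folder name equals the configuration path with "/" and "." replaced by "_". The hypothetical helpers below encode that inferred rule and build the report command. The authoritative naming rule is whatever scripts/run_synthesis implements.

```python
# Hypothetical helpers inferred from the single example above: the synthesis
# output folder name appears to be the config path with "/" and "." replaced
# by "_". The authoritative rule is whatever scripts/run_synthesis does.
import shlex


def synth_dir_name(cfg_path: str) -> str:
    return cfg_path.replace("/", "_").replace(".", "_")


def report_cmd(cfg_path: str) -> list:
    """Argument list for the Vivado report step shown above."""
    return ["vivado", "-mode", "tcl", "-source", "./scripts/report.tcl",
            "-tclargs", synth_dir_name(cfg_path)]


cfg = "test_files/one-round-aes/gauss-10round-aes/SL1-p/classic.cfg"
print(shlex.join(report_cmd(cfg)))
```

Once synthesis has finished, subprocess.run(report_cmd(cfg), check=True) would run the same report step from Python.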
Others
Python simulator
We also provide a Python simulator for behavioral simulation.
python simulator/codegen.py -n 16 -v 68 -o 44 -g 8 -m classic -r 10 -e  # generate test.data and simulate
python simulator/codegen.py -h  # show the meaning of the parameters
The configuration file
- The design parameters are specified as variables in this file, one name = value; entry per variable.
Example:
N = 16;
V = 68;
O = 44;
GF_bit = 8;
row_layout = [8, 8];
col_layout = [4, 4, 4, 4];
right_delay_every_X_resource_unit = 1;
mode = "classic";
aes_round = 10;
use_inversion = false;
use_tower_field = false;
use_pipelined_aes = false;
platform = "zedboard";
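To sweep over several configurations, it can help to generate such files programmatically. The helper below is hypothetical. It renders a Python dict in the name = value; syntax of the example, handling only the value types that appear there (integers, strings, booleans, lists). In the example, V = 68 and O = 44 with GF_bit = 8 appear to correspond to GF(256),112,44, since 68 + 44 = 112.

```python
# Hypothetical helper: write a configuration file in the "name = value;"
# syntax of the example above (libconf syntax), e.g. to sweep over layouts.
# Only the value types seen in the example are handled.
def to_cfg(settings: dict) -> str:
    def fmt(v):
        if isinstance(v, bool):  # check bool before int: bool is an int subclass
            return "true" if v else "false"
        if isinstance(v, str):
            return f'"{v}"'
        if isinstance(v, (list, tuple)):
            return "[" + ", ".join(fmt(x) for x in v) + "]"
        return str(v)
    return "".join(f"{k} = {fmt(v)};\n" for k, v in settings.items())


example = {
    "N": 16, "V": 68, "O": 44, "GF_bit": 8,
    "row_layout": [8, 8], "col_layout": [4, 4, 4, 4],
    "right_delay_every_X_resource_unit": 1,
    "mode": "classic", "aes_round": 10,
    "use_inversion": False, "use_tower_field": False, "use_pipelined_aes": False,
    "platform": "zedboard",
}
print(to_cfg(example), end="")
```

The output could be saved to a file (for example a hypothetical my_classic.cfg) and passed to gen_processor.py in the same way as the files in test_files.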
x86 (AVX2) and Armv8 (Neon)
Contents
- src: source code.
- utils: utilities for AES, SHAKE, and PRNGs. By default, these call the OpenSSL library.
- unit_tests: unit tests.
Instructions for testing/benchmarks
Type make to generate three executables:
1. sign_api-test: tests the API functions (crypto_keygen(), crypto_sign(), and crypto_verify()).
2. sign_api-benchmark: reports performance numbers for the signature API functions.
3. rec-sign-benchmark: reports more detailed performance numbers for the signature API functions. Number format: "average /number of tests (1st quartile, median, 3rd quartile)".
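The rec-sign-benchmark summary format can be illustrated with a short Python sketch that computes the same statistics from a list of cycle counts. The C code's exact quartile convention is not stated. This sketch assumes linear interpolation (Python's "inclusive" method), so the quartiles may differ slightly from the tool's output.

```python
# Sketch of the summary line described for rec-sign-benchmark:
# "average /number of tests (1st quartile, median, 3rd quartile)".
# The exact quartile convention of the C code is an assumption here
# (Python's "inclusive" method, i.e. linear interpolation).
import statistics


def summarize(cycles):
    q1, med, q3 = statistics.quantiles(cycles, n=4, method="inclusive")
    avg = statistics.fmean(cycles)
    return f"{avg:.0f} /{len(cycles)} ({q1:.0f}, {med:.0f}, {q3:.0f})"


print(summarize([10, 20, 30, 40, 50]))  # 30 /5 (20, 30, 40)
```

Reporting quartiles alongside the average makes outliers visible, since a few unusually slow runs shift the average but barely move the median.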
Valgrind test for constant-time and memory leakage
Experiments on checking timing leakage using Valgrind:
make VALGRIND=1 valgrind
This first marks the secret data as undefined values and then runs Valgrind on the sign_api-test executable to detect branches or memory accesses that depend on undefined (that is, secret) values.
We have removed some false-positive errors. Use
grep -r "_VALGRIND"
to see all the code involved in the experiments.
The Valgrind experiment can be combined with the other makefile parameters as well, for example:
make VALGRIND=1 PROJ=avx2 PARAM=4 valgrind
Options for Parameters:
For compiling different parameter sets, we use the macros (_OV256_112_44 / _OV256_184_72 / _OV256_244_96 / _OV16_160_64) to control the C source code.
The default setting is _OV256_112_44, defined in src/params.h.
Alternatively, use our makefile:
- _OV16_160_64: make PARAM=1
- _OV256_112_44: make or make PARAM=3
- _OV256_184_72: make PARAM=4
- _OV256_244_96: make PARAM=5
Options for Variants:
For compiling different variants, we use the macros (_OV_CLASSIC / _OV_PKC / _OV_PKC_SKC) to control the C source code.
The default setting is _OV_CLASSIC, defined in src/params.h.
Alternatively, use our makefile:
- _OV_CLASSIC: make or make VARIANT=1
- _OV_PKC: make VARIANT=2
- _OV_PKC_SKC: make VARIANT=3
- _OV256_244_96 combined with _OV_PKC: make VARIANT=2 PARAM=5
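The PARAM and VARIANT options listed above can be captured in a small lookup. The Python helper below is hypothetical. It maps the macros to make variables and omits the defaults, because a plain make already selects them. Note that PARAM=2 is not listed among the options, so it is not mapped.

```python
# Hypothetical helper mirroring the makefile options listed above:
# map the parameter and variant macros to make variables.
PARAM = {"_OV16_160_64": 1, "_OV256_112_44": 3, "_OV256_184_72": 4, "_OV256_244_96": 5}
VARIANT = {"_OV_CLASSIC": 1, "_OV_PKC": 2, "_OV_PKC_SKC": 3}


def make_cmd(param: str = "_OV256_112_44", variant: str = "_OV_CLASSIC") -> list:
    """Build the make invocation; defaults are omitted since plain make selects them.

    An unknown macro name raises KeyError.
    """
    cmd = ["make"]
    if variant != "_OV_CLASSIC":
        cmd.append(f"VARIANT={VARIANT[variant]}")
    if param != "_OV256_112_44":
        cmd.append(f"PARAM={PARAM[param]}")
    return cmd


print(make_cmd("_OV256_244_96", "_OV_PKC"))  # ['make', 'VARIANT=2', 'PARAM=5']
```

From the directory containing the makefile, subprocess.run(make_cmd(...), check=True) would then perform the build.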
Optimizations for Architectures:
Reference Version:
The reference version uses (1) the source code in the directories src/ and src/ref/, and
(2) the utilities for AES, SHAKE, and randombytes() in utils/.
The default implementation for AES and SHAKE is from the OpenSSL library, controlled by the macro UTILS_OPENSSL defined in src/config.h.
Alternatively, use our makefile:
- Reference version (_OV256_112_44 and _OV_CLASSIC): make
- Reference version, _OV256_244_96, and _OV_PKC: make VARIANT=2 PARAM=5
To enable 4-round AES, turn on the macro _4ROUND_AES_ in src/params.h.
AVX2 Version:
The AVX2 version uses (1) the source code in the directories src/, src/amd64, src/ssse3, and src/avx2, and
(2) the utilities for AES, SHAKE, and randombytes() in utils/ and utils/x86aesni.
(3) In addition, one needs to turn on the macros _BLAS_AVX2_, _MUL_WITH_MULTAB_, and _UTILS_AESNI_ in src/config.h to enable the AVX2 optimizations.
Alternatively, use our makefile:
- AVX2 version (_OV256_112_44 and _OV_CLASSIC): make PROJ=avx2
- AVX2 version, _OV256_184_72, and _OV_PKC: make PROJ=avx2 PARAM=4 VARIANT=2
NEON Version:
The NEON version uses (1) the source code in the directories src/, src/amd64, and src/neon, and
(2) the utilities for AES, SHAKE, and randombytes() in utils/ and either utils/neon_aesinst (Armv8 AES instructions) or utils/neon_aes (NEON bitsliced AES implementation).
(3) In addition, one needs to turn on the macros _BLAS_NEON_ and _UTILS_NEONAES_ in src/config.h to enable the NEON optimizations.
(4) Depending on the CPU and parameters, one can also define the macro _MUL_WITH_MULTAB_ to perform GF multiplication with multiplication tables. We suggest turning it on for the _OV16_160_64 parameter set.
Alternatively, use our makefile:
- NEON version (_OV256_112_44 and _OV_CLASSIC): make PROJ=neon
- Another example: NEON version, _OV16_160_64, and _OV_PKC_SKC: make PROJ=neon PARAM=1 VARIANT=3
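The idea behind multiplication with tables can be sketched for GF(16) in Python. Each product is computed once with shift-and-XOR arithmetic, and every later multiplication becomes a single table lookup. The reduction polynomial x^4 + x + 1 is an assumption for illustration. One row of the table (the 16 products of a fixed scalar) fits into a 128-bit register, which is what lets vector code use byte-shuffle instructions such as Neon's vqtbl1q_u8 for 16 lookups at once. Whether the repository uses exactly this layout is not stated here.

```python
# Sketch of GF(16) multiplication via a precomputed table, the idea behind
# _MUL_WITH_MULTAB_. The reduction polynomial x^4 + x + 1 is an assumption
# for illustration; the repository's own GF(16) code defines the real one.
POLY = 0b10011  # x^4 + x + 1


def gf16_mul_slow(a: int, b: int) -> int:
    """Carry-less (XOR) multiplication followed by reduction modulo POLY."""
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in (6, 5, 4):  # clear degrees 6..4, highest first
        if (r >> i) & 1:
            r ^= POLY << (i - 4)
    return r


# 16 x 16 = 256-entry table: a multiplication becomes a single lookup
MUL_TAB = [[gf16_mul_slow(a, b) for b in range(16)] for a in range(16)]


def gf16_mul(a: int, b: int) -> int:
    return MUL_TAB[a][b]


print(gf16_mul(2, 8))  # x * x^3 = x^4 = x + 1 -> 3
```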
Notes for Apple Mac M1:
1. The makefile uses uname -s to detect whether it is running on macOS.
If the output contains Darwin, the makefile defines the _MAC_OS_ macro, which enables some optimization settings in the source code.
2. The program needs sudo to benchmark correctly on macOS.
Options for Solving Linear Equations while Signing:
- Default setting: Gaussian elimination and back substitution.
- To instead compute the inverse matrix with block-matrix computation:
(a) Define the _LDU_DECOMPOSE_ macro in src/params.h.
(b) Remove the _BACK_SUBSTITUTION_ macro in src/ov.c.
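The default approach, Gaussian elimination followed by back substitution, is sketched below for a system over GF(256). This is a plain, variable-time Python illustration, not the repository's constant-time C code. The reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is an assumption chosen for illustration.

```python
# Sketch: solve A x = b over GF(256) by Gaussian elimination followed by
# back substitution. The reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1)
# is an assumption for illustration. This code is NOT constant-time.
def gf256_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce back below degree 8
        b >>= 1
    return r


def gf256_inv(a: int) -> int:
    """a^254 equals a^-1 for nonzero a in GF(256)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf256_mul(r, a)
        a = gf256_mul(a, a)
        e >>= 1
    return r


def solve(A, b):
    """Return x with A x = b, or None if the n x n matrix A is singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = gf256_inv(M[col][col])
        M[col] = [gf256_mul(inv, v) for v in M[col]]  # pivot becomes 1
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = M[r][col]
            if f:
                M[r] = [vr ^ gf256_mul(f, vc) for vr, vc in zip(M[r], M[col])]
    x = [0] * n
    for i in reversed(range(n)):  # back substitution on the unit upper-triangular system
        s = M[i][n]
        for j in range(i + 1, n):
            s ^= gf256_mul(M[i][j], x[j])
        x[i] = s
    return x


def matvec(A, x):
    """Compute A x over GF(256), e.g. to check a solution."""
    out = []
    for row in A:
        s = 0
        for a, xi in zip(row, x):
            s ^= gf256_mul(a, xi)
        out.append(s)
    return out
```

During signing, a singular system simply means that another set of vinegar values has to be tried. This sketch signals that case by returning None.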
License
Our implementations of OV are released under the conditions of CC0.
Third-party code may have other licenses, which are stated at the top of each file or in the respective LICENSE files.