Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021
Fixslicing AES-like ciphers:
New bitsliced AES speed records on ARM-Cortex M and RISC-V
Artifact for the paper "Fixslicing AES-like Ciphers" published in TCHES 2021/1
This artifact details how to reproduce the benchmarking results reported in the paper Fixslicing AES-like Ciphers on the following development boards: STM32L100C, STM32F407VG, and HiFive1 Rev B.
More specifically, we provide the following information:
- our development environment (on Ubuntu 18.08)
- our benchmark methodology
- results interpretation.
The artifact consists of two folders, as described below:
    artifact_fixslicing
    │   README.md
    │   LICENSE
    │
    ├───benchmark_arm
    │   ├───1storder_masking
    │   ├───common
    │   ├───barrel_shiftrows
    │   └───fixslicing
    │
    ├───benchmark_riscv
    │   ├───barrel_shiftrows
    │   ├───fixslicing
    │   └───screen.sh
benchmark_arm and benchmark_riscv contain the necessary material to run the benchmark on the STM32 and HiFive1 Rev B boards, respectively.
The source code for the AES implementations comes from the GitHub repository published alongside the paper. Because it includes only the non-unrolled implementations, it is left to the reader to unroll them to obtain the corresponding performance. Nevertheless, we provide the file benchmark_arm/fixslicing/aes_encrypt_unroll.s to give some insight into how this can be done; to benchmark the unrolled fixsliced encryption functions, one simply has to use aes_encrypt_unroll.s in place of the non-unrolled source file in the makefile.
Benchmark on STM32 boards
Below we describe our setup to compile, load, and run our code on the STM32 development boards mentioned above. We used a laptop running Ubuntu (18.08) with the following tools:
- GNU Arm Embedded Toolchain (arm-none-eabi gcc v9.2.1) to compile
- libopencm3 open-source firmware library for ARM Cortex-M microcontrollers, to be used with arm-none-eabi (we installed it from the GitHub repository, commit 946c1cbc48f58e56e5f1d3b65d91c7fd2b94140e)
- STLink open source toolset (v1.6.1-98-gd819a4a) to program the boards
- pySerial Python module (v3.5) for serial communications with /dev/ttyUSB0
Note that the common folder contains linker scripts and wrappers for the above-mentioned development boards, as well as a simple Python script, bench.py, to read the benchmark output.
:warning: Depending on where arm-none-eabi and libopencm3 are installed, you might need to adapt the following lines within makefiles:
    OPENCM3DIR = ../libopencm3
    ARMNONEEABIDIR = /usr/arm-none-eabi
After running make in the folder to benchmark, one can use stlink to program the boards by running st-flash write aes_m3.bin 0x8000000 for the STM32L100C or st-flash write aes_m4.bin 0x8000000 for the STM32F407VG. Note that the 1storder_masking implementations are only compiled for the STM32F407VG board, since the STM32L100C does not embed a random number generator.
Finally, to execute the benchmark, one has to run python3 ../common/bench.py and reset the board. Note that a USB-to-TTL adapter is required to communicate with the boards, with TX and RX connected to PA3 and PA2, respectively.
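To illustrate what reading the benchmark output over the serial link involves, here is a minimal sketch based on pySerial. It is not the bench.py script shipped in the common folder: the helper names, the baud rate (115200), the 5-second timeout, and the line limit are all assumptions made for illustration, and nothing is assumed about the firmware output beyond it being line-oriented text.

```python
def read_lines(stream, max_lines):
    """Read up to max_lines text lines from a binary stream offering readline()."""
    lines = []
    for _ in range(max_lines):
        raw = stream.readline()
        if not raw:  # read timed out or the stream ended
            break
        lines.append(raw.decode("ascii", errors="replace").rstrip("\r\n"))
    return lines


def read_from_board(device="/dev/ttyUSB0", baudrate=115200, max_lines=16):
    """Open the serial port and collect lines (baudrate is an assumed value)."""
    import serial  # pySerial, imported lazily so read_lines works without it

    with serial.Serial(device, baudrate, timeout=5) as port:
        return read_lines(port, max_lines)


# Hypothetical usage: call read_from_board(), then reset the board so that
# the firmware re-runs the benchmark and prints its results.
#     for line in read_from_board():
#         print(line)
```

Keeping the decoding logic in read_lines, separate from the port handling, makes it possible to exercise it on any file-like object without a board attached.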
Our benchmark simply consists of measuring the execution time of a single function call.
The execution time is measured by reading the DWT Cycle Counter (DWT_CYCCNT) register before and after the function call.
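The measurement boils down to the difference between two counter reads. Since DWT_CYCCNT is a 32-bit register, taking that difference modulo 2^32 keeps the result correct even if the counter wraps around once between the two reads. The sketch below only illustrates this arithmetic; the function name is hypothetical and this is not the firmware code.

```python
COUNTER_BITS = 32  # DWT_CYCCNT is a 32-bit free-running counter


def elapsed_cycles(before, after, bits=COUNTER_BITS):
    """Cycles elapsed between two counter reads, tolerating one wrap-around."""
    return (after - before) % (1 << bits)
```

For example, reads of 0xFFFFFF00 and 0x00000100 correspond to 0x200 elapsed cycles, not a negative number.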
The code size can be measured manually by disassembling the .elf file: run arm-none-eabi-objdump -d aes_m3.elf > code_size.txt and inspect the disassembly output.
A Python script, parse_arm-none-eabi-objdump.py, is also provided to do this automatically: simply run python3 parse_arm-none-eabi-objdump.py code_size.txt to print the code size of the different functions listed in the main section.
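For readers curious about how a code size can be derived from the disassembly, the following sketch sums the bytes of the hex encodings listed under each symbol in objdump -d output. It is an independent illustration, not the provided parsing script; the function name code_sizes and the hand-written sample (with symbols func_a and func_b) are hypothetical. It also ignores corner cases such as runs of zeros that objdump abbreviates with "...".

```python
import re

# Matches a symbol header such as "08000188 <some_function>:"
HEADER = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")


def code_sizes(disassembly):
    """Return {symbol: size in bytes} computed from `objdump -d` text."""
    sizes = {}
    current = None
    for line in disassembly.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            sizes[current] = 0
            continue
        fields = line.split("\t")
        # Instruction lines look like "<addr>:\t<hex encoding>\t<mnemonic>..."
        if current is None or len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        chunks = fields[1].split()
        if chunks and all(re.fullmatch(r"[0-9a-fA-F]+", c) for c in chunks):
            # Two hex digits per byte of encoded instruction or data.
            sizes[current] += sum(len(c) for c in chunks) // 2
    return sizes


# Hand-written sample in the objdump layout (not taken from the artifact).
SAMPLE = (
    "08000188 <func_a>:\n"
    " 8000188:\tb5f0      \tpush\t{r4, r5, r6, r7, lr}\n"
    " 800018a:\tf000 f801 \tbl\t8000190 <func_b>\n"
    " 800018e:\tbdf0      \tpop\t{r4, r5, r6, r7, pc}\n"
    "\n"
    "08000190 <func_b>:\n"
    " 8000190:\t4770      \tbx\tlr\n"
)

print(code_sizes(SAMPLE))  # {'func_a': 8, 'func_b': 2}
```

Counting encoding bytes rather than subtracting symbol addresses means literal pools and lookup tables placed inside a function are included in its size, as long as objdump lists them under that symbol.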
For the key schedule functions, the number of cycles printed in the console should match the numbers reported in the paper. For the encryption functions, the printed number of cycles has to be divided by 2 for the fixsliced representation and by 8 for the barrel-shiftrows representation, since they process 2 and 8 blocks in parallel, respectively, whereas the paper reports results in cycles per block.
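The conversion from a printed cycle count to the cycles-per-block figure can be written down as a small helper. The names below are hypothetical, and the cycle counts used in the usage note are made-up round numbers, not measurements.

```python
# Blocks processed in parallel by one encryption call, per representation.
BLOCKS_PER_CALL = {"fixslicing": 2, "barrel_shiftrows": 8}


def cycles_per_block(measured_cycles, representation):
    """Convert the cycle count of one encryption call to cycles per block."""
    return measured_cycles / BLOCKS_PER_CALL[representation]
```

With a made-up reading of 2000 cycles for a fixsliced call, cycles_per_block(2000, "fixslicing") gives 1000 cycles per block; a made-up reading of 8000 cycles for a barrel-shiftrows call gives the same per-block figure.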
Benchmark on the HiFive1 Rev B board
Regarding the E31 RISC-V core, we used the SiFive Freedom E SDK (v20.05.00.00).
Once the SDK is set up correctly, one can simply copy and paste the files
freedom-e-sdk/software/fixslicing/* and run
make BSP=metal PROGRAM=fixslicing TARGET=sifive-hifive1-revb clean software upload from the
In order to display the output, one has to open another terminal and run the script
The benchmarking methodology is the same as the one described for the STM32 boards: we read the cycle counter before and after a single function call. The routine to read the counter is written in RV32I assembly in
Because the E31 RISC-V core embeds a branch predictor that can introduce penalty cycles in case of a wrong guess, we execute each function several times before benchmarking it in order to fill the instruction cache and train the branch predictor, so that such penalties are avoided.
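The order of operations in this warm-up approach can be sketched as follows. This is only a model of the control flow, written in Python for readability; it is not the benchmark code, the names are hypothetical, and the default of 3 warm-up calls is an assumed value (the text above only says "several").

```python
def benchmark_after_warmup(func, read_cycles, warmup_calls=3):
    """Call func a few times untimed, then time one more call."""
    for _ in range(warmup_calls):
        func()  # untimed calls: fill the instruction cache, train the predictor
    start = read_cycles()
    func()  # the single call that is actually measured
    return read_cycles() - start
```

Here read_cycles stands for whatever routine returns the current cycle count; only the final call is bracketed by counter reads, so the warm-up calls do not contribute to the reported figure.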
For the code size, one can follow the same methodology as described for the STM32 boards: run riscv64-unknown-elf-objdump -d fixslicing.elf > code_size.txt and either inspect the disassembly output manually or simply use the script by running python3 parse_riscv64-unknown-elf-objdump.py code_size.txt.
The same remarks as for the STM32 boards apply.