International Association for Cryptologic Research


Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021

Fixslicing AES-like ciphers:

New bitsliced AES speed records on ARM-Cortex M and RISC-V


Artifact for the paper "Fixslicing AES-like Ciphers" published in TCHES 2021/1


Main motivation

This artifact explains how to reproduce the benchmarking results reported in the paper Fixslicing AES-like Ciphers on the following development boards: the STM32L100C and STM32F407VG (STM32 boards) and the HiFive1 Rev B (E31 RISC-V core).

More specifically, we provide the following information:


The artifact consists of two folders, as described below:

│   LICENSE
│
├───benchmark_arm
│   ├───1storder_masking
│   ├───common
│   ├───barrel_shiftrows
│   └───fixslicing
│
└───benchmark_riscv
    ├───barrel_shiftrows
    ├───fixslicing
    └───

where benchmark_arm and benchmark_riscv contain the material needed to run the benchmark on the STM32 and HiFive1 Rev B boards, respectively. The source code for the AES implementations comes from the GitHub repository published alongside the paper. Because that repository includes only the non-unrolled implementations, unrolling them to obtain the corresponding performance figures is left to the reader. Nevertheless, we provide the file benchmark_arm/fixslicing/aes_encrypt_unroll.s to give some insight into how this can be done: to benchmark the unrolled fixsliced encryption functions, simply replace aes_encrypt.s with aes_encrypt_unroll.s in the makefile.

Benchmark on STM32 boards


Below we describe our setup to compile, load and run our code on the STM32 development boards mentioned above. We used a laptop running Ubuntu (18.08) with the following tools:

Note that the common folder contains linker scripts and wrappers for these development boards, as well as a simple Python script to read the benchmark output.

Warning: depending on where arm-none-eabi and libopencm3 are installed, you might need to adapt the following lines in the makefiles:

OPENCM3DIR = ../libopencm3
ARMNONEEABIDIR = /usr/arm-none-eabi

After running make in the folder to benchmark, one can use stlink to program the boards by running st-flash write aes_m3.bin 0x8000000 for the STM32L100C or st-flash write aes_m4.bin 0x8000000 for the STM32F407VG. Note that the 1storder_masking implementations are compiled only for the STM32F407VG board, since the STM32L100C does not embed a random number generator.

Finally, to execute the benchmark, run python3 ../common/ and reset the board. Note that a USB-to-TTL adapter is required to communicate with the boards, with TX and RX connected to PA3 and PA2, respectively.

Benchmarking methodology

Our benchmark simply consists of measuring the execution time of a single function call. The execution time is obtained by reading the DWT Cycle Counter (DWT_CYCCNT) register before and after the function call. The code size can be measured manually by disassembling the .elf file with arm-none-eabi-objdump -d aes_m3.elf > code_size.txt and inspecting the disassembly output. A Python script is also provided to do this automatically: simply run python3 code_size.txt to print the code size of the different functions listed in the main section.
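The manual inspection of the disassembly can be approximated with a few lines of Python. The sketch below is not the script shipped with the artifact. It assumes the usual objdump -d layout (an "address <symbol>:" header followed by tab-separated address, encoding and mnemonic columns) and sums the encoding bytes found under each symbol. The name function_sizes is hypothetical.

```python
import re
import sys

# Symbol header in `objdump -d` output, e.g. "08000188 <aes128_encrypt>:"
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")
# Instruction or data line, e.g. " 8000188:\tb5f0      \tpush\t{r4, lr}"
ENTRY_RE = re.compile(r"^\s*[0-9a-fA-F]+:\t([0-9a-fA-F ]+?)\s*(?:\t|$)")


def function_sizes(disassembly):
    """Return {symbol: size in bytes} by summing the encoding column."""
    sizes = {}
    current = None
    for line in disassembly.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            current = header.group(2)
            sizes[current] = 0
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            # Each pair of hex digits in the encoding column is one byte.
            hex_digits = entry.group(1).replace(" ", "")
            sizes[current] += len(hex_digits) // 2
    return sizes


# Optional command-line use: python3 <this file> code_size.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for name, size in function_sizes(f.read()).items():
            print(f"{name}: {size} bytes")
```

Lines that objdump abbreviates (for instance "..." for runs of zeros) are not counted, so the totals are a lower bound for symbols that contain such runs.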

Results interpretation

For the key schedule functions, the number of cycles printed in the console should match the numbers reported in the paper. For the encryption functions, the number of cycles has to be divided by 2 for the fixsliced representation and by 8 for the barrel-shiftrows representation, since they process 2 and 8 blocks in parallel, respectively, whereas the paper reports results in cycles per block.
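As a small worked example of this conversion, the sketch below divides a measured cycle count by the number of blocks processed per call. The helper name cycles_per_block and the cycle counts are placeholders chosen for illustration; they are not results from the paper.

```python
# Number of AES blocks processed by one call to the encryption function,
# as stated above for each representation.
BLOCKS_PER_CALL = {"fixslicing": 2, "barrel_shiftrows": 8}


def cycles_per_block(measured_cycles, representation):
    """Convert the cycle count of one encryption call into cycles per block."""
    return measured_cycles / BLOCKS_PER_CALL[representation]


# Placeholder cycle counts, not results from the paper.
print(cycles_per_block(2000, "fixslicing"))         # 1000.0
print(cycles_per_block(12000, "barrel_shiftrows"))  # 1500.0
```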

Benchmark on the HiFive1 Rev B board


Regarding the E31 RISC-V core, we used the SiFive Freedom E SDK (v20.05.00.00).

Once the SDK is set up correctly, one can simply copy and paste the files benchmark_riscv/fixslicing/* to freedom-e-sdk/software/fixslicing/* and run make BSP=metal PROGRAM=fixslicing TARGET=sifive-hifive1-revb clean software upload from the freedom-e-sdk folder.

In order to display the output, one has to open another terminal and run the script

Benchmarking methodology

The benchmarking methodology is the same as the one described for the STM32 boards: we read the cycle counter before and after a single function call. The routine to read the counter is written in RV32I assembly in getcycles.S.

Because the E31 RISC-V core embeds a branch predictor that can introduce penalty cycles on a wrong guess, we execute each function several times before benchmarking it in order to fill the instruction cache and train the branch predictor, so that such penalties are avoided.
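The warm-up-then-measure pattern can be modelled as follows. This is only a Python illustration of the methodology, not the code that runs on the board (which is written in C and assembly). The names benchmark and FakeCore, the number of warm-up runs, and the cycle costs are all assumptions made for the example.

```python
def benchmark(func, read_cycles, warmup_runs=10):
    """Return the cycle count of one call to func after warmup_runs untimed calls."""
    for _ in range(warmup_runs):
        func()  # untimed calls: fill the cache and train the predictor
    start = read_cycles()
    func()  # the single timed call
    return read_cycles() - start


class FakeCore:
    """Toy model: the first call is slow (cold state), later calls are fast."""

    def __init__(self):
        self.cycles = 0
        self.calls = 0

    def work(self):
        # Hypothetical costs: 150 cycles when cold, 100 cycles once warmed up.
        self.cycles += 150 if self.calls == 0 else 100
        self.calls += 1

    def read_cycles(self):
        return self.cycles


cold = FakeCore()
print(benchmark(cold.work, cold.read_cycles, warmup_runs=0))   # 150
warm = FakeCore()
print(benchmark(warm.work, warm.read_cycles, warmup_runs=10))  # 100
```

In this toy model, skipping the warm-up attributes the one-off cold-start cost to the function being measured, which is the effect the warm-up runs are meant to remove.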

For the code size, one can follow the same methodology as described for the STM32 boards: first run riscv64-unknown-elf-objdump -d fixslicing.elf > code_size.txt, then either inspect the disassembly output manually or simply use the script by running python3 code_size.txt.

Results interpretation

Same remarks as for the STM32 boards.