International Association for Cryptologic Research

International Association
for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021

CTIDH: faster constant-time CSIDH


Prerequisites: Intel or AMD CPU with adcx/adox: i.e., Broadwell, Skylake, or newer. Linux with standard development tools plus clang plus valgrind.

To download and unpack the latest version:

wget -m
wget -m$version.tar.gz
tar -xzf$version.tar.gz
cd high-ctidh-$version

To compile, test for functionality, tune for multiplications, and tune for cycles, for all selected sizes (511, 512, 1024, 2048):


This takes a while because of all the testing and tuning. Any test failure will stop the build process. You can separately run

make generic # size-independent tests, 8.6 minutes
make 511 # size 511, 1.5 minutes
make 512 # size 512, 1.7 minutes
make 1024 # size 1024, 25 minutes
make 2048 # size 2048, 549 minutes

where the timings shown here are on a 3GHz Skylake core.

(Tuning for multiplications is machine-independent and can be precomputed. Tuning for cycles can be precomputed per microarchitecture. One can carry out both precomputations more efficiently by starting with measurements of tree1, multiprod2, multiprod2_selfreciprocal, multieval_precompute, and multieval_postcompute; the Python scripts include a preliminary implementation of this for the multiplication tuning, currently used only as a double-check.)

The functionality testing included in "make" does not include a constant-time test. To run a constant-time test for all selected sizes:

make timecop # 25 minutes

For benchmarks regarding, e.g., size-511 code tuned for multiplications:

./bench511mults 16383 > bench511mults.out.16383

This runs a million experiments: more precisely, 16383 experiments for each of 65 keys. This takes hours, and generates hundreds of megabytes of data. Each measurement includes, for validation and separately for the action, a "mulsq" count that includes both multiplications and squarings, a "sq" count that includes only squarings, an "addsub" count that includes additions and subtractions, and a cycle count (which for multiplication-tuned code isn't far behind cycle-tuned code). The action also shows "stattried" counts showing the number of times each batch occurred publicly in an atomic block.

To analyze average costs and standard deviations:

./analyze-costs < bench511mults.out.16383 \
> bench511mults.out.16383.analyze-costs

Statistics are printed for each of the 65 keys separately, and ("total") for the all of the experiments together.

To analyze whether the "stattried" counts are as expected:

./analyze-pr < bench511mults.out.16383 \
> bench511mults.out.16383.analyze-pr

This prints, for each batch, 1−1/p times the number of times the batch was tried divided by the batch bound, where p is the smallest prime in the batch.

For various size-511 microbenchmarks:


To select other CSIDH sizes and other CTIDH parameters (subject to various undocumented restrictions), edit the table at the top of autogen and run "./autogen; make".