International Association for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021

NTT Multiplication for NTT-unfriendly Rings:

New Speed Records for Saber and NTRU on Cortex-M4 and AVX2


Chi-Ming Marvin Chung
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan

Vincent Hwang
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan

Matthias J. Kannwischer
Max Planck Institute for Security and Privacy, Bochum, Germany

Gregor Seiler
IBM Research – Zurich, Rüschlikon, Switzerland; ETH Zurich, Zurich, Switzerland

Cheng-Jhih Shih
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan

Bo-Yin Yang
Academia Sinica, Taipei, Taiwan


Keywords: Polynomial Multiplication, NTT Multiplication, Saber, NTRU, Cortex-M4, AVX2


Abstract

In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the number-theoretic transform (NTT). We obtain superior performance compared to the previous state-of-the-art implementations using Toom–Cook multiplication on both of NIST’s primary software optimization targets, AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: on the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).
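The two approaches named above can be illustrated with a small Python sketch. It is not the paper's implementation: the ring size N = 8, the primes 7340033, 7681 and 12289, the coefficient ranges, and all function names are assumptions chosen for illustration. The idea shown is that a product in Z_q[x]/(x^N + 1) with a power-of-two q (which has no suitable roots of unity) can be computed exactly over the integers by working modulo one large NTT-friendly prime, or modulo two small ones followed by CRT, as long as the modulus exceeds twice the largest possible coefficient of the integer product.

```python
import random

N = 8                  # toy ring dimension (assumption; real schemes use larger N)
Q = 1 << 13            # power-of-two modulus, "NTT-unfriendly"
P_BIG = 7340033        # one large NTT-friendly prime (hypothetical choice)
P1, P2 = 7681, 12289   # two small NTT-friendly primes (hypothetical choice)

def find_psi(n, p):
    """Return a primitive 2n-th root of unity mod p (n a power of two)."""
    assert (p - 1) % (2 * n) == 0
    for g in range(2, p):
        psi = pow(g, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:   # psi^n = -1 implies order exactly 2n
            return psi
    raise ValueError("no root of unity found")

def ntt(a, w, p):
    """Recursive radix-2 cyclic NTT; w is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % p, p)
    odd = ntt(a[1::2], w * w % p, p)
    out, t = [0] * n, 1
    for k in range(n // 2):
        v = t * odd[k] % p
        out[k] = (even[k] + v) % p
        out[k + n // 2] = (even[k] - v) % p
        t = t * w % p
    return out

def negacyclic_mul(a, b, p):
    """Multiply a and b in Z_p[x]/(x^n + 1) by twisting with psi, then cyclic NTT."""
    n = len(a)
    psi = find_psi(n, p)
    w = psi * psi % p
    fa = ntt([x * pow(psi, i, p) % p for i, x in enumerate(a)], w, p)
    fb = ntt([x * pow(psi, i, p) % p for i, x in enumerate(b)], w, p)
    c = ntt([x * y % p for x, y in zip(fa, fb)], pow(w, -1, p), p)
    n_inv, psi_inv = pow(n, -1, p), pow(psi, -1, p)
    return [x * n_inv * pow(psi_inv, i, p) % p for i, x in enumerate(c)]

def centered(x, m):
    """Map x mod m to its representative of smallest absolute value."""
    x %= m
    return x - m if x > m // 2 else x

def mul_single_prime(a, b):
    """One large-modulus NTT, then reduce the exact integer result mod Q."""
    return [centered(x, P_BIG) % Q for x in negacyclic_mul(a, b, P_BIG)]

def mul_two_primes_crt(a, b):
    """Two small-modulus NTTs combined with CRT, then reduce mod Q."""
    c1, c2 = negacyclic_mul(a, b, P1), negacyclic_mul(a, b, P2)
    p1_inv = pow(P1, -1, P2)
    out = []
    for r1, r2 in zip(c1, c2):
        x = r1 + P1 * ((r2 - r1) * p1_inv % P2)   # x = r1 mod P1, x = r2 mod P2
        out.append(centered(x, P1 * P2) % Q)
    return out

def schoolbook_mod_q(a, b, q):
    """Reference negacyclic product, reduced mod q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]

# One operand centered mod Q, the other small (an assumption of this sketch),
# so every integer coefficient has magnitude at most N * (Q/2) * 4 = 131072.
a = [random.randrange(-Q // 2, Q // 2) for _ in range(N)]
b = [random.randint(-4, 4) for _ in range(N)]
assert mul_single_prime(a, b) == mul_two_primes_crt(a, b) == schoolbook_mod_q(a, b, Q)
```

Both routes recover the same integer product before the final reduction mod Q; which one is faster depends on the word size that the target platform handles best.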

For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication, resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but NTT-based multiplication still performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook, which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.

As a further illustration, we also include AVX2 and Cortex-M4 code for LAC, the Chinese Association for Cryptologic Research competition award winner and a NIST round 2 candidate; this code outperforms existing code.

Publication

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021, Issue 2

Artifact

Artifact number
tches/2021/a7

Artifact published
May 28, 2021

Award
Best Artifact Award for CHES 2021

License
CC0 To the extent possible under law, the author(s) have waived all copyright and related or neighboring rights to this artifact.

Some files in this archive are licensed under a different license. See the contents of this archive for more information.


How to cite

Chung, C.-M. M., Hwang, V., Kannwischer, M. J., Seiler, G., Shih, C.-J., & Yang, B.-Y. (2021). NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(2), 159-188. https://doi.org/10.46586/tches.v2021.i2.159-188. Artifact at https://artifacts.iacr.org/tches/2021/a7.