Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021
NTT Multiplication for NTT-unfriendly Rings:
New Speed Records for Saber and NTRU on Cortex-M4 and AVX2
Chi-Ming Marvin Chung
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Vincent Hwang
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Matthias J. Kannwischer
Max Planck Institute for Security and Privacy, Bochum, Germany
Gregor Seiler
IBM Research – Zurich, Rüschlikon, Switzerland; ETH Zurich, Zurich, Switzerland
Cheng-Jhih Shih
Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Bo-Yin Yang
Academia Sinica, Taipei, Taiwan
Keywords: Polynomial Multiplication, NTT Multiplication, Saber, NTRU, Cortex-M4, AVX2
Abstract
In this paper, we show how multiplication in the polynomial rings used by the NIST PQC finalists Saber and NTRU can be implemented efficiently using the number-theoretic transform (NTT). We obtain superior performance compared to the previous state-of-the-art implementations using Toom–Cook multiplication on both of NIST’s primary software optimization targets, AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: on the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel, we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).
For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication, resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but NTT-based multiplication still performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook, which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.
As a further illustration, we also include AVX2 and Cortex-M4 code for LAC, the Chinese Association for Cryptologic Research competition award winner (and also a NIST round 2 candidate); this code outperforms existing code.
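The two approaches named in the abstract (one larger NTT-friendly prime, or two small primes combined with the CRT) can be illustrated with a minimal sketch. It assumes the usual lifting idea: compute the negacyclic product exactly over the integers modulo a prime (or prime product) larger than twice the largest possible coefficient, then reduce modulo the power-of-two q. The toy parameters (n = 8, q = 2^13, small operand bounded by 4), all function names, and the naive O(n^2) transform are assumptions chosen for readability; they are not the parameters or the optimized code from the artifact.

```python
import math
import random

def is_prime(m):
    """Trial division; adequate for the small toy moduli used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def ntt_prime(n, lower):
    """Smallest prime p > lower with p = 1 (mod 2n)."""
    p = lower - lower % (2 * n) + 1
    while p <= lower or not is_prime(p):
        p += 2 * n
    return p

def root_2n(n, p):
    """psi with psi^n = -1 (mod p): a primitive 2n-th root (n a power of two)."""
    for x in range(2, p):
        psi = pow(x, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:
            return psi
    raise ValueError("no 2n-th root of unity found")

def center(x, m):
    """Representative of x mod m in the symmetric range around zero."""
    x %= m
    return x - m if x > m // 2 else x

def negacyclic_mul_ntt(a, b, p):
    """a*b mod (x^n + 1, p) via a naive transform (no fast butterflies)."""
    n = len(a)
    psi = root_2n(n, p)
    psi_inv, n_inv = pow(psi, p - 2, p), pow(n, p - 2, p)
    def ev(f, x):  # evaluate polynomial f at x modulo p
        return sum(c * pow(x, j, p) for j, c in enumerate(f)) % p
    # Evaluate at the n roots of x^n + 1 (odd powers of psi), multiply pointwise.
    c_hat = [ev(a, pow(psi, 2 * i + 1, p)) * ev(b, pow(psi, 2 * i + 1, p)) % p
             for i in range(n)]
    # Inverse transform: interpolate back to coefficients.
    return [n_inv * sum(ch * pow(psi_inv, (2 * i + 1) * j, p)
                        for i, ch in enumerate(c_hat)) % p for j in range(n)]

def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) product in Z_q[x]/(x^n + 1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]  # x^n = -1
    return [x % q for x in c]

def mul_one_prime(a, b, q, small):
    """One prime larger than twice the coefficient bound, then reduce mod q."""
    n = len(a)
    a_c = [center(x, q) for x in a]
    bound = n * (q // 2) * small          # max |coefficient| of the integer product
    p = ntt_prime(n, 2 * bound)
    return [center(x, p) % q for x in negacyclic_mul_ntt(a_c, b, p)]

def mul_two_primes(a, b, q, small):
    """Two small primes whose product exceeds twice the bound, combined by CRT."""
    n = len(a)
    a_c = [center(x, q) for x in a]
    bound = n * (q // 2) * small
    p1 = ntt_prime(n, math.isqrt(2 * bound))
    p2 = ntt_prime(n, p1)                 # p2 > p1, so p1 * p2 > 2 * bound
    c1 = negacyclic_mul_ntt(a_c, b, p1)
    c2 = negacyclic_mul_ntt(a_c, b, p2)
    p1_inv = pow(p1, p2 - 2, p2)
    out = []
    for x1, x2 in zip(c1, c2):
        x = x1 + p1 * ((x2 - x1) * p1_inv % p2)   # x = x1 mod p1 and x = x2 mod p2
        out.append(center(x, p1 * p2) % q)
    return out

if __name__ == "__main__":
    random.seed(0)
    n, q, small = 8, 1 << 13, 4           # toy parameters (assumed)
    a = [random.randrange(q) for _ in range(n)]
    b = [random.randint(-small, small) for _ in range(n)]
    ref = negacyclic_mul_schoolbook(a, b, q)
    print(mul_one_prime(a, b, q, small) == ref, mul_two_primes(a, b, q, small) == ref)
```

Both variants rely on the same correctness condition: the modulus used for the NTT must exceed twice the largest absolute coefficient of the integer product, so that the centered representative is the exact integer result before the final reduction modulo q.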
Publication
Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021, Issue 2
Artifact number
tches/2021/a7
Artifact published
May 28, 2021
Award
Best Artifact Award for CHES 2021
License
To the extent possible under law, the author(s) have waived all copyright and related or neighboring rights to this artifact.
Some files in this archive are licensed under a different license. See the contents of this archive for more information.
How to cite
Chung, C.-M. M., Hwang, V., Kannwischer, M. J., Seiler, G., Shih, C.-J., & Yang, B.-Y. (2021). NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(2), 159-188. https://doi.org/10.46586/tches.v2021.i2.159-188. Artifact at https://artifacts.iacr.org/tches/2021/a7.