International Association for Cryptologic Research

International Association
for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2025

Multiplying Polynomials without Powerful Multiplication Instructions


Vincent Hwang
Max Planck Institute for Security and Privacy (MPI-SP), Bochum, Germany

YoungBeom Kim
Kookmin University, Seoul, Korea

Seog Chung Seo
Kookmin University, Seoul, Korea


Keywords: Lattice-based cryptography, Dilithium, Saber, Barrett multiplication, Microcontroller, Nussbaumer FFT, Toom–Cook


Abstract

We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli and not the shapes of the moduli. This was unknown in the literature even though modular multiplication has been studied for more than 40 years. In the literature, shape-independent modular multiplications often perform several times slower than long multiplications even if we ignore the cost of the precomputation. (ii) We show that polynomial multiplications based on Nussbaumer fast Fourier transform and Toom–Cook over $\mathbb{Z}_{2^k}$ perform the best when modular multiplications are expensive and $k$ is not very close to the arithmetic precision. For practical evaluation, we implement assembly programs for the polynomial arithmetic used in the digital signature Dilithium on Cortex-M3. For the modular multiplications in Dilithium, our generalized Barrett multiplications are 1.92 times faster than the state-of-the-art assembly-optimized Montgomery multiplications, leading to 1.38−1.51 times faster Dilithium NTT/iNTT. Along with the improvement in accumulating products, the core polynomial arithmetic matrix-vector multiplications are 1.71−1.77 times faster. We further apply the FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ to the challenge polynomial multiplication $c t_0$, leading to 1.31 times faster computation for $c t_0$. We additionally apply the ideas to Saber on Cortex-M3 and demonstrate their improvement to Dilithium and Saber on our 8-bit AVR environment. For Saber on Cortex-M3, we show that matrix-vector multiplications with FFT-based polynomial multiplications over $\mathbb{Z}_{2^k}$ are 1.42−1.46 faster than the ones with NTT-based polynomial multiplications over NTT-friendly coefficient rings. When moving to a platform with smaller arithmetic precision, such as 8-bit AVR, we improve the matrix-vector multiplication of Dilithium with our Barrett-based NTT/iNTT by a factor of 1.87−1.89. As for Saber on our 8-bit AVR environment, we show that matrix-vector multiplications with NTT-based polynomial multiplications over NTT-friendly coefficient rings are faster than polynomial multiplications over $\mathbb{Z}_{2^k}$ due to the large $k$ in Saber.

Publication

Transactions of Cryptographic Hardware and Embedded Systems, Volume 2025, Issue 1

Paper

Artifact

Artifact number
tches/2025/a1

Artifact published
March 6, 2025

Badge
IACR CHES Artifacts Functional

README

ZIP (2772681 Bytes)  

View on Github

License
CC0 To the extent possible under law, the author(s) have waived all copyright and related or neighboring rights to this artifact.

Some files in this archive are licensed under a different license. See the contents of this archive for more information.


BibTeX How to cite

Hwang, V., Kim, Y., & Seo, S. C. (2024). Multiplying Polynomials without Powerful Multiplication Instructions. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2025(1), 160-202. https://doi.org/10.46586/tches.v2025.i1.160-202. Artifact available at https://artifacts.iacr.org/tches/2025/a1