In the following speed table, smaller keypair/enc/dec numbers (fewer cycles) are better.
The numbers are interquartile means of single-core cycle counts on various microarchitectures.
Overclocking is disabled.
| μarch | KEM | keypair (cycles) | enc (cycles) | dec (cycles) |
| :---- | :--- | ------: | ---: | ---: |
| Golden Cove (2021) | sntrup653 | 569998 | 31437 | 44933
| | sntrup761 | 743752 | 35130 | 47191
| | sntrup857 | 938327 | 42316 | 61681
| | sntrup953 | 1134045 | 45276 | 63881
| | sntrup1013 | 1260294 | 45882 | 65517
| | sntrup1277 | 1945365 | 58064 | 81528
| Zen 3 (2020) | sntrup653 | 630180 | 32541 | 45606
| | sntrup761 | 841462 | 35311 | 47977
| | sntrup857 | 1038697 | 42592 | 60883
| | sntrup953 | 1253426 | 47028 | 64965
| | sntrup1013 | 1435108 | 46973 | 66213
| | sntrup1277 | 2171656 | 60228 | 80954
| Zen 2 (2019) | sntrup653 | 938965 | 38397 | 60512
| | sntrup761 | 1254327 | 41018 | 63308
| | sntrup857 | 1602767 | 50833 | 82192
| | sntrup953 | 1956955 | 54960 | 86430
| | sntrup1013 | 2203462 | 55829 | 88492
| | sntrup1277 | 3464035 | 70184 | 108778
| Cortex-A72 (2016) | sntrup653 | 9582905 | 685284 | 1149001
| | sntrup761 | 12759145 | 882618 | 1530800
| | sntrup857 | 16122629 | 1077066 | 1924171
| | sntrup953 | 19832706 | 1290673 | 2364008
| | sntrup1013 | 22573875 | 1431952 | 2668071
| | sntrup1277 | 35773394 | 2153211 | 4220444
| Skylake (2015) | sntrup653 | 692941 | 39847 | 59024
| | sntrup761 | 831462 | 41784 | 61793
| | sntrup857 | 1128575 | 51224 | 78029
| | sntrup953 | 1305503 | 54602 | 82755
| | sntrup1013 | 1425447 | 57443 | 88127
| | sntrup1277 | 2202685 | 73622 | 108376
| Haswell (2013) | sntrup653 | 767219 | 44278 | 65223
| | sntrup761 | 939803 | 47008 | 69013
| | sntrup857 | 1271818 | 58349 | 89379
| | sntrup953 | 1488084 | 64012 | 94010
| | sntrup1013 | 1680440 | 65535 | 96216
| | sntrup1277 | 2720370 | 81871 | 122664
Microarchitectures are listed in reverse chronological order of when they were introduced.
In the libntruprime distribution,
`command/ntruprime-speed.c` measures libntruprime;
the files `benchmarks/*-*` are the outputs of `ntruprime-speed` on various machines;
and `autogen/md-speed` extracts the table from those measurements.
The table reports only the interquartile mean of each set of cycle counts, not the full distribution.
See the full output files
for the individual measurements, which show how much repeated runs differ from each other and from the interquartile mean.
### <a name="faster">Faster `sntrup` software</a>
There has been extensive further work on speeding up `sntrup` software
beyond what libntruprime currently achieves.
libntruprime has a policy of [limiting code size](security.html),
but the following speedups can still be considered for inclusion in libntruprime
if there are applications that need them:
* `mult3sntrup761/avx2unsigned` in SUPERCOP
from Ming-Shing Chen:
faster multiplications on Intel/AMD.
* `invsntrup761/jumpdivsteps` in SUPERCOP
from Daniel J. Bernstein, Ming-Shing Chen, Gregor Seiler, and Bo-Yin Yang:
faster inversions on Intel/AMD.
* ["OpenSSLNTRU: Faster post-quantum TLS key exchange"](https://eprint.iacr.org/2021/826)
from Daniel J. Bernstein, Billy Bob Brumley, Ming-Shing Chen, and Nicola Tuveri:
faster inversions for batch operations.
* ["Multi-Parameter Support with NTTs for NTRU and NTRU Prime on Cortex-M4"](https://eprint.iacr.org/2022/930)
from Erdem Alkim, Vincent Hwang, and Bo-Yin Yang:
faster multiplications on 32-bit ARM.
* ["Algorithmic Views of Vectorized Polynomial Multipliers for NTRU and NTRU Prime"](https://eprint.iacr.org/2023/541)
from Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, Chi-Ting Liu, and Bo-Yin Yang:
faster multiplications on 64-bit ARM.
* ["Pushing the Limit of Vectorized Polynomial Multiplication for NTRU Prime"](https://eprint.iacr.org/2023/604)
from Vincent Hwang:
faster multiplications on 64-bit ARM and Intel/AMD.
* ["Algorithmic Views of Vectorized Polynomial Multipliers – NTRU Prime"](https://eprint.iacr.org/2023/1580)
from Vincent Hwang, Chi-Ting Liu, and Bo-Yin Yang:
faster multiplications on 64-bit ARM.
* ["A Survey of Polynomial Multiplications for Lattice-Based Cryptosystems"](https://eprint.iacr.org/2023/1962)
from Vincent Hwang:
faster multiplications on Intel/AMD.
* ["Jumping for Bernstein-Yang Inversion"](https://eprint.iacr.org/2024/644)
from Li-Jie Jian, Ting-Yuan Wang, Bo-Yin Yang, and Ming-Shing Chen:
faster inversions on 64-bit ARM.