cryptoint 20250414

D. J. Bernstein

## Introduction

cryptoint is an almost-header-only library providing functions for comparisons, bit extractions, etc. on `{int,uint}{8,16,32,64}`, while trying to protect against compilers introducing timing variations.

Advantages over previous library functions for basic constant-time operations:

* cryptoint provides more functions.
* cryptoint provides better protection against current compiler "optimizations". (This does not mean the protection is a guarantee. Make sure to apply further tests such as TIMECOP to the compiled code.)
* All cryptoint functions, after compilation with some common compilers for some common architectures, have been verified using [saferewrite](https://pqsrc.cr.yp.to/downloads.html) to match reference implementations for all inputs.

Often applications have their own ad-hoc constant-time code. Advantages of rewriting that code to use the centralized cryptoint functions:

* More likely to generate code that's [actually constant-time](https://blog.cr.yp.to/20240803-clang.html).
* Less code to fix in response to whatever further damage is caused by compiler "optimizations".
* Often less CPU time, although it's rare for this to matter.
* Better testing.

cryptoint is used inside [SUPERCOP](https://bench.cr.yp.to/supercop.html), [lib25519](https://lib25519.cr.yp.to), [libmceliece](https://lib.mceliece.org), [libntruprime](https://libntruprime.cr.yp.to), and [OpenSSH](https://github.com/openssh/openssh-portable/blob/master/sntrup761.c).

## Usage

To use (e.g.) `crypto_int64` in your own package, simply copy `crypto_int64.h` and `int64_optblocker.c` into that package.

Compilation recommendations:

* Use `gcc` or `clang`. (Porting to other compilers should be a simple matter of compiling with `-D__attribute__(x)=`; however, tests have been carried out only with `gcc` and `clang`.)
* Compile all code with `-fwrapv`. (This disables some compiler "optimizations" that often trigger bugs in integer arithmetic. These "optimizations" have very little effect on performance.)
* Compile `*optblocker.c` separately: don't manually merge `*optblocker.c` into other files; don't use the `-flto` option in compiling `*optblocker.c`.

`crypto_{int,uint}64.h` define types `crypto_{int,uint}64` and the following API functions:

| usage | meaning |
| ----- | ----- |
| `z = crypto_{int,uint}64_load(ptr)` | little-endian load |
| `crypto_{int,uint}64_store(ptr,z)` | little-endian store |
| `z = crypto_{int,uint}64_load_bigendian(ptr)` | big-endian load |
| `crypto_{int,uint}64_store_bigendian(ptr,z)` | big-endian store |
| `z = crypto_int64_positive_mask(x)` | `z = -(x > 0)` |
| `z = crypto_int64_positive_01(x)` | `z = (x > 0)` |
| `z = crypto_int64_negative_mask(x)` | `z = -(x < 0)` |
| `z = crypto_int64_negative_01(x)` | `z = (x < 0)` |
| `z = crypto_int64_topbit_mask(x)` | `z = -(x < 0)` |
| `z = crypto_int64_topbit_01(x)` | `z = (x < 0)` |
| `z = crypto_uint64_topbit_mask(x)` | `z = -(x >> 63)` |
| `z = crypto_uint64_topbit_01(x)` | `z = (x >> 63)` |
| `z = crypto_{int,uint}64_nonzero_mask(x)` | `z = -(x != 0)` |
| `z = crypto_{int,uint}64_nonzero_01(x)` | `z = (x != 0)` |
| `z = crypto_{int,uint}64_zero_mask(x)` | `z = -(x == 0)` |
| `z = crypto_{int,uint}64_zero_01(x)` | `z = (x == 0)` |
| `z = crypto_{int,uint}64_unequal_mask(x,y)` | `z = -(x != y)` |
| `z = crypto_{int,uint}64_unequal_01(x,y)` | `z = (x != y)` |
| `z = crypto_{int,uint}64_equal_mask(x,y)` | `z = -(x == y)` |
| `z = crypto_{int,uint}64_equal_01(x,y)` | `z = (x == y)` |
| `z = crypto_{int,uint}64_smaller_mask(x,y)` | `z = -(x < y)` |
| `z = crypto_{int,uint}64_smaller_01(x,y)` | `z = (x < y)` |
| `z = crypto_{int,uint}64_leq_mask(x,y)` | `z = -(x <= y)` |
| `z = crypto_{int,uint}64_leq_01(x,y)` | `z = (x <= y)` |
| `z = crypto_{int,uint}64_min(x,y)` | `z = (x < y) ? x : y` |
| `z = crypto_{int,uint}64_max(x,y)` | `z = (x > y) ? x : y` |
| `crypto_{int,uint}64_minmax(&x,&y)` | in-place `(x,y) = (min,max)` |
| `z = crypto_{int,uint}64_bottombit_mask(x)` | `z = -(x & 1)` |
| `z = crypto_{int,uint}64_bottombit_01(x)` | `z = (x & 1)` |
| `z = crypto_{int,uint}64_shlmod(x,j)` | `z = x << (j&63)` |
| `z = crypto_{int,uint}64_shrmod(x,j)` | `z = x >> (j&63)` |
| `z = crypto_{int,uint}64_bitmod_mask(x,j)` | `z = -((x >> (j&63)) & 1)` |
| `z = crypto_{int,uint}64_bitmod_01(x,j)` | `z = ((x >> (j&63)) & 1)` |
| `z = crypto_{int,uint}64_ones_num(x)` | `z =` number of bits set in `x` (0 through 64) |
| `z = crypto_{int,uint}64_bottomzeros_num(x)` | `z =` number of low-order 0 bits in `x` (0 through 64) |

There are also `bitinrangepublicpos` functions tested by `test.c` and used internally. These are not part of the API; use `bitmod` instead.

Notes on the split between `mask` functions and `01` functions:

* The `01` functions are aligned with C's convention of representing true as `1` and false as `0`. For example, `x < y` in C, like `x < y ? 1 : 0`, means `1` if `x` is smaller than `y`, else `0`. You can replace this with `crypto_int64_smaller_01(x,y)` if `x` and `y` are 64-bit signed integers.
* The `mask` functions are aligned with a convention of representing true as `-1` and false as `0`, which works well with logic instructions. For example, you can rewrite the variable-time code `x < y ? u : v` (meaning `u` if `x < y`, else `v`) as `v ^ ((u ^ v) & crypto_int64_smaller_mask(x,y))`. For comparison, `v + (u - v) * crypto_int64_smaller_01(x,y)` would rely on multiplication taking constant time, but on some platforms multiplication takes variable time.

Beware that the `mask` convention, like any other use of negative integers, isn't compatible with zero-extension of unsigned integers to wider types. For example, conversion from `uint8` to `uint64` will convert `-1` to `255` rather than to `-1`. This is an argument for using `int` rather than `uint`.
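The widening pitfall can be seen directly with the selection idiom `v ^ ((u ^ v) & mask)` from the notes above. The sketch below is illustrative only: it uses `<stdint.h>` types rather than the cryptoint types, and the `demo_*` functions are hypothetical helpers, not part of the cryptoint API.

```c
#include <stdint.h>

/* Mask-based selection: returns u if mask is all ones, v if mask is 0. */
static uint64_t demo_select64(uint64_t mask, uint64_t u, uint64_t v)
{
  return v ^ ((u ^ v) & mask);
}

/* Widening an 8-bit "true" mask from an unsigned type:
   (uint8_t)-1 is 255, and zero-extension keeps it 255. */
static uint64_t demo_widen_uint8_mask(uint8_t m)
{
  return (uint64_t)m;
}

/* Widening the same mask from a signed type: sign-extension keeps -1,
   which converts to a uint64_t with all 64 bits set. */
static uint64_t demo_widen_int8_mask(int8_t m)
{
  return (uint64_t)(int64_t)m;
}
```

With `u = 0x1122334455667788` and `v = 0`, selecting through the sign-extended mask returns all of `u`, while the zero-extended mask (255) keeps only the low byte, `0x88`.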
On the other hand, C allows compilers to damage the correctness of `int` code in various ways that aren't allowed for `uint`. Compiling with `-fwrapv`, as recommended above, disables some of that damage.

## Internals

cryptoint has two main defenses against timing variations being introduced by compiler "optimizations".

The first defense is `optblocker`, which is a global `volatile` variable containing 0. The usage of `optblocker` in cryptoint is designed to systematically hide 1-bit data paths from compilers, i.e., to keep compilers from seeing that an intermediate value carries only one bit of information, such as the result of a comparison.

The second defense is assembly. This would be safest as separate `.s` files, but the usability constraint of having only two files per size (one `.h` file, one `optblocker.c` file) forces cryptoint to use inline assembly instead. Currently cryptoint has assembly implementations of various functions for

* `amd64` (64-bit AMD and Intel, aka `x86_64`),
* `arm64` (64-bit ARM, aka `aarch64`),
* `arm32` (32-bit ARM, not in Thumb mode), and
* `sparc32` (32-bit SPARC, still used in space applications).

These are selected automatically for `gcc` and `clang` using tests for `__GNUC__`, `__x86_64__`, etc. Other platforms fall back automatically to portable code using `optblocker`.

From an auditing perspective, reviewing cryptoint means checking code for many different functions, and the usage of assembly makes this work more difficult:

* Assembly implementations are separate for each size, and separate for each targeted platform: e.g., `crypto_int*_negative_mask` has not just portable code but also 16 assembly implementations (4 sizes for each of `amd64`, `arm64`, `arm32`, `sparc32`).
* Assembly is generally less readable than C, making bugs more likely to escape the notice of authors and reviewers. Assembly also depends on quirks of the targeted instruction sets.
* Inline assembly has its own quirks, with a more complicated interface than the function ABI.
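To make the portable `optblocker` defense described in this section more concrete, here is a minimal sketch of two mask functions for 64-bit signed integers. It is an illustration under stated assumptions, not cryptoint's own code: the `demo_*` names are hypothetical, it uses `<stdint.h>` types, and it relies on `gcc`/`clang` behavior for right shifts of negative values and for converting `uint64_t` to `int64_t`. The volatile variable is defined in the same file only to keep the sketch self-contained; the usage recommendations above keep it in a separately compiled `*optblocker.c`.

```c
#include <stdint.h>

/* Hypothetical stand-in for an optblocker: a global volatile variable that
   always holds 0. The compiler must load it at run time and cannot assume
   its value, so it cannot prove that the final result below is only 0 or -1. */
volatile int64_t demo_int64_optblocker = 0;

/* -1 if x < 0, else 0. Assumes >> on a negative int64_t is an arithmetic
   shift, as with gcc and clang. */
static int64_t demo_int64_negative_mask(int64_t x)
{
  x >>= 64 - 6;                 /* top 6 bits remain: -32 ... 31 */
  x ^= demo_int64_optblocker;   /* XOR with 0, opaque to the compiler */
  x >>= 5;                      /* -1 if x was negative, else 0 */
  return x;
}

/* -1 if x < y, else 0, with no branch on x or y. The subtraction wraps in
   uint64_t; the XOR/AND term fixes the sign bit when x and y differ in sign,
   which is the only case where x - y can overflow. The conversion back to
   int64_t assumes two's-complement wraparound, as with gcc and clang. */
static int64_t demo_int64_smaller_mask(int64_t x, int64_t y)
{
  uint64_t ux = (uint64_t)x, uy = (uint64_t)y;
  uint64_t z = ux - uy;
  z ^= (ux ^ uy) & (z ^ ux);
  return demo_int64_negative_mask((int64_t)z);
}
```

The source contains no data-dependent branches, but as the introduction says, that is not a guarantee about the compiled code; tools such as TIMECOP are still needed to check the output.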
cryptoint takes the following steps to reduce the risk of bugs:

* cryptoint includes a new `readasm` tool that generates inline assembly from an easier-to-read format. (See `functions` for the source code in this format; `crypto*.h` are automatically generated.) This improves auditability. Also, the converter generates register annotations, avoiding some common classes of inline-assembly bugs.
* All cryptoint functions are subjected to a battery of conventional unit tests via `cryptoint/test.c` in SUPERCOP. Various functions are also indirectly tested via tests of implementations that use cryptoint.
* All cryptoint functions are also integrated into saferewrite, which, after compilation, uses symbolic execution and Z3 to check equivalence to reference implementations. This has been run with various compilers for `amd64`, `arm64`, `arm32`, `mips64`, `sparc32`, and `x86`; see `saferewrite-results`.

## Implementation notes

The portable version of `SIGNED_negative_mask` could instead use `X >>= (N-1) ^ SIGNED_optblocker`.

The use of `optblocker` inside `bitinrangepublicpos` is meant to protect against the compiler causing trouble if `S` is a compile-time constant `63` (or in general 1 below the width), although this is unnecessary for the current applications of `bitinrangepublicpos`.

The assembly implementations of `shlmod` and `shrmod` assume that shift instructions take constant time. The portable implementation does not make this assumption. This is important: for example, compilers for 32-bit platforms will typically produce `int64` shift code that takes different amounts of time for different shift distances.

Various assembly implementations assume that conditional instructions, such as conditional moves, take constant time.

The `amd64` implementation of `TYPE_nonzero_01` could use `set` instead of `cmov`.

There are more ways to implement `TYPE_ones_num`.
The ending could use `*0x...10101`, but this would rely on multiplication instructions taking constant time, which, as noted above, isn't true on some platforms. For `__SSE4_2__` one could use `popcnt`. For `arm64` one could use `cnt` with `cssc`, or NEON `cnt`.

For `TYPE_bottomzeros_num`, one could use `tzcnt` for `amd64` with `bmi1`, or `ctz` for `arm64` with `cssc`.
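The notes above mention that the ending of `TYPE_ones_num` could multiply by `0x...10101` but that this would depend on constant-time multiplication. As a sketch of one multiplication-free alternative (not necessarily what cryptoint does; the `demo_*` names are hypothetical and `<stdint.h>` types are assumed), the byte counts below are summed with shifts and additions instead. The trailing-zero count then reuses it: `~x & (x - 1)` has exactly the low-order 0 bits of `x` set, and all 64 bits set when `x` is 0, matching the "0 through 64" range in the API table.

```c
#include <stdint.h>

/* Number of 1 bits in x (0 through 64), with no multiplication. */
static int demo_uint64_ones_num(uint64_t x)
{
  x -= (x >> 1) & 0x5555555555555555ULL;          /* 2-bit counts */
  x = (x & 0x3333333333333333ULL)
    + ((x >> 2) & 0x3333333333333333ULL);         /* 4-bit counts */
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;     /* 8-bit counts, 0..8 */
  /* Sum the 8 byte counts into the low byte with shifts and adds instead of
     (x * 0x0101010101010101) >> 56. No byte ever exceeds 64, so no carry
     crosses a byte boundary. */
  x += x >> 8;
  x += x >> 16;
  x += x >> 32;
  return (int)(x & 0x7f);
}

/* Number of low-order 0 bits in x (0 through 64): ~x & (x - 1) sets exactly
   those bits (all 64 when x == 0), so count its 1 bits. */
static int demo_uint64_bottomzeros_num(uint64_t x)
{
  return demo_uint64_ones_num(~x & (x - 1));
}
```

For example, `demo_uint64_bottomzeros_num(8)` is 3 and `demo_uint64_bottomzeros_num(0)` is 64.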