-rw-r--r-- 10462 lib1305-20250415/cryptoint/README raw
cryptoint 20250414
D. J. Bernstein
## Introduction
cryptoint is an almost-header-only library providing functions for
comparisons, bit extractions, etc. on `{int,uint}{8,16,32,64}`, while
trying to protect against compilers introducing timing variations.
Advantages over previous library functions for basic constant-time
operations:
* cryptoint provides more functions.
* cryptoint provides better protection against current compiler
"optimizations". (This does not mean the protection is a guarantee.
Make sure to apply further tests such as TIMECOP to the compiled code.)
* All cryptoint functions, after compilation with some common compilers
for some common architectures, have been verified using
[saferewrite](https://pqsrc.cr.yp.to/downloads.html)
to match reference implementations for all inputs.
Often applications have their own ad-hoc constant-time code. Advantages
of rewriting those to use the centralized cryptoint functions:
* More likely to generate code that's
[actually constant-time](https://blog.cr.yp.to/20240803-clang.html).
* Less code to fix in response to whatever further damage is caused by
compiler "optimizations".
* Often less CPU time, although it's rare for this to matter.
* Better testing.
cryptoint is used inside
[SUPERCOP](https://bench.cr.yp.to/supercop.html),
[lib25519](https://lib25519.cr.yp.to),
[libmceliece](https://lib.mceliece.org),
[libntruprime](https://libntruprime.cr.yp.to),
and
[OpenSSH](https://github.com/openssh/openssh-portable/blob/master/sntrup761.c).
## Usage
To use (e.g.) `crypto_int64` in your own package, simply copy
`crypto_int64.h` and `int64_optblocker.c` into that package. Compilation
recommendations:
* Use `gcc` or `clang`. (Porting to other compilers should be a simple
matter of compiling with `-D__attribute__(x)=`; however, tests have
been carried out only with `gcc` and `clang`.)
* Compile all code with `-fwrapv`. (This disables some compiler
"optimizations" that often trigger bugs in integer arithmetic. These
"optimizations" have very little effect on performance.)
* Compile `*optblocker.c` separately: don't manually merge
`*optblocker.c` into other files; don't use the `-flto` option in
compiling `*optblocker.c`.
`crypto_{int,uint}64.h` define types `crypto_{int,uint}64` and the
following API functions:
| usage | meaning |
| ----- | ----- |
| `z = crypto_{int,uint}64_load(ptr)` | little-endian load |
| `crypto_{int,uint}64_store(ptr,z)` | little-endian store |
| `z = crypto_{int,uint}64_load_bigendian(ptr)` | big-endian load |
| `crypto_{int,uint}64_store_bigendian(ptr,z)` | big-endian store |
| `z = crypto_int64_positive_mask(x)` | `z = -(x > 0) |
| `z = crypto_int64_positive_01(x)` | `z = (x > 0) |
| `z = crypto_int64_negative_mask(x)` | `z = -(x < 0) |
| `z = crypto_int64_negative_01(x)` | `z = (x < 0) |
| `z = crypto_int64_topbit_mask(x)` | `z = -(x < 0) |
| `z = crypto_int64_topbit_01(x)` | `z = (x < 0) |
| `z = crypto_uint64_topbit_mask(x)` | `z = -(x >> 63) |
| `z = crypto_uint64_topbit_01(x)` | `z = (x >> 63) |
| `z = crypto_{int,uint}64_nonzero_mask(x)` | `z = -(x != 0) |
| `z = crypto_{int,uint}64_nonzero_01(x)` | `z = (x != 0) |
| `z = crypto_{int,uint}64_zero_mask(x)` | `z = -(x == 0) |
| `z = crypto_{int,uint}64_zero_01(x)` | `z = (x == 0) |
| `z = crypto_{int,uint}64_unequal_mask(x,y)` | `z = -(x != y) |
| `z = crypto_{int,uint}64_unequal_01(x,y)` | `z = (x != y) |
| `z = crypto_{int,uint}64_equal_mask(x,y)` | `z = -(x == y) |
| `z = crypto_{int,uint}64_equal_01(x,y)` | `z = (x == y) |
| `z = crypto_{int,uint}64_smaller_mask(x,y)` | `z = -(x < y) |
| `z = crypto_{int,uint}64_smaller_01(x,y)` | `z = (x < y) |
| `z = crypto_{int,uint}64_leq_mask(x,y)` | `z = -(x <= y) |
| `z = crypto_{int,uint}64_leq_01(x,y)` | `z = (x <= y) |
| `z = crypto_{int,uint}64_min(x,y)` | `z = (x < y) ? x : y` |
| `z = crypto_{int,uint}64_max(x,y)` | `z = (x > y) ? x : y` |
| `crypto_{int,uint}64_minmax(&x,&y)` | in-place `(x,y) = (min,max)` |
| `z = crypto_{int,uint}64_bottombit_mask(x)` | `z = -(x & 1) |
| `z = crypto_{int,uint}64_bottombit_01(x)` | `z = (x & 1) |
| `z = crypto_{int,uint}64_shlmod(x,j)` | `z = x << (j&63)` |
| `z = crypto_{int,uint}64_shrmod(x,j)` | `z = x >> (j&63)` |
| `z = crypto_{int,uint}64_bitmod_mask(x,j)` | `z = -((x >> (j&63)) & 1)` |
| `z = crypto_{int,uint}64_bitmod_01(x,j)` | `z = ((x >> (j&63)) & 1)` |
| `z = crypto_{int,uint}64_ones_num(x)` | `z =` number of bits set in `x` (0 through 64) |
| `z = crypto_{int,uint}64_bottomzeros_num(x)` | `z =` number of low-order 0 bits in `x` (0 through 64) |
There are also `bitinrangepublicpos` functions tested by `test.c` and
used internally. These are not part of the API; use `bitmod` instead.
Notes on the split between `mask` functions and `01` functions:
* The `01` functions are aligned with C's convention of representing
true as `1` and false as `0`. For example, `x < y` in C, like
`x < y ? 1 : 0`, means `1` if `x` is smaller than `y`, else `0`. You
can replace this with `crypto_int64_smaller_01(x,y)` if `x` and `y`
are 64-bit signed integers.
* The `mask` functions are aligned with a convention of representing
true as `-1` and false as `0`, which works well with logic
instructions. For example, you can rewrite the variable-time code
`x < y ? u : v` (meaning `u` if `x < y`, else `v`) as
`v ^ ((u ^ v) & crypto_int64_smaller_mask(x,y))`. For comparison,
`v + (u - v) * crypto_int64_smaller_01(x,y)` would rely on
multiplication taking constant time, but on some platforms
multiplication takes variable time.
Beware that the `mask` convention, like any other use of negative
integers, isn't compatible with unsigned integer extension. For example,
conversion from `uint8` to `uint64` will convert `-1` to `255` rather
than to `-1`. This is an argument for using `int` rather than `uint`. On
the other hand, C allows compilers to damage the correctness of `int`
code in various ways that aren't allowed for `uint`. Compiling with
`-fwrapv`, as recommended above, disables some of that damage.
## Internals
cryptoint has two main defenses against timing variations being
introduced by compiler "optimizations".
The first defense is `optblocker`, which is a global `volatile` variable
containing 0. The usage of `optblocker` in cryptoint is designed to
systematically hide 1-bit data paths from compilers.
The second defense is assembly. This would be safest as separate `.s`
files, but the usability constraint of having only two files per size
(one `.h` file, one `optblocker.c` file) forces cryptoint to use inline
assembly instead. Currently cryptoint has assembly implementations of
various functions for
* `amd64` (64-bit AMD and Intel, aka `x86_64`),
* `arm64` (64-bit ARM, aka `aarch64`),
* `arm32` (32-bit ARM, not in Thumb mode), and
* `sparc32` (32-bit SPARC, still used in space applications).
These are selected automatically for `gcc` and `clang` using tests for
`__GNUC__`, `__x86_64__`, etc. Other platforms fall back automatically
to portable code using `optblocker`.
From an auditing perspective, reviewing cryptoint means checking code
for many different functions, and the usage of assembly makes this work
more difficult:
* Assembly implementations are separate for each size, and separate
for each targeted platform: e.g., `crypto_int*_negative_mask` has
not just portable code but also 16 assembly implementations (4
sizes for each of `amd64`, `arm64`, `arm32`, `sparc32`).
* Assembly is generally less readable than C, making bugs more
likely to escape the notice of authors and reviewers. Assembly also
depends on quirks of the targeted instruction sets.
* Inline assembly has its own quirks, with a more complicated
interface than the function ABI.
cryptoint takes the following steps to reduce the risk of bugs:
* cryptoint includes a new `readasm` tool that generates inline assembly
from an easier-to-read format. (See `functions` for the source code in
this format; `crypto*.h` are automatically generated.) This improves
auditability. Also, the converter generates register annotations,
avoiding some common classes of inline-assembly bugs.
* All cryptoint functions are subjected to a battery of conventional
unit tests via `cryptoint/test.c` in SUPERCOP. Various functions are
also indirectly tested via tests of implementations that use cryptoint.
* All cryptoint functions are also integrated into saferewrite, which,
after compilation, uses symbolic execution and Z3 to check equivalence
to reference implementations. This has been run with various compilers
for `amd64`, `arm64`, `arm32`, `mips64`, `sparc32`, and `x86`; see
`saferewrite-results`.
## Implementation notes
The portable version of `SIGNED_negative_mask` could instead use
`X >>= (N-1) ^ SIGNED_optblocker`.
The use of `optblocker` inside `bitinrangepublicpos` is meant to protect
against the compiler causing trouble if `S` is a compile-time constant
`63` (or in general 1 below the width), although this is unnecessary for
the current applications of `bitinrangepublicpos`.
The assembly implementations of `shlmod` and `shrmod` assume that shift
instructions take constant time. The portable implementation does not
make this assumption. This is important: for example, compilers for
32-bit platforms will typically produce `int64` shift code that takes
different time for different shift distances.
Various assembly implementations assume that conditional instructions,
such as conditional moves, take constant time.
The `amd64` implementation of `TYPE_nonzero_01` could use `set` instead
of `cmov`.
There are more ways to implement `TYPE_ones_num`. The ending could use
`*0x...10101`, but this would rely on multiplication instructions taking
constant time, which, as noted above, isn't true on some platforms. For
`__SSE4_2__` one could use `popcnt`. For `arm64` one could use `cnt` for
`cssc`, or NEON `cnt`.
For `TYPE_bottomzeros_num`, one could use `tzcnt` for `amd64` with
`bmi1`, or `ctz` for `arm64` with `cssc`.