This crate is a Rust port of Daniel Lemire's streamvbyte library.
It contains multiple implementations of this format aimed at different integer sizes and value distributions.
Each Coder implementation produces different formats that are incompatible with one another. Names
provide the length of each of the 4 possible tags for each value so Coder1234 encodes each entry
as 1, 2, 3, or 4 bytes. A scalar implementation is always available at a large speed penalty but
the implementation will automatically use an accelerated implementation for the target if available.
A scalar implementation is available for all Coders but this is typically pretty slow. All
implementations also include acceleration when building for little-endian aarch64 with NEON
instruction support and x86_64 using the SSSE3 or SSE4.1 instruction sets.
Coder1234 will typically be fastest, but other tag length distributions (including Coder1248 for
64-bit values) are available.)
Benchmark numbers were collected on M1 Macbook Air, to generate your own run
cargo bench --bench=streamvbyte
Each benchmark is run with 1024 elements of up to a certain length using a zipf-like distribution.
encode and decode benchmarks measure value coding throughput, deltas variants measure the
same assuming monotonically increasing inputs and compressing deltas between values. data_len
measures throughput for determining data length based on tag values, skip_deltas does the same
for delta coded streams and includes the sum of all values skipped.
| Benchmark | Throughput |
|---|---|
encode/1 |
4.8Gelem/s |
encode_deltas/1 |
4.1Gelem/s |
decode/1 |
12.7Gelem/s |
decode_deltas/1 |
5.6Gelem/s |
skip_deltas/1 |
19.9Gelem/s |
data_len/1 |
54.2Gelem/s |
encode/2 |
4.8Gelem/s |
encode_deltas/2 |
3.8Gelem/s |
decode/2 |
8.2Gelem/s |
decode_deltas/2 |
4.2Gelem/s |
skip_deltas/2 |
8.5Gelem/s |
encode/4 |
4.8Gelem/s |
encode_deltas/4 |
4.1Gelem/s |
decode/4 |
8.2Gelem/s |
decode_deltas/4 |
4.2Gelem/s |
skip_deltas/4 |
8.4Gelem/s |
| Benchmark | Throughput |
|---|---|
encode/1 |
4.2Gelem/s |
encode_deltas/1 |
3.6Gelem/s |
decode/1 |
7.2Gelem/s |
decode_deltas/1 |
4.2Gelem/s |
skip_deltas/1 |
7.1Gelem/s |
data_len/1 |
53.8Gelem/s |
encode/2 |
4.2Gelem/s |
encode_deltas/2 |
3.6Gelem/s |
decode/2 |
7.2Gelem/s |
decode_deltas/2 |
4.2Gelem/s |
skip_deltas/2 |
7.3Gelem/s |
encode/4 |
4.2Gelem/s |
encode_deltas/4 |
3.6Gelem/s |
decode/4 |
7.2Gelem/s |
decode_deltas/4 |
4.2Gelem/s |
skip_deltas/4 |
7.5Gelem/s |
| Benchmark | Throughput |
|---|---|
encode/1 |
3.1Gelem/s |
encode_deltas/1 |
2.5Gelem/s |
decode/1 |
4.6Gelem/s |
decode_deltas/1 |
3.6Gelem/s |
skip_deltas/1 |
6.0Gelem/s |
data_len/1 |
53.8Gelem/s |
encode/4 |
3.1Gelem/s |
encode_deltas/4 |
2.5Gelem/s |
decode/4 |
4.8Gelem/s |
decode_deltas/4 |
3.6Gelem/s |
skip_deltas/4 |
5.7Gelem/s |
encode/8 |
3.0Gelem/s |
encode_deltas/8 |
2.5Gelem/s |
decode/8 |
4.8Gelem/s |
decode_deltas/8 |
3.0Gelem/s |
skip_deltas/8 |
5.4Gelem/s |