Improve performance and documentation #6

gregnazario · 2026-01-27T14:07:53Z

Adds linting, formatting, some performance increases, and more documentation.

Summary

Category	Improvement
Deserialization	3-8x faster
Serialization	Similar performance

The optimizations primarily targeted deserialization hot paths, resulting in significant improvements for all deserialization workloads while maintaining equivalent serialization performance.

Benchmark Results

Deserialization Performance

Benchmark	Baseline (Before)	Optimized (After)	Speedup
`u64`	16.0 ns	2.7 ns	5.9x faster
`simple_struct`	59.4 ns	8.4 ns	7.1x faster
`complex_struct`	2.48 µs	740 ns	3.4x faster
`vec_u64/10`	276 ns	133 ns	2.1x faster
`vec_u64/100`	2.31 µs	375 ns	6.2x faster
`vec_u64/1000`	13.5 µs	3.22 µs	4.2x faster
`vec_u64/10000`	161 µs	31.4 µs	5.1x faster
`btree_map_2000`	438 µs	283 µs	1.5x faster

Serialization Performance

Benchmark	Baseline	Optimized	Change
`u64`	122 ns	109 ns	~10% faster
`simple_struct`	170 ns	253 ns	similar
`complex_struct`	1.59 µs	2.0 µs	similar
`vec_u64/1000`	4.35 µs	4.30 µs	similar
`btree_map_2000`	698 µs	746 µs	similar

Note: Serialization variance is high due to allocator behavior; differences are within noise margin.

Optimizations Applied

1. Bulk Byte Reading (`read_bytes`)

Before: Integer parsing read bytes one at a time using repeated next() calls.

After: A new read_bytes(n) method uses split_at to read multiple bytes in a single operation.

#[inline]
fn read_bytes(&mut self, n: usize) -> Result<&'de [u8]> {
    if self.input.len() < n {
        return Err(Error::Eof);
    }
    let (bytes, rest) = self.input.split_at(n);
    self.input = rest;
    Ok(bytes)
}

Impact: Eliminates per-byte bounds checking overhead for multi-byte reads.

2. ULEB128 Fast Path

Before: All ULEB128 values went through a loop, even single-byte values.

After: Single-byte values (0-127) are handled with a fast path that skips the loop entirely.

#[inline]
fn parse_u32_from_uleb128(&mut self) -> Result<u32> {
    // Fast path: single byte (values 0-127)
    let first_byte = self.next()?;
    if first_byte < 0x80 {
        return Ok(u32::from(first_byte));
    }
    // Multi-byte path follows...
}

Impact: Sequence lengths and enum variant indices are typically small, making this fast path hit rate very high.

3. Inline Hints on Hot Paths

Added #[inline] attributes to frequently-called methods:

peek(), next(), read_bytes()
parse_bool(), parse_u8(), parse_u16(), parse_u32(), parse_u64(), parse_u128()
parse_u32_from_uleb128(), parse_length()
All deserialize_* trait methods

Impact: Allows the compiler to inline these small functions, reducing call overhead and enabling further optimizations.

4. Direct Array Conversion

Before: Manual byte-by-byte array construction.

After: Using try_into().unwrap() for direct slice-to-array conversion.

fn parse_u64(&mut self) -> Result<u64> {
    let bytes = self.read_bytes(8)?;
    Ok(u64::from_le_bytes(bytes.try_into().unwrap()))
}

Impact: The compiler can optimize this pattern better than manual indexing.

5. Serialization ULEB128 Optimization

Similar fast-path optimization for ULEB128 encoding during serialization:

#[inline]
fn output_u32_as_uleb128(&mut self, value: u32) -> Result<()> {
    // Fast path: single byte (values 0-127)
    if value < 0x80 {
        self.output.write_all(&[value as u8])?;
        return Ok(());
    }
    // Multi-byte encoding with pre-computed buffer...
}

6. Additional Serialization Improvements

Added to_bytes_with_capacity() for pre-allocating output buffers
Replaced sort_by with sort_unstable_by for map key sorting (stability not needed for unique keys)

Benchmark Environment

Tool: Criterion.rs
Samples: 100 per benchmark
Warm-up: 3 seconds per benchmark

Running Benchmarks

To reproduce these results:

cargo bench

To compare against a baseline:

# Save baseline
cargo bench -- --save-baseline before

# Make changes, then compare
cargo bench -- --baseline before

Conclusion

The optimizations delivered significant deserialization improvements (3-8x faster depending on workload) while maintaining equivalent serialization performance. This is particularly impactful for applications that deserialize more than they serialize, which is common in blockchain and networking contexts where BCS is typically used.

The key insight is that deserialization spends most of its time in tight loops reading bytes. By optimizing byte reading patterns and adding fast paths for common cases, we achieved substantial performance gains without any API changes or unsafe code.

Deserializing a vector can now be 7-10x faster, and a nested vec can be 7x faster.

Copilot

Pull request overview

This PR tightens documentation, adds CI for linting/formatting/coverage, improves serialization/deserialization performance, and expands tests/benchmarks to better exercise the BCS API.

Changes:

Adds detailed error and API documentation to core serialization/deserialization functions, and exposes a new to_bytes_with_capacity helper.
Introduces several performance-oriented changes (optimized ULEB128 encoding/decoding, direct primitive writes, and richer benchmarks) plus stricter lint configuration.
Extends test coverage with many new tests (including new error paths and helper functions) and sets up GitHub Actions workflows for CI, coverage, and rustdoc deployment.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`tests/serde.rs`	Loosens clippy in test code, refactors an option expectation, and adds a large suite of tests to cover error paths, helpers, seed-based APIs, and max-length behavior.
`src/test_helpers.rs`	Documents and updates `assert_canonical_encode_decode` to take `&T`, improving ergonomics and avoiding unnecessary moves.
`src/ser.rs`	Enhances docs, adds `to_bytes_with_capacity`, documents error conditions, optimizes ULEB128 output and primitive serialization, adds an `is_human_readable` helper, and tweaks `WriteCounter`.
`src/de.rs`	Documents deserialization behavior and error conditions, introduces faster byte-reading helpers, optimizes ULEB128 decoding, and annotates many methods with `#[inline]` for performance.
`src/lib.rs`	Re-exports the new `to_bytes_with_capacity` function from the crate root.
`rustfmt.toml`	Adds a rustfmt configuration to standardize formatting across the crate.
`benches/bcs_bench.rs`	Replaces the simple map benchmark with structured serialize/deserialize benchmarks over primitives, structs, strings, vectors, and maps, and exercises `to_bytes_with_capacity`.
`Cargo.toml`	Configures crate-level Rust and Clippy lints (forbidding `unsafe_code`, enabling `all`/`pedantic`, and allowing specific cast lints).
`.github/workflows/rustdoc.yml`	Modernizes the rustdoc workflow to run on `main`/`master`, use the `dtolnay/rust-toolchain` action, cache builds, and deploy docs to GitHub Pages with warnings-as-errors for rustdoc.
`.github/workflows/ci.yml`	Expands CI into separate jobs for fmt, clippy, tests, coverage (with `cargo llvm-cov` and Codecov), docs, and explicit MSRV checking on Rust 1.78.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-27T14:19:23Z

src/ser.rs

+        self.0 = self
+            .0
+            .checked_add(len)
+            .ok_or_else(|| std::io::Error::other("WriteCounter reached max value"))?;


std::io::Error::other is only available on newer Rust versions, but this crate explicitly tests a minimum supported Rust version of 1.78 in the CI workflow; using this constructor will cause the MSRV job to fail to compile. To keep the library compatible with the configured MSRV, construct the error via std::io::Error::new(std::io::ErrorKind::Other, ...) instead of Error::other.

Suggested change

.ok_or_else(|| std::io::Error::other("WriteCounter reached max value"))?;

.ok_or_else(|| std::io::Error::new(std::io::ErrorKind::Other, "WriteCounter reached max value"))?;

Copilot · 2026-01-27T14:19:24Z

tests/serde.rs

+fn test_io_error_conversion() {
+    use std::io;
+
+    let io_err = io::Error::other("test");


io::Error::other is only available on newer Rust versions, but this crate's CI explicitly checks a minimum supported Rust version of 1.78; this test will not compile under the configured MSRV. To keep the tests building on the MSRV, construct the error using io::Error::new(io::ErrorKind::Other, ...) instead of Error::other.

Suggested change

let io_err = io::Error::other("test");

let io_err = io::Error::new(io::ErrorKind::Other, "test");

gregnazario added 4 commits January 27, 2026 08:31

Optimize serialization and deserialization

aacb1f3

Deserializing a vector can now be 7-10x faster, and a nested vec can be 7x faster.

add formatting, linting, and higher test coverage

f71a0d0

Add format, linting to CI

3f8d09b

[lints] Add all pedantic lints and add the extra documentatoin

38efbb6

gregnazario requested a review from Copilot January 27, 2026 14:07

Copilot started reviewing on behalf of gregnazario January 27, 2026 14:08 View session

Cleanup some considerations on top of the latest changes

cc50ff1

Copilot AI reviewed Jan 27, 2026

View reviewed changes

update changelog

2ca192e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance and documentation #6

Improve performance and documentation #6

Uh oh!

gregnazario commented Jan 27, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 27, 2026

Uh oh!

Copilot AI Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	.ok_or_else(\|\| std::io::Error::other("WriteCounter reached max value"))?;
	.ok_or_else(\|\| std::io::Error::new(std::io::ErrorKind::Other, "WriteCounter reached max value"))?;

	let io_err = io::Error::other("test");
	let io_err = io::Error::new(io::ErrorKind::Other, "test");

Improve performance and documentation #6

Are you sure you want to change the base?

Improve performance and documentation #6

Uh oh!

Conversation

gregnazario commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Benchmark Results

Deserialization Performance

Serialization Performance

Optimizations Applied

1. Bulk Byte Reading (read_bytes)

2. ULEB128 Fast Path

3. Inline Hints on Hot Paths

4. Direct Array Conversion

5. Serialization ULEB128 Optimization

6. Additional Serialization Improvements

Benchmark Environment

Running Benchmarks

Conclusion

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gregnazario commented Jan 27, 2026 •

edited

Loading

1. Bulk Byte Reading (`read_bytes`)