feat(zkvm): self-describing framed proof serialization #1267

Open
quangvdao wants to merge 18 commits into a16z:main from quangvdao:quang/robust-proof-format

@quangvdao quangvdao commented Feb 16, 2026

Summary

Replaces derive-macro-based CanonicalSerialize/CanonicalDeserialize for JoltProof with a manual, length-delimited section format intended for strictness and DoS resistance.

Wire format

`[magic: 4B "JOLT"][version: 1B][flags: 1B][section₀][section₁]…`

Each section is `[varint payload_len][payload bytes]`. Sections are sequential and untagged — the deserializer reads them in the fixed order defined by the proof schema.

Header (6 bytes):

  • Magic: b"JOLT" (4 bytes)
  • Version: 1 (1 byte)
  • Flags: bit 0 = is_zk (ZK mode vs standard), bits 1–7 reserved (must be 0)
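
The header rules above can be sketched as a small strict parser. This is only an illustration under assumptions: `read_header`, the constant names, and the error strings are hypothetical and are not taken from the PR's `zkvm::transport` module.

```rust
use std::io::{self, Read};

// Values mirror the header description above; the names are hypothetical.
const MAGIC: &[u8; 4] = b"JOLT";
const VERSION: u8 = 1;
const FLAG_ZK: u8 = 0b0000_0001;

/// Reads the 6-byte header and returns whether the ZK flag (bit 0) is set.
pub fn read_header<R: Read>(r: &mut R) -> io::Result<bool> {
    // Fixed-size stack buffer: 4 magic bytes, 1 version byte, 1 flags byte.
    let mut hdr = [0u8; 6];
    r.read_exact(&mut hdr)?;
    if hdr[..4] != MAGIC[..] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    if hdr[4] != VERSION {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "unsupported version",
        ));
    }
    // Bits 1..=7 are reserved and must be zero.
    if hdr[5] & !FLAG_ZK != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "reserved flag bits set",
        ));
    }
    Ok(hdr[5] & FLAG_ZK != 0)
}
```

Checking magic and version separately is what allows a specific "unsupported version" error instead of a generic mismatch.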

Changes

  • zkvm::transport — new module with varint u64 read/write, magic+version header helpers, and capped section-length reading
  • JoltProof ser/de — manual, length-delimited section format with:
    • Per-section caps (1 KiB for params, 128 KiB for other sections)
    • Intra-section trailing-byte checks
    • debug_assert! on write to catch cap mismatches during development
    • Final EOF check to reject trailing garbage
    • ZK/non-ZK mode flag with clear cross-mode error messages
  • OpeningId encoding — packed header byte (2-bit kind + 6-bit sumcheck_id) with varint escape for sumcheck_id ≥ 63
  • Claims unification — standalone Claims ser/de now uses varint count (matching JoltProof's inline format); JoltProof delegates to it via write_section!/read_single_section!
  • Varint hardening — overflow check on 10th byte (shift=63)

Security / DoS properties

  • Section lengths capped before any allocation
  • Commitment and claims counts capped at 10,000
  • Varint decoding rejects overflows
  • Reserved flag bits rejected (forward compatibility)
  • Cross-mode deserialization gives clear error messages
  • EOF check rejects trailing bytes after final section
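
To illustrate "section lengths capped before any allocation", here is a minimal sketch. It assumes the varint length prefix has already been decoded into `declared_len`; the helper name `read_capped_payload` is hypothetical and the PR's actual `read_section` helper may be structured differently.

```rust
use std::convert::TryFrom;
use std::io::{self, Read};

/// Reads a section payload whose length was already decoded from its varint
/// prefix. The cap is checked before the buffer is allocated, so a hostile
/// length cannot force a large allocation.
pub fn read_capped_payload<R: Read>(
    r: &mut R,
    declared_len: u64,
    cap: u64,
) -> io::Result<Vec<u8>> {
    if declared_len > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "section length exceeds cap",
        ));
    }
    let len = usize::try_from(declared_len).map_err(|_| {
        io::Error::new(io::ErrorKind::InvalidData, "length does not fit usize")
    })?;
    let mut buf = vec![0u8; len];
    // Fails with UnexpectedEof if the stream is shorter than declared.
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

A caller would pass the per-section cap (1 KiB for params, 128 KiB otherwise, per the description above) and then parse the returned bytes, rejecting any bytes left over inside the section.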

Test plan

  • cargo clippy clean in both --features host and --features host,zk
  • All proof_serialization + transport unit tests pass in both modes
  • OpeningId roundtrip tests cover all parameterized variants (CommittedPolynomial, VirtualPolynomial) × multiple SumcheckId values
  • muldiv e2e test passes in both modes

u64::from_le_bytes(pre_value_bytes)
};

if effective_address <= RAM_START_ADDRESS - 8 {

let's remove these changes since these examples should be working now without crashing. A write outside our defined memory regions is almost always just a write to a null ptr, which means something is actually wrong.

quangvdao and others added 3 commits February 24, 2026 08:51
Introduce a self-describing framed encoding for proofs and the minimal transport helpers.
This removes dependence on brittle enum counts and enables strict parsing with length caps.

Co-authored-by: Cursor <cursoragent@cursor.com>
Hoist File import to the top-level import block.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add write_framed_section!/framed_section_size!/read_singleton! macros
to eliminate repetitive stage serialization in JoltProof. Collapse 6
per-tag size caps into a single MAX_SECTION_LEN, inline thin transport
wrapper functions, compact VirtualPolynomial serialized_size to a
wildcard match, and move bundle framing constants to transport.rs.

Co-authored-by: Cursor <cursoragent@cursor.com>
@quangvdao quangvdao force-pushed the quang/robust-proof-format branch from ba45a3c to 05243f9 Compare February 24, 2026 16:52
@quangvdao quangvdao changed the title feat(zkvm): robust proof serialization + recursion guest verifier fixes feat(zkvm): self-describing framed proof serialization Feb 24, 2026
…ntext

- Embed format version (v1) in the proof signature so future format
  changes produce a clear mismatch instead of opaque InvalidData.
- Restore per-tag section caps (params: 16 KiB, commitments/claims:
  256 MiB, stages: 512 MiB) instead of a single 512 MiB cap.
- Add descriptive error messages to all deserialization failure paths:
  duplicate sections, unknown tags, trailing bytes, missing fields.
- Remove unused bundle framing constants from transport.rs (will be
  re-added when the recursion example PR lands).

Co-authored-by: Cursor <cursoragent@cursor.com>
@quangvdao quangvdao force-pushed the quang/robust-proof-format branch from 3110af8 to f3112b3 Compare February 24, 2026 17:23
quangvdao and others added 7 commits March 3, 2026 12:23
Resolve conflict in proof_serialization.rs: update framed format
to support C: JoltCurve type param, cfg-gated blindfold_proof/
opening_claims, UniSkipFirstRoundProofVariant, and remove bytecode_K.

Made-with: Cursor
…caps, hardened deserialization

Replace TLV tag-dispatch with sequential length-prefixed format:
no tags, no Option temporaries, no require! macro. Extract shared
payload-length helpers to eliminate serialize/serialized_size duplication.
Restore exhaustive VirtualPolynomial::serialized_size() match.

Harden deserialization: reject trace_length=0, reject duplicate
opening-claim keys, tighten all section caps to 128 KiB (params 1 KiB),
entry count cap 10K. Separate magic from version byte for specific
"unsupported version" errors.

Simplify transport.rs: remove dead code (skip_exact, read_u8_opt,
frame headers), add read_section helper, add unit tests. Remove
leaked SDK re-export of transport module.

Made-with: Cursor
Copilot AI left a comment

Pull request overview

This PR refactors zkVM proof serialization by introducing a dedicated transport layer (magic/version + varint section lengths) and moving JoltProof/OpeningId serialization from derive-macro-based encoding to explicit, framed encoding intended to be stricter and more DoS-resistant.

Changes:

  • Added zkvm::transport with varint u64 read/write, magic+version header helpers, and section-length capping.
  • Rewrote JoltProof canonical ser/de to a manual, length-delimited section format with per-section caps and intra-section trailing-byte checks.
  • Reworked OpeningId encoding into a packed header byte with an escape hatch for larger sumcheck IDs; added unit tests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| jolt-core/src/zkvm/transport.rs | New transport helpers for magic/version and varint-length-delimited sections (plus tests). |
| jolt-core/src/zkvm/proof_serialization.rs | Manual proof framing + stricter parsing checks; new packed OpeningId encoding; added tests. |
| jolt-core/src/zkvm/mod.rs | Exposes the new transport module. |

Comment on lines +51 to +63
pub fn read_varint_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut x = 0u64;
    let mut shift = 0u32;
    for _ in 0..VARINT_U64_MAX_BYTES {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        let byte = b[0];
        x |= ((byte & 0x7F) as u64) << shift;
        if (byte & 0x80) == 0 {
            return Ok(x);
        }
        shift += 7;
    }
Copilot AI Mar 6, 2026

read_varint_u64 only rejects inputs longer than 10 bytes, but it does not detect numeric overflow within 10 bytes (e.g., a 10th byte with bits beyond the single allowed MSB can wrap when shifted by 63). Add an explicit overflow check for the final byte/shift so invalid encodings that exceed u64::MAX are rejected instead of silently wrapping.
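
A minimal sketch of the suggested overflow check, assuming a LEB128-style encoding with 7 payload bits per byte (as in the snippet under review). It is a standalone illustration, not the code that landed in the PR, and it does not attempt to reject over-long encodings of small values such as `0x80 0x00`.

```rust
use std::io::{self, Read};

const VARINT_U64_MAX_BYTES: usize = 10;

pub fn read_varint_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut x = 0u64;
    for i in 0..VARINT_U64_MAX_BYTES {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        let byte = b[0];
        let shift = 7 * i as u32;
        // On the 10th byte (shift = 63) only bit 0 still fits in a u64, and
        // a continuation bit would imply an 11th byte. Reject anything else.
        if shift == 63 && byte > 1 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "varint overflows u64",
            ));
        }
        x |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Ok(x);
        }
    }
    // Not reachable given the check above; kept so the function is total.
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"))
}
```

With this check, nine `0xFF` bytes followed by `0x01` decode to `u64::MAX`, while the same prefix followed by `0x02` is rejected rather than silently wrapping.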

Comment on lines +146 to +171
transport::write_magic_version(&mut writer, PROOF_MAGIC, PROOF_VERSION).map_err(io_err)?;

let params_len = self.params_payload_len(compress);
transport::write_varint_u64(&mut writer, params_len).map_err(io_err)?;
transport::write_varint_u64(&mut writer, self.trace_length as u64).map_err(io_err)?;
transport::write_varint_u64(&mut writer, self.ram_K as u64).map_err(io_err)?;
self.rw_config.serialize_with_mode(&mut writer, compress)?;
self.one_hot_config
    .serialize_with_mode(&mut writer, compress)?;
self.dory_layout
    .serialize_with_mode(&mut writer, compress)?;

let commitments_len = self.commitments_payload_len(compress);
transport::write_varint_u64(&mut writer, commitments_len).map_err(io_err)?;
transport::write_varint_u64(&mut writer, self.commitments.len() as u64).map_err(io_err)?;
for c in &self.commitments {
    c.serialize_with_mode(&mut writer, compress)?;
}
match &self.untrusted_advice_commitment {
    None => writer.write_all(&[0]).map_err(io_err)?,
    Some(c) => {
        writer.write_all(&[1]).map_err(io_err)?;
        c.serialize_with_mode(&mut writer, compress)?;
    }
}

Copilot AI Mar 6, 2026

The PR description and wire-format spec describe a TLV framing ([tag: u8][len: varint][payload]) with unknown-tag/duplicate detection, but the implementation here writes a fixed sequence of varint-length-prefixed sections without any per-section tag. Either update the implementation to include tags and parse frames by tag (enabling strict unknown/duplicate handling), or update the PR description/title/spec to match the actual on-wire format.

Comment on lines +40 to +42
const MAX_PARAMS_LEN: u64 = 1024;
const MAX_SECTION_LEN: u64 = 128 * 1024;

Copilot AI Mar 6, 2026

MAX_SECTION_LEN is set to 128 * 1024 (128 KiB), which doesn’t match the PR description’s stated 512 MiB per-section cap and may be too small for real proofs as trace sizes grow. Please reconcile the cap with the intended wire-format spec (and ideally make caps per-section / documented) so valid proofs aren’t rejected during deserialization.

Comment on lines +381 to +404
    Ok(Self {
        commitments,
        stage1_uni_skip_first_round_proof,
        stage1_sumcheck_proof,
        stage2_uni_skip_first_round_proof,
        stage2_sumcheck_proof,
        stage3_sumcheck_proof,
        stage4_sumcheck_proof,
        stage5_sumcheck_proof,
        stage6_sumcheck_proof,
        stage7_sumcheck_proof,
        #[cfg(feature = "zk")]
        blindfold_proof,
        joint_opening_proof,
        untrusted_advice_commitment,
        #[cfg(not(feature = "zk"))]
        opening_claims,
        trace_length,
        ram_K,
        rw_config,
        one_hot_config,
        dory_layout,
    })
}
Copilot AI Mar 6, 2026

deserialize_with_mode enforces trailing-byte consumption within each length-delimited section, but it doesn’t validate that the overall proof input ends exactly after the final section (extra bytes after the last section would be ignored). If the goal is a strict, malformed-input-resistant wire format, add a final EOF check after parsing all expected sections.
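
One way such a final EOF check could look, as a sketch: try to read one more byte and treat any successful read as trailing garbage, while still propagating genuine I/O errors. The helper name `expect_eof` is hypothetical.

```rust
use std::io::{self, Read};

/// Succeeds only if the reader is exhausted.
pub fn expect_eof<R: Read>(r: &mut R) -> io::Result<()> {
    let mut probe = [0u8; 1];
    loop {
        match r.read(&mut probe) {
            // A zero-length read on a non-empty buffer signals end of input.
            Ok(0) => return Ok(()),
            Ok(_) => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "trailing bytes after final section",
                ))
            }
            // Retry on interruption; surface every other error to the caller.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Calling this after the last section has been parsed makes the format reject inputs of the form `valid_proof || extra_bytes`.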

quangvdao and others added 3 commits March 7, 2026 03:34
…fication, ZK discriminator

- Propagate I/O errors in EOF check instead of silently swallowing them
- Unify standalone Claims ser/de with JoltProof's varint-based format;
  delegate via write_section!/read_single_section! instead of duplicating
- Add debug_assert! in write_section! to catch ser/de cap mismatches
- Add flags byte to wire header (bit 0 = is_zk) with clear cross-mode
  error messages; shorten magic from b"JOLTPRF" to b"JOLT"
- Move `use std::fs::File` back inside function scope
- Expand OpeningId roundtrip tests to cover all parameterized variants
  (CommittedPolynomial, VirtualPolynomial) x multiple SumcheckId values

Made-with: Cursor
@moodlezoup
Collaborator

let's see if this works: /claude-review

@moodlezoup
Collaborator

@claude review this PR

@github-actions
Contributor

Claude Code is working…

I'll analyze this and get back to you.

@sagar-a16z sagar-a16z requested a review from 0xAndoroid March 26, 2026 18:42
@sagar-a16z
Contributor

@claude review please. you seem to be stuck.

@github-actions
Contributor

Claude Code is working…

I'll analyze this and get back to you.

macro_rules! write_section {
    ($w:expr, $c:expr, $($item:expr),+ $(,)?) => {{
        let len: u64 = 0 $(+ $item.serialized_size($c) as u64)+;
        debug_assert!(len <= MAX_SECTION_LEN, "section size {len} exceeds MAX_SECTION_LEN");

A release-mode prover can produce proofs it can't read back. Should be assert! or return Err.
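
A sketch of the "return Err" option: compute the section length and fail at serialization time in every build profile, since `debug_assert!` is compiled out in release builds. `FrameError` and `checked_section_len` are hypothetical names used for illustration; the PR's macro-based code is shaped differently.

```rust
const MAX_SECTION_LEN: u64 = 128 * 1024;

#[derive(Debug, PartialEq)]
pub enum FrameError {
    SectionTooLarge { len: u64, cap: u64 },
}

/// Sums the serialized sizes of a section's parts and enforces the cap.
/// Unlike `debug_assert!`, this check is also active in release builds.
pub fn checked_section_len(part_sizes: &[usize]) -> Result<u64, FrameError> {
    let len = part_sizes
        .iter()
        .fold(0u64, |acc, &s| acc.saturating_add(s as u64));
    if len > MAX_SECTION_LEN {
        return Err(FrameError::SectionTooLarge {
            len,
            cap: MAX_SECTION_LEN,
        });
    }
    Ok(len)
}
```

Using the same constant on the write path and the read path keeps the serializer from ever emitting a section that the deserializer would reject.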

let n = transport::read_varint_u64(&mut limited).map_err(io_err)?;
let n_usize = usize::try_from(n).map_err(|_| SerializationError::InvalidData)?;
if n_usize > 10_000 {
    return Err(SerializationError::InvalidData);

Should be a named constant like MAX_CLAIMS_COUNT. Also 10,000 × 384 bytes/GT = ~3.8 MB which exceeds the 128 KiB section cap enforced by read_section above. The section cap is the real constraint, no? (~341 commitments max).
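
The arithmetic behind this comment can be written out as constants. The 384 bytes per GT element is the reviewer's figure and is taken as an assumption here; `max_entries`, `GT_BYTES`, and `MAX_COMMITMENTS` are hypothetical names.

```rust
pub const MAX_SECTION_LEN: u64 = 128 * 1024; // 131_072 bytes

/// Largest entry count whose payload fits under `section_cap`, ignoring the
/// few bytes of per-section overhead (count varint, option byte).
pub const fn max_entries(section_cap: u64, bytes_per_entry: u64) -> u64 {
    section_cap / bytes_per_entry
}

/// Assumed size of one serialized GT element (reviewer's figure).
pub const GT_BYTES: u64 = 384;

/// Count cap derived from the section cap instead of a free-standing 10_000.
pub const MAX_COMMITMENTS: u64 = max_entries(MAX_SECTION_LEN, GT_BYTES);
```

Here `MAX_COMMITMENTS` evaluates to 341, whereas 10,000 entries would need 3,840,000 bytes. Deriving the count cap from the section cap is one way to keep the two limits from drifting apart.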

let small = header & 0x3F;

let sumcheck_u64 = if small == OPENING_ID_SUMCHECK_ESCAPE {
    transport::read_varint_u64(&mut reader).map_err(io_err)?

Accepts non-canonical encodings, e.g. sumcheck_id=0 can be encoded inline (1 byte) or via escape+varint (2+ bytes). Add after the varint read:

  if sumcheck_u64 < OPENING_ID_SUMCHECK_ESCAPE as u64 {
      return Err(SerializationError::InvalidData);
  }
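
For context, the packed header and the canonicality rule can be sketched together. This assumes the layout implied by the PR description and the `header & 0x3F` mask: kind in the top two bits, sumcheck id in the low six bits, and 63 as the escape marker. `pack_header` and `unpack_header` are hypothetical helpers, and the escaped value is passed in already decoded to keep the example self-contained.

```rust
const OPENING_ID_SUMCHECK_ESCAPE: u8 = 0x3F; // 63

/// Returns the header byte plus, for ids >= 63, the value that would follow
/// as a varint.
pub fn pack_header(kind: u8, sumcheck_id: u64) -> (u8, Option<u64>) {
    let kind_bits = (kind & 0b11) << 6;
    if sumcheck_id < u64::from(OPENING_ID_SUMCHECK_ESCAPE) {
        (kind_bits | sumcheck_id as u8, None)
    } else {
        (kind_bits | OPENING_ID_SUMCHECK_ESCAPE, Some(sumcheck_id))
    }
}

/// Inverse of `pack_header`. `escaped` is the varint read after the header
/// when the low six bits equal the escape marker.
pub fn unpack_header(header: u8, escaped: Option<u64>) -> Result<(u8, u64), &'static str> {
    let kind = header >> 6;
    let small = header & 0x3F;
    match (small == OPENING_ID_SUMCHECK_ESCAPE, escaped) {
        (false, None) => Ok((kind, u64::from(small))),
        (true, Some(v)) if v >= u64::from(OPENING_ID_SUMCHECK_ESCAPE) => Ok((kind, v)),
        // The value would have fit inline, so this encoding is not canonical.
        (true, Some(_)) => Err("non-canonical escape encoding"),
        _ => Err("escape marker and varint presence disagree"),
    }
}
```

With the canonicality check in place, every `(kind, sumcheck_id)` pair has exactly one valid byte encoding.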

pub struct Claims<F: JoltField>(pub Openings<F>);

#[cfg(not(feature = "zk"))]
const MAX_CLAIMS_COUNT: u64 = 10_000;

10,000 × ~33 bytes/claim = ~330 KB which exceeds MAX_SECTION_LEN (128 KiB). The section cap is the binding constraint (~3,700 claims max). These should be harmonized.

pub mod ram;
pub mod registers;
pub mod spartan;
pub mod transport;

should be pub(crate)? it's internal?


#[inline]
pub fn read_magic_version<R: Read>(r: &mut R, magic: &[u8]) -> io::Result<u8> {
    let mut buf = vec![0u8; magic.len()];

Heap-allocates for a 4-byte read. Use let mut buf = [0u8; 4] instead.

}

#[cfg(test)]
mod tests {

We should have proper e2e tests - maybe also modify examples to use this new formatting.

Side note: the Dory crate does Vec::with_capacity(num_rounds) where num_rounds is an untrusted u32 from the stream, before any data hits the Take reader. An attacker can trigger a ~7.8 TB allocation. The section cap limits I/O but not the upfront allocation. Worth adding a sanity check on the num_rounds in the proof.
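
A sketch of the kind of sanity check suggested here, run on the section payload before it is handed to the downstream deserializer. Everything about the layout is an assumption made for illustration: the code pretends the round count is a little-endian `u32` at the start of the payload, which may not match the Dory crate's actual encoding, and `check_declared_rounds` is a hypothetical helper.

```rust
/// Peeks at an untrusted round count and rejects it if it exceeds a bound
/// the caller derives from trusted parameters.
/// ASSUMPTION: the count is a little-endian u32 in the first 4 payload bytes.
pub fn check_declared_rounds(payload: &[u8], max_rounds: u32) -> Result<u32, &'static str> {
    let head = payload.get(..4).ok_or("payload too short for round count")?;
    let declared = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if declared > max_rounds {
        return Err("declared round count exceeds bound");
    }
    Ok(declared)
}
```

The point of the pre-check is that any later `Vec::with_capacity(num_rounds)` only ever sees a bounded value, regardless of what the stream claims.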

@quangvdao quangvdao requested a review from moodlezoup as a code owner March 31, 2026 12:22
@quangvdao quangvdao requested a review from markosg04 as a code owner March 31, 2026 12:22
@quangvdao
Contributor Author

Posted by Codex assistant (model: GPT-5.4, reasoning effort: xhigh) on behalf of the user (Quang Dao) with approval.

I pushed a pass addressing the active review comments.

Changes:

  • enforce section caps at serialization time, so release builds can’t emit proofs that our deserializer will reject
  • harmonize the count checks with the actual 128 KiB section cap instead of keeping misleading 10,000-element limits
  • reject non-canonical OpeningId escape encodings
  • make zkvm::transport internal and remove the heap allocation for the 4-byte magic read
  • add a sanity check on the Dory opening proof’s declared round count before handing the bytes to dory-pcs, so a malformed proof can’t trigger an enormous preallocation inside the Dory deserializer
  • add real proof roundtrip coverage (prove -> serialize -> deserialize -> verify) in both standard and ZK mode, including muldiv, plus advice-heavy coverage in both default and address-major layouts

Checks run:

  • cargo nextest run -p jolt-core proof_serialization::tests --cargo-quiet --features host
  • cargo nextest run -p jolt-core proof_serialization::tests --cargo-quiet --features host,zk
  • cargo nextest run -p jolt-core muldiv_e2e_dory --cargo-quiet --features host
  • cargo nextest run -p jolt-core muldiv_e2e_dory --cargo-quiet --features host,zk
  • cargo nextest run -p jolt-core advice_e2e_dory advice_e2e_dory_address_major --cargo-quiet --features host
  • cargo nextest run -p jolt-core advice_e2e_dory advice_e2e_dory_address_major --cargo-quiet --features host,zk
  • cargo clippy -p jolt-core --features host --message-format=short -q --all-targets -- -D warnings
  • cargo clippy -p jolt-core --features host,zk --message-format=short -q --all-targets -- -D warnings

I didn’t touch the old tracer/src/emulator/mmu.rs thread since that one is marked outdated.
