Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a CLVM tree interning API to deduplicate atoms and pairs, expose structured statistics for cost calculation, and provide associated tests and fuzzing to support an upcoming generator-identity hard fork.
Changes:
- Added a new
serde::internmodule withInternedTree,InternedStats, and aninternfunction that builds a deduplicated allocator plus helper APIs (stats, tree hash, indices). - Expanded serde exports and tests to cover the new interning behavior, including hex-based structural tests that assert serialization and tree-hash equivalence, and atom/pair dedup counts.
- Added a dedicated fuzz target for the interning API and wired it into the fuzz crate, checking serialization equality and deduplication invariants under random trees.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/serde/intern.rs |
Implements the core interning algorithm, stats helpers, tree hashing, node index mapping, and unit tests for correctness and deduplication behavior. |
src/serde/mod.rs |
Wires the new intern module into the serde public API and exposes Bytes32, while registering the new test module. |
src/serde/test_intern.rs |
Adds hex-based integration tests that deserialize trees, intern them, and verify serialization equality, tree-hash equality, and expected unique atom/pair counts across various shapes. |
fuzz/fuzz_targets/intern.rs |
Introduces a fuzz target for interning that generates random trees, asserts serialization invariants and deduplication properties, and exercises the tree-hash path. |
fuzz/Cargo.toml |
Registers the new intern fuzz target as a binary for inclusion in the fuzzing suite. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7a82fda to
4241748
Compare
|
@richardkiss I've opened a new pull request, #685, to work on those changes. Once the pull request is ready, I'll request review from you. |
Pull Request Test Coverage Report for Build 22120814463Details
💛 - Coveralls |
arvidn
left a comment
There was a problem hiding this comment.
I'm worried about the following DoS vectors:
- a tree with a very large number of small (and different) atoms, making book keeping costly
- a tree with many atoms of moderate size (say 100 bytes or so) that makes them costly to hash, in order to look up in the hash map.
- a tree that's large, much larger than we allow, but still causes atoms to be duplicated, using 2x the RAM
- a large tree with small (different) atoms, with no deduplication opportunities. Computing the tree hash would cause an (almost) 32x memory usage, assuming I understand correctly that every node's tree hash is cached.
a0ae7ed to
c8328c3
Compare
|
@richardkiss I've opened a new pull request, #692, to work on those changes. Once the pull request is ready, I'll request review from you. |
aabf999 to
fd8698b
Compare
|
sorry, I broke this by changing some names. Now you can also ask for |
arvidn
left a comment
There was a problem hiding this comment.
would you mind adding a benchmark for intern() as well?
we have a few generators that we benchmark treehash on, you could use those.
I don't see any major problems with this
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
cc35f6b to
55f1753
Compare
55f1753 to
0f29983
Compare
0f29983 to
3251a4e
Compare
3251a4e to
3a520de
Compare
This PR adds API for interning that will be necessary for the hard fork that changes the generator identity and cost to use the contents of the generator rather than the serialization of it.
More information about the ideas behind this hard fork can be read here: https://github.com/richardkiss/generator-identity-hf-analysis/
This is the first PR of several. The next PR is in
chia_rswhich will depend upon a release ofclvm_rshaving this new API. See https://github.com/richardkiss/generator-identity-hf-analysis/#installation for more explanation of the various PRs.Note
Medium Risk
Although mostly additive, this introduces a new public API that manipulates allocator/node identity and is intended for consensus-adjacent cost/identity work, so subtle correctness/performance issues could have downstream impact.
Overview
Adds a new
serde::internAPI (intern+InternedTree) that rebuilds a CLVM tree into a fresh allocator while deduplicating identical atoms (by bytes) and pairs (by interned child tuple), and exposes the interned root plus ordered lists of unique atoms/pairs.Introduces coverage for this behavior via unit tests (including hex fixtures and ordering expectations), a new libFuzzer target that checks serialization/tree-hash invariants and that interning doesn’t increase unique/allocated node counts, and a Criterion benchmark (
benches/intern.rs) wired intoCargo.toml.Written by Cursor Bugbot for commit 3a520de. This will update automatically on new commits. Configure here.