
[CHIA-3823] Add interning API.#684

Merged
richardkiss merged 1 commit into main from generator-identity-hf
Feb 19, 2026

Conversation

@richardkiss
Contributor

@richardkiss richardkiss commented Jan 26, 2026

This PR adds an interning API that will be needed for the hard fork that changes the generator identity and cost to depend on the contents of the generator rather than on its serialization.

More information about the ideas behind this hard fork can be read here: https://github.com/richardkiss/generator-identity-hf-analysis/

This is the first PR of several. The next PR is in chia_rs which will depend upon a release of clvm_rs having this new API. See https://github.com/richardkiss/generator-identity-hf-analysis/#installation for more explanation of the various PRs.


Note

Medium Risk
Although mostly additive, this introduces a new public API that manipulates allocator/node identity and is intended for consensus-adjacent cost/identity work, so subtle correctness/performance issues could have downstream impact.

Overview
Adds a new serde::intern API (intern + InternedTree) that rebuilds a CLVM tree into a fresh allocator while deduplicating identical atoms (by bytes) and pairs (by interned child tuple), and exposes the interned root plus ordered lists of unique atoms/pairs.
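The deduplication described above is a form of hash-consing. The following is a minimal, self-contained sketch of the idea, not the code from this PR: `Tree`, `Id`, and `Interner` are hypothetical toy types that do not use clvm_rs's `Allocator` or `NodePtr`, and the recursion is only for brevity (a production version would presumably walk the tree iteratively to avoid stack overflow on deep trees).

```rust
use std::collections::HashMap;

/// A toy CLVM-like tree: either an atom (bytes) or a pair of subtrees.
#[derive(Debug, Clone)]
enum Tree {
    Atom(Vec<u8>),
    Pair(Box<Tree>, Box<Tree>),
}

/// Index of an interned node. Hypothetical; not the clvm_rs NodePtr type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Id {
    Atom(usize),
    Pair(usize),
}

/// Deduplicated storage: each distinct atom and pair is stored once,
/// in order of first appearance.
#[derive(Default)]
struct Interner {
    atoms: Vec<Vec<u8>>,
    pairs: Vec<(Id, Id)>,
    atom_index: HashMap<Vec<u8>, usize>,
    pair_index: HashMap<(Id, Id), usize>,
}

impl Interner {
    /// Returns the id of the interned copy of `tree`, adding new entries
    /// only for atoms and pairs that have not been seen before.
    fn intern(&mut self, tree: &Tree) -> Id {
        match tree {
            // Atoms are deduplicated by their byte content.
            Tree::Atom(bytes) => {
                if let Some(&i) = self.atom_index.get(bytes) {
                    return Id::Atom(i);
                }
                let i = self.atoms.len();
                self.atoms.push(bytes.clone());
                self.atom_index.insert(bytes.clone(), i);
                Id::Atom(i)
            }
            // Pairs are deduplicated by the ids of their interned children.
            Tree::Pair(left, right) => {
                let key = (self.intern(left), self.intern(right));
                if let Some(&i) = self.pair_index.get(&key) {
                    return Id::Pair(i);
                }
                let i = self.pairs.len();
                self.pairs.push(key);
                self.pair_index.insert(key, i);
                Id::Pair(i)
            }
        }
    }
}

fn atom(b: &[u8]) -> Tree {
    Tree::Atom(b.to_vec())
}

fn pair(l: Tree, r: Tree) -> Tree {
    Tree::Pair(Box::new(l), Box::new(r))
}

fn main() {
    // ((a . b) . (a . b)) has 4 atom nodes and 3 pair nodes before interning.
    let tree = pair(pair(atom(b"a"), atom(b"b")), pair(atom(b"a"), atom(b"b")));
    let mut interner = Interner::default();
    let root = interner.intern(&tree);
    println!(
        "root={:?} unique atoms={} unique pairs={}",
        root,
        interner.atoms.len(),
        interner.pairs.len()
    );
}
```

In this toy example the two `(a . b)` subtrees collapse into one stored pair, so the interner ends up with two unique atoms and two unique pairs. The `atoms` and `pairs` vectors play the role of the "ordered lists of unique atoms/pairs" mentioned in the overview.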

Introduces coverage for this behavior via unit tests (including hex fixtures and ordering expectations), a new libFuzzer target that checks serialization/tree-hash invariants and that interning doesn’t increase unique/allocated node counts, and a Criterion benchmark (benches/intern.rs) wired into Cargo.toml.

Written by Cursor Bugbot for commit 3a520de.

Copilot AI review requested due to automatic review settings January 26, 2026 21:03
@richardkiss richardkiss requested a review from arvidn January 26, 2026 21:04

Copilot AI left a comment


Pull request overview

This PR introduces a CLVM tree interning API to deduplicate atoms and pairs, expose structured statistics for cost calculation, and provide associated tests and fuzzing to support an upcoming generator-identity hard fork.

Changes:

  • Added a new serde::intern module with InternedTree, InternedStats, and an intern function that builds a deduplicated allocator plus helper APIs (stats, tree hash, indices).
  • Expanded serde exports and tests to cover the new interning behavior, including hex-based structural tests that assert serialization and tree-hash equivalence, and atom/pair dedup counts.
  • Added a dedicated fuzz target for the interning API and wired it into the fuzz crate, checking serialization equality and deduplication invariants under random trees.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/serde/intern.rs | Implements the core interning algorithm, stats helpers, tree hashing, node index mapping, and unit tests for correctness and deduplication behavior. |
| src/serde/mod.rs | Wires the new intern module into the serde public API and exposes Bytes32, while registering the new test module. |
| src/serde/test_intern.rs | Adds hex-based integration tests that deserialize trees, intern them, and verify serialization equality, tree-hash equality, and expected unique atom/pair counts across various shapes. |
| fuzz/fuzz_targets/intern.rs | Introduces a fuzz target for interning that generates random trees, asserts serialization invariants and deduplication properties, and exercises the tree-hash path. |
| fuzz/Cargo.toml | Registers the new intern fuzz target as a binary for inclusion in the fuzzing suite. |



Copilot AI commented Jan 26, 2026

@richardkiss I've opened a new pull request, #685, to work on those changes. Once the pull request is ready, I'll request review from you.

@coveralls-official

coveralls-official bot commented Jan 26, 2026

Pull Request Test Coverage Report for Build 22120814463

Details

  • 149 of 158 (94.3%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 90.66%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/serde/intern.rs | 123 | 126 | 97.62% |
| src/serde/test_intern.rs | 26 | 32 | 81.25% |

| Totals | |
|---|---|
| Change from base Build 22099149151 | 0.2% |
| Covered Lines | 6824 |
| Relevant Lines | 7527 |

💛 - Coveralls

Contributor

@arvidn arvidn left a comment


I'm worried about the following DoS vectors:

  • a tree with a very large number of small (and different) atoms, making bookkeeping costly
  • a tree with many atoms of moderate size (say 100 bytes or so) that makes them costly to hash, in order to look up in the hash map.
  • a tree that's large, much larger than we allow, but still causes atoms to be duplicated, using 2x the RAM
  • a large tree with small (different) atoms, with no deduplication opportunities. Computing the tree hash would cause an (almost) 32x memory usage, assuming I understand correctly that every node's tree hash is cached.
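As a back-of-the-envelope illustration of the last concern: a SHA-256 tree hash is 32 bytes, so caching one hash per node costs 32 bytes times the node count. How that compares to the tree's own footprint depends on how many bytes each node occupies, which is an assumed parameter in the sketch below and not a measurement of clvm_rs. The function names are hypothetical.

```rust
/// Size of one SHA-256 tree hash in bytes.
const HASH_SIZE: usize = 32;

/// Bytes needed to cache one tree hash per node.
fn hash_cache_bytes(node_count: usize) -> usize {
    node_count * HASH_SIZE
}

/// Ratio of hash-cache size to the tree's own storage, given an ASSUMED
/// average storage cost per node (not measured from any real allocator).
fn cache_ratio(node_count: usize, assumed_bytes_per_node: usize) -> f64 {
    hash_cache_bytes(node_count) as f64 / (node_count * assumed_bytes_per_node) as f64
}

fn main() {
    let nodes = 1_000_000;
    println!("hash cache: {} bytes", hash_cache_bytes(nodes));
    // If a node effectively costs 1 byte, the cache is 32x the tree.
    println!("ratio at 1 byte/node:  {}", cache_ratio(nodes, 1));
    // If a node costs 8 bytes, the cache is 4x the tree.
    println!("ratio at 8 bytes/node: {}", cache_ratio(nodes, 8));
}
```

The ratio approaches 32x only when the per-node payload is close to one byte, which matches the "small (different) atoms, with no deduplication opportunities" scenario; larger atoms or heavier per-node bookkeeping shrink it.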


Copilot AI commented Feb 7, 2026

@richardkiss I've opened a new pull request, #692, to work on those changes. Once the pull request is ready, I'll request review from you.

@richardkiss richardkiss force-pushed the generator-identity-hf branch from aabf999 to fd8698b Compare February 9, 2026 23:10
@danieljperry danieljperry changed the title Add interning API. [CHIA-3823] Add interning API. Feb 12, 2026
@arvidn
Contributor

arvidn commented Feb 12, 2026

sorry, I broke this by changing some names. atom_count() and pair_count() are now the total number, including "ghost" ones. These are the counters that are constrained by the limits.

Now you can also ask for allocated_atom_count() and allocated_pair_count(), which tell you how much RAM we're using. These counters do not affect consensus.
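A toy model of the distinction between the two kinds of counters might look like the sketch below. It assumes that a "ghost" atom is one that counts toward the limit without being stored; the thread does not define the term, so that reading is an assumption. `ToyCounters` and its methods are hypothetical and are not the clvm_rs `Allocator` API.

```rust
/// Toy model of limit-checked totals versus actual allocations.
#[derive(Default, Debug)]
struct ToyCounters {
    /// Total atoms, including "ghost" ones; this is what the limit constrains.
    atom_count: usize,
    /// Atoms actually stored; reflects memory use only.
    allocated_atom_count: usize,
}

impl ToyCounters {
    /// Fails if adding one more atom would exceed `max_atoms`.
    fn check_limit(&self, max_atoms: usize) -> Result<(), String> {
        if self.atom_count >= max_atoms {
            return Err(format!("too many atoms (limit {max_atoms})"));
        }
        Ok(())
    }

    /// A stored atom counts toward both the total and the allocation count.
    fn add_stored_atom(&mut self, max_atoms: usize) -> Result<(), String> {
        self.check_limit(max_atoms)?;
        self.atom_count += 1;
        self.allocated_atom_count += 1;
        Ok(())
    }

    /// A ghost atom (assumed meaning: counted but not stored) only
    /// increases the limit-checked total.
    fn count_ghost_atom(&mut self, max_atoms: usize) -> Result<(), String> {
        self.check_limit(max_atoms)?;
        self.atom_count += 1;
        Ok(())
    }
}

fn main() {
    let mut c = ToyCounters::default();
    c.add_stored_atom(3).unwrap();
    c.count_ghost_atom(3).unwrap();
    c.count_ghost_atom(3).unwrap();
    // The limit is reached by the total even though only one atom is stored.
    assert!(c.add_stored_atom(3).is_err());
    println!("{:?}", c);
}
```

Under this reading, the limit behaves the same whether or not deduplication happened, while the allocation counter can be used for memory diagnostics without influencing consensus.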

Contributor

@arvidn arvidn left a comment


would you mind adding a benchmark for intern() as well?
we have a few generators that we benchmark treehash on, you could use those.

I don't see any major problems with this


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@richardkiss richardkiss force-pushed the generator-identity-hf branch 2 times, most recently from cc35f6b to 55f1753 Compare February 16, 2026 23:32
arvidn previously approved these changes Feb 17, 2026
@richardkiss richardkiss force-pushed the generator-identity-hf branch from 3251a4e to 3a520de Compare February 18, 2026 00:05
@richardkiss richardkiss merged commit f80ab73 into main Feb 19, 2026
32 checks passed
@richardkiss richardkiss deleted the generator-identity-hf branch February 19, 2026 02:26

4 participants