Skip to content

[CHIA-3823] rename intern → intern_tree; add intern_tree_with_flags to propagate allocator flags#738

Open
richardkiss wants to merge 5 commits intomainfrom
intern-tree
Open

[CHIA-3823] rename intern → intern_tree; add intern_tree_with_flags to propagate allocator flags#738
richardkiss wants to merge 5 commits intomainfrom
intern-tree

Conversation

@richardkiss
Copy link
Contributor

@richardkiss richardkiss commented Mar 6, 2026

Summary

  • Rename internintern_tree for clarity (intern was ambiguous as verb/noun;
    intern_tree pairs naturally with InternedTree)
  • Add dest: Allocator parameter so callers can supply a heap-limited allocator
    (e.g. make_allocator(flags) in chia_rs) instead of always getting an unlimited one

Motivation

Provide an API to chia_rs so it can intern deserialized generators to canonicalize their cost, which is now based only on counts for interned content.

Migration

All existing callers pass Allocator::new() as dest to preserve previous behavior. (Plus, no one uses this yet.)

Made with Cursor


Note

Medium Risk
Public API rename and new allocator-limiting path could break downstream callers and introduce limit-related failures if misused, though core logic changes are localized to interning.

Overview
Renames the CLVM interning entrypoint from intern to intern_tree and introduces intern_tree_limited(source, node, heap_limit) to build the interned tree into a heap-limited destination allocator (via Allocator::new_limited).

Updates public exports (src/serde/mod.rs), benchmarks, fuzz target, and tests to use the new API while preserving the default behavior via intern_tree (calls the limited version with an effectively-unlimited limit).

Written by Cursor Bugbot for commit 2420910. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings March 6, 2026 22:05
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Renames the CLVM tree interning API to be more explicit and extends it to accept a caller-provided destination allocator, enabling heap-limited interning (important for mempool/DoS hardening scenarios like LIMIT_HEAP).

Changes:

  • Rename internintern_tree and update re-exports/call sites.
  • Add dest: Allocator parameter so callers can enforce allocator limits during interning.
  • Update tests, fuzz target, and benchmarks to use the new API.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/serde/intern.rs Renames the function and adds a destination allocator parameter used to allocate the interned tree.
src/serde/mod.rs Updates public re-export from intern to intern_tree.
src/serde/test_intern.rs Updates test helper to call intern_tree(..., Allocator::new()).
fuzz/fuzz_targets/intern.rs Updates fuzz target to use intern_tree with an explicit destination allocator.
benches/intern.rs Updates benchmark to call intern_tree with an explicit destination allocator.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +56 to +60
/// # Errors
///
/// Returns an error if allocator limits are exceeded when creating new nodes.
pub fn intern(allocator: &Allocator, node: NodePtr) -> Result<InternedTree> {
let mut new_allocator = Allocator::new();
pub fn intern_tree(source: &Allocator, node: NodePtr, dest: Allocator) -> Result<InternedTree> {
let mut new_allocator = dest;
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment still implies this function always "build[s] a new allocator" and the error docs mention "allocator limits" generically, but the implementation now uses the caller-provided dest allocator. Please update the docs to describe source vs dest and clarify that allocation-limit errors come from dest when creating interned nodes.

Copilot uses AI. Check for mistakes.
let node = allocator.new_atom(&[1, 2, 3]).unwrap();

let tree = intern(&allocator, node).unwrap();
let tree = intern_tree(&allocator, node, Allocator::new()).unwrap();
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the motivation is enforcing heap limits via a caller-supplied allocator, it would be good to add a regression test that passes Allocator::new_limited(...) as dest and asserts intern_tree fails with OutOfMemory once the limit is exceeded. This ensures the DoS fix remains covered.

Copilot uses AI. Check for mistakes.
src/serde/mod.rs Outdated
pub use identity_hash::RandomState;
pub use incremental::{Serializer, UndoState};
pub use intern::{InternedTree, intern};
pub use intern::{InternedTree, intern_tree};
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serde::intern was previously re-exported as a public API and is now removed/renamed. If this crate intends to avoid breaking downstream users in a patch release, consider keeping a #[deprecated] pub fn intern(...) wrapper that forwards to intern_tree(..., Allocator::new()), or ensure the next release/versioning clearly signals the breaking change.

Suggested change
pub use intern::{InternedTree, intern_tree};
pub use intern::{InternedTree, intern_tree};
#[deprecated(note = "Use serde::intern_tree instead")]
pub use intern::intern_tree as intern;

Copilot uses AI. Check for mistakes.
@coveralls-official
Copy link

coveralls-official bot commented Mar 6, 2026

Pull Request Test Coverage Report for Build 22917665711

Details

  • 19 of 19 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.01%) to 88.142%

Totals Coverage Status
Change from base Build 22870033048: 0.01%
Covered Lines: 7143
Relevant Lines: 8104

💛 - Coveralls

- Rename intern() → intern_tree() for clarity (pairs naturally with InternedTree)
- Add dest: Allocator parameter so callers can supply a heap-limited allocator
  instead of always getting an unlimited one (Allocator::new() internally)
- Update all callers to pass Allocator::new() to preserve existing behavior
- All tests pass (787 tests)

In chia_rs mempool validation, LIMIT_HEAP enforces a 500MB allocator cap.
With this change, chia_rs can pass make_allocator(flags) as dest to enforce
the same limit consistently, preventing DoS via backrefs-expanded generators.

Made-with: Cursor
…tion

Callers that need a heap-limited or otherwise configured destination
allocator (e.g. make_allocator(flags)) can now pass a FnOnce() -> Allocator
factory instead of a pre-constructed Allocator.  The direct intern_tree API
is unchanged.

Made-with: Cursor
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Factory function pattern overcomplicates simple allocator parameter
    • Changed intern_tree_with_factory to accept a direct Allocator and updated the default caller to pass Allocator::new() so no closure factory is required.

Create PR

Or push these changes by commenting:

@cursor push 7473761bf9
Preview (7473761bf9)
diff --git a/src/serde/intern.rs b/src/serde/intern.rs
--- a/src/serde/intern.rs
+++ b/src/serde/intern.rs
@@ -56,15 +56,12 @@
 /// # Errors
 ///
 /// Returns an error if allocator limits are exceeded when creating new nodes.
-pub fn intern_tree_with_factory<F>(
+pub fn intern_tree_with_factory(
     source: &Allocator,
     node: NodePtr,
-    make_dest: F,
-) -> Result<InternedTree>
-where
-    F: FnOnce() -> Allocator,
-{
-    let mut new_allocator = make_dest();
+    dest: Allocator,
+) -> Result<InternedTree> {
+    let mut new_allocator = dest;
     let mut atoms: Vec<NodePtr> = Vec::new();
     let mut pairs: Vec<NodePtr> = Vec::new();
 
@@ -141,7 +138,7 @@
 /// Use `intern_tree_with_factory` when you need a heap-limited or otherwise
 /// configured allocator.
 pub fn intern_tree(source: &Allocator, node: NodePtr) -> Result<InternedTree> {
-    intern_tree_with_factory(source, node, Allocator::new)
+    intern_tree_with_factory(source, node, Allocator::new())
 }
 
 #[cfg(test)]
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

@richardkiss richardkiss requested a review from arvidn March 9, 2026 20:33
Copy link
Contributor

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A malicious spend bundle with a backrefs-expanded generator could build an
arbitrarily large interned tree before cost is checked.

Can you explain how that's possible? When we parse a back-ref serialization, we don't duplicate the trees.

If there is a problem, it seems like a simpler solution would be to parse and intern in a single pass.

pub fn intern_tree_with_factory<F>(
source: &Allocator,
node: NodePtr,
make_dest: F,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like questionable solution. the LIMIT_HEAP flag, that's used in mempool-mode, is specifically not consensus critical. But if the hard fork starts requiring it, you're making the specific limit we use another consensus constant. Even if a new consensus constant that limits the allocator is correct, it would be best to use a separate constant. one can change without breaking consensus, the other cannot.

Replace intern_tree_with_factory(source, node, || make_allocator(flags))
with intern_tree_with_flags(source, node, flags).

The new API is cleaner and constructs the destination allocator internally
based on the LIMIT_HEAP flag (500MB limit when set, unlimited otherwise).

The old factory-based function is kept for backward compatibility but
marked as deprecated.

Made-with: Cursor
@danieljperry danieljperry changed the title rename intern → intern_tree, add dest allocator parameter [CHIA-3823] rename intern → intern_tree, add dest allocator parameter Mar 9, 2026
@richardkiss
Copy link
Contributor Author

A malicious spend bundle with a backrefs-expanded generator could build an
arbitrarily large interned tree before cost is checked.

Can you explain how that's possible? When we parse a back-ref serialization, we don't duplicate the trees.

If there is a problem, it seems like a simpler solution would be to parse and intern in a single pass.

That motivation was completely wrong. I hadn't read it until now. I edited it with the real motivation. Sorry about that.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Replace intern_tree_with_factory(source, node, || make_allocator(flags))
with intern_tree_limited(source, node, heap_limit).

The new API is cleaner and doesn't require clvm_rs to know about
policy decisions like the 500MB mempool limit. Callers pass the
heap limit size directly.

- intern_tree(source, node) - unlimited (uses u32::MAX)
- intern_tree_limited(source, node, heap_limit) - custom limit

The old factory-based function is kept for backward compatibility.

Made-with: Cursor
@arvidn
Copy link
Contributor

arvidn commented Mar 10, 2026

the name of this PR still reflects the previous patch. I agree with renaming the function. The motivation for the heap limit isn't really explained. The heap size is already constrained by the input size, how can this ever cause a heap size explosion?

The motivation in the PR description describes the current state. We already have this API.

Provide an API to chia_rs so it can intern deserialized generators to canonicalize their cost, which is now based only on counts for interned content.

Is the issue that after interning, we want to execute the program in mempool mode, and it will use the same allocator?

@richardkiss
Copy link
Contributor Author

the name of this PR still reflects the previous patch. I agree with renaming the function. The motivation for the heap limit isn't really explained. The heap size is already constrained by the input size, how can this ever cause a heap size explosion?

The motivation in the PR description describes the current state. We already have this API.

Provide an API to chia_rs so it can intern deserialized generators to canonicalize their cost, which is now based only on counts for interned content.

Is the issue that after interning, we want to execute the program in mempool mode, and it will use the same allocator?

The intern_tree call creates a new allocator, and run_block_generator2 takes flags that can affect the allocator's limits, so we need to pass those flags along. That's the whole, and only, point of this change.

Made-with: Cursor
@richardkiss richardkiss changed the title [CHIA-3823] rename intern → intern_tree, add dest allocator parameter rename intern → intern_tree; add intern_tree_with_flags to propagate allocator flags Mar 10, 2026
@danieljperry danieljperry changed the title rename intern → intern_tree; add intern_tree_with_flags to propagate allocator flags [CHIA-3823] rename intern → intern_tree; add intern_tree_with_flags to propagate allocator flags Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants