CLVM Garbage collection #697

Open
arvidn wants to merge 4 commits into main from garbage-collection

Contributor

@arvidn arvidn commented Feb 11, 2026

This PR implements simple, call-stack-based garbage collection.

overview

The observation is that any operator that returns an atom allocated before the snapshot was taken (i.e. picking an atom out of the environment), or a simple atom such as NIL or 1, effectively "launders" all the computation (and allocations) that went into computing its arguments into a very small amount of data.

When this happens we can reset the Allocator to the state it was in before we invoked the operator, as long as we preserve the return value.
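
As a rough illustration of the idea above, here is a minimal sketch using a toy allocator. All names (`ToyAllocator`, `Snapshot`, `maybe_restore`) are made up for this example and are not the clvm_rs API; the real implementation may preserve the return value differently.

```rust
// Toy model only: the names below are illustrative, not clvm_rs APIs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Node {
    Nil,
    One,
    Atom(u32), // index into `atoms`
}

#[derive(Default)]
struct ToyAllocator {
    atoms: Vec<Vec<u8>>,
}

struct Snapshot {
    atoms: usize,
}

impl ToyAllocator {
    fn new_atom(&mut self, bytes: &[u8]) -> Node {
        self.atoms.push(bytes.to_vec());
        Node::Atom((self.atoms.len() - 1) as u32)
    }

    fn snapshot(&self) -> Snapshot {
        Snapshot { atoms: self.atoms.len() }
    }

    /// Roll back to `snap` if `ret` stays valid afterwards: either a
    /// built-in constant, or an atom allocated before the snapshot.
    /// Returns true if the rollback happened.
    fn maybe_restore(&mut self, snap: &Snapshot, ret: Node) -> bool {
        let survives = match ret {
            Node::Nil | Node::One => true,
            Node::Atom(idx) => (idx as usize) < snap.atoms,
        };
        if survives {
            self.atoms.truncate(snap.atoms);
        }
        survives
    }
}

fn main() {
    let mut a = ToyAllocator::default();
    let env = a.new_atom(b"environment value");

    let snap = a.snapshot();
    // Pretend evaluating the operator's arguments needed many temporaries.
    for i in 0..1000u32 {
        a.new_atom(&i.to_be_bytes());
    }
    // The operator returns an atom from the environment, so everything
    // allocated since the snapshot can be dropped.
    assert!(a.maybe_restore(&snap, env));
    assert_eq!(a.atoms.len(), 1);

    // A freshly allocated return value blocks the rollback.
    let snap = a.snapshot();
    let fresh = a.new_atom(b"new");
    assert!(!a.maybe_restore(&snap, fresh));
    assert_eq!(a.atoms.len(), 2);
}
```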

This requires us to pre-emptively store Allocator checkpoints just in case the operator returns a simple atom. This adds overhead and we'll have to make a judgement call on whether we think it's worth it.

The cost of recording Allocator snapshots has been mitigated by:

  1. Creating a new, smaller subset of the existing Checkpoint, called TransparentCheckpoint. It uses u32 instead of usize and does not record the "ghost" counters.
  2. Only triggering this behavior for operators that are likely to return a small atom. See the list in gc_candidate() in src/chia_dialect.rs.
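
To give a feel for the size difference in point 1, here is a small sketch with a hypothetical field layout. The structs `FullCheckpoint` and `SlimCheckpoint` and their fields are assumptions for illustration only; the actual definitions in the PR may differ.

```rust
use std::mem::size_of;

// Hypothetical layout, for illustration only.
#[allow(dead_code)]
struct FullCheckpoint {
    u8s: usize,
    pairs: usize,
    atoms: usize,
    ghost_atoms: usize,
    ghost_pairs: usize,
    ghost_heap: usize,
}

// Same idea with u32 fields and no ghost counters.
#[allow(dead_code)]
struct SlimCheckpoint {
    u8s: u32,
    pairs: u32,
    atoms: u32,
}

fn main() {
    println!("full: {} bytes", size_of::<FullCheckpoint>());
    println!("slim: {} bytes", size_of::<SlimCheckpoint>());
}
```

With this assumed layout, the slim variant is 12 bytes, while the full one is 48 bytes on a 64-bit target.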

restoring the Allocator

When restoring the allocator, it's important that we preserve the same behavior as we have today with regard to counters. The atom and pair counters are consensus critical, since we have upper limits on them. So when we restore the allocator, we have to do it transparently and preserve the counters.

This is done by incrementing the ghost counters by the same amount as we are freeing. The counters then behave as if the atoms and pairs were never freed, just like it works today.

This is one important distinction between the existing checkpoint() -> restore_checkpoint() functions and the new transparent_checkpoint() -> restore_transparent_checkpoint() functions. But note that the transparent version is a subset of the existing behavior.
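
A minimal sketch of the ghost-counter bookkeeping described in this section, again on a toy allocator. The type and field names are assumptions made for this example, and only atoms and pairs are modeled.

```rust
// Toy model: names are illustrative and do not mirror the clvm_rs Allocator.
#[derive(Default)]
struct ToyAllocator {
    atoms: Vec<Vec<u8>>,
    pairs: Vec<(u32, u32)>,
    ghost_atoms: usize,
    ghost_pairs: usize,
}

#[derive(Clone, Copy)]
struct ToyTransparentCheckpoint {
    atoms: u32,
    pairs: u32,
}

impl ToyAllocator {
    fn new_atom(&mut self, bytes: &[u8]) -> u32 {
        self.atoms.push(bytes.to_vec());
        (self.atoms.len() - 1) as u32
    }

    fn new_pair(&mut self, left: u32, right: u32) -> u32 {
        self.pairs.push((left, right));
        (self.pairs.len() - 1) as u32
    }

    // The externally visible counters include the ghost counters.
    fn atom_count(&self) -> usize {
        self.atoms.len() + self.ghost_atoms
    }

    fn pair_count(&self) -> usize {
        self.pairs.len() + self.ghost_pairs
    }

    fn transparent_checkpoint(&self) -> ToyTransparentCheckpoint {
        ToyTransparentCheckpoint {
            atoms: self.atoms.len() as u32,
            pairs: self.pairs.len() as u32,
        }
    }

    // Free storage, but bump the ghost counters by the amount freed so
    // atom_count() and pair_count() are unchanged by the restore.
    fn restore_transparent_checkpoint(&mut self, cp: ToyTransparentCheckpoint) {
        let (atoms, pairs) = (cp.atoms as usize, cp.pairs as usize);
        self.ghost_atoms += self.atoms.len() - atoms;
        self.ghost_pairs += self.pairs.len() - pairs;
        self.atoms.truncate(atoms);
        self.pairs.truncate(pairs);
    }
}

fn main() {
    let mut a = ToyAllocator::default();
    let cp = a.transparent_checkpoint();
    let x = a.new_atom(b"x");
    let y = a.new_atom(b"y");
    a.new_pair(x, y);

    let (atoms_before, pairs_before) = (a.atom_count(), a.pair_count());
    a.restore_transparent_checkpoint(cp);

    // Storage is gone, but the counters still report the old totals.
    assert!(a.atoms.is_empty() && a.pairs.is_empty());
    assert_eq!((a.atom_count(), a.pair_count()), (atoms_before, pairs_before));
}
```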

benchmarking

To benchmark this I picked some of the most expensive block generators from mainnet: expensive in the number of atoms, pairs and heap bytes allocated, but also in execution run-time.

Extending the analyze-chain.rs tool, I picked the generators at the following heights:

size (bytes) height size (bytes) height size (bytes) height
3405 6870373 2785 6874452 15669 6945939
2797 6870530 3365 6944847 372 6946441
7719 6870593 372 6944939 7175 6946468
372 6871396 6128 6945282 12927 6946540
12797 6872502 820 6945355 6135 6946609
786 6872582 3954 6945434 15741 6946951
5537 6874073 6193 6945560 103269 7521791

Running these generators before and after this change gave me the following results:

comparison

[Figures: run-time, peak atom count, peak pair count, peak heap size]

before (main)

generator run-time peak atom count peak pair count peak heap size
"6870373.generator" 2102ms 5414280 62455866 101548705
"6870530.generator" 2062ms 5415071 62465191 101566177
"6870593.generator" 2083ms 5396291 62256998 101240601
"6871396.generator" 2016ms 5221443 60235205 97877640
"6872502.generator" 2068ms 5360152 61859573 100597672
"6872582.generator" 2042ms 5363727 61871629 100581312
"6874073.generator" 2070ms 5371395 62247500 99746106
"6874452.generator" 2076ms 5398433 62272341 101245881
"6944847.generator" 1953ms 5098731 58827020 95553157
"6944939.generator" 2099ms 5390395 62178153 101087728
"6945282.generator" 2075ms 5402757 62330453 101352611
"6945355.generator" 2017ms 5204922 60045427 97564160
"6945434.generator" 2058ms 5334643 61539612 100035288
"6945560.generator" 2153ms 5410306 62413293 101483135
"6945939.generator" 2139ms 5354598 61790505 100478739
"6946441.generator" 2133ms 5415573 62467700 101566110
"6946468.generator" 1985ms 5116698 59040543 95914088
"6946540.generator" 2097ms 5381551 62106234 101024076
"6946609.generator" 2085ms 5388235 62157068 101057726
"6946951.generator" 2102ms 5390737 62196126 101131147
"7521791.generator" 2090ms 5394436 62227626 101271468

after (with this change)

generator run-time peak atom count peak pair count peak heap size
"6870373.generator" 2271ms 3068638 35384443 57639326
"6870530.generator" 2209ms 2843748 32799191 53367208
"6870593.generator" 2206ms 3041519 35073893 57125433
"6871396.generator" 2106ms 2757812 31809255 51733555
"6872502.generator" 2180ms 2685980 30985197 50370449
"6872582.generator" 2247ms 3156080 36389393 59300689
"6874073.generator" 2243ms 1119555 12970400 20784629
"6874452.generator" 2190ms 2699948 31143380 50634630
"6944847.generator" 2100ms 2883567 33256134 54122964
"6944939.generator" 2194ms 2944054 33951038 55272153
"6945282.generator" 2234ms 2756918 31799237 51717246
"6945355.generator" 2166ms 3060137 35286094 57477863
"6945434.generator" 2170ms 3095343 35691477 58146802
"6945560.generator" 2213ms 2926020 33744017 54930316
"6945939.generator" 2188ms 3071640 35419699 57696713
"6946441.generator" 2219ms 3026250 34896292 56833877
"6946468.generator" 2095ms 2855362 32932679 53587126
"6946540.generator" 2193ms 2967605 34226693 55724509
"6946609.generator" 2241ms 2723579 31416637 51083258
"6946951.generator" 2211ms 3137615 36181623 58956820
"7521791.generator" 2292ms 3052172 35195080 57426349
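
To read the two tables side by side, the relative change per generator can be computed as below. This is only a convenience sketch; the helper `pct_change` and the `Row` struct are made up for the example, and the three rows are copied from the tables above.

```rust
/// Relative change from `before` to `after`, in percent (negative = reduction).
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

struct Row {
    height: u32,
    run_ms: (f64, f64),     // (before, after)
    peak_atoms: (f64, f64), // (before, after)
    peak_heap: (f64, f64),  // (before, after)
}

fn main() {
    // Three rows copied from the before/after tables.
    let rows = [
        Row {
            height: 6870373,
            run_ms: (2102.0, 2271.0),
            peak_atoms: (5414280.0, 3068638.0),
            peak_heap: (101548705.0, 57639326.0),
        },
        Row {
            height: 6874073,
            run_ms: (2070.0, 2243.0),
            peak_atoms: (5371395.0, 1119555.0),
            peak_heap: (99746106.0, 20784629.0),
        },
        Row {
            height: 7521791,
            run_ms: (2090.0, 2292.0),
            peak_atoms: (5394436.0, 3052172.0),
            peak_heap: (101271468.0, 57426349.0),
        },
    ];

    for r in &rows {
        println!(
            "{}: run-time {:+.1}%, peak atoms {:+.1}%, peak heap {:+.1}%",
            r.height,
            pct_change(r.run_ms.0, r.run_ms.1),
            pct_change(r.peak_atoms.0, r.peak_atoms.1),
            pct_change(r.peak_heap.0, r.peak_heap.1),
        );
    }
}
```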

Note

High Risk
High risk because it changes core run_program/allocator behavior and introduces checkpoint/restore paths that affect consensus-critical allocation accounting, even though gated behind the new ClvmFlags::ENABLE_GC flag.

Overview
Adds an optional, flag-gated garbage-collection mechanism that snapshots the Allocator before selected operators and may transparently restore it after the operator returns, reducing retained allocations while preserving atom/pair/heap accounting via ghost counters.

This introduces TransparentCheckpoint plus maybe_restore_with_node() in Allocator, wires a new Dialect::gc_candidate() hook (implemented in ChiaDialect under ClvmFlags::ENABLE_GC) into run_program via a new RestoreAllocator operation, and updates benchmarks/tools/tests to run with ENABLE_GC plus new fuzz coverage comparing ENABLE_GC vs non-GC outcomes.

Written by Cursor Bugbot for commit 57366f9. This will update automatically on new commits.

@arvidn arvidn force-pushed the garbage-collection branch 2 times, most recently from caf4dea to 9df4a48 Compare February 11, 2026 11:44
coveralls-official bot commented Feb 11, 2026

Pull Request Test Coverage Report for Build 23138661607

Details

  • 205 of 256 (80.08%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 87.554%

Changes Missing Coverage Covered Lines Changed/Added Lines %
tools/src/bin/sha256tree-benching.rs 0 1 0.0%
src/runtime_dialect.rs 0 3 0.0%
src/run_program.rs 21 30 70.0%
clvm-fuzzing/src/node_eq.rs 0 19 0.0%
src/allocator.rs 176 195 90.26%
Totals Coverage Status
Change from base Build 23076772943: -0.2%
Covered Lines: 7436
Relevant Lines: 8493

💛 - Coveralls

@arvidn arvidn marked this pull request as draft February 11, 2026 13:50
@arvidn arvidn force-pushed the garbage-collection branch 3 times, most recently from f641887 to 99ddcd5 Compare February 13, 2026 16:38
@arvidn arvidn closed this Feb 16, 2026
@arvidn arvidn reopened this Feb 16, 2026
@arvidn arvidn force-pushed the garbage-collection branch 4 times, most recently from 73275d0 to d9e0b3e Compare February 19, 2026 11:46
@arvidn arvidn marked this pull request as ready for review February 19, 2026 14:36
Contributor Author

arvidn commented Feb 19, 2026

@cursor review

@arvidn arvidn force-pushed the garbage-collection branch from d9e0b3e to 0690684 Compare February 20, 2026 08:53
@arvidn arvidn force-pushed the garbage-collection branch from 0690684 to a870ef1 Compare February 20, 2026 14:49
@arvidn arvidn force-pushed the garbage-collection branch 3 times, most recently from 45db360 to 2118254 Compare March 2, 2026 21:13
@arvidn arvidn force-pushed the garbage-collection branch 2 times, most recently from 6dcb742 to fc386c2 Compare March 2, 2026 22:28
@arvidn arvidn force-pushed the garbage-collection branch from fc386c2 to d9b6fa5 Compare March 3, 2026 06:30
@arvidn arvidn force-pushed the garbage-collection branch 2 times, most recently from 6b54329 to a2c5124 Compare March 5, 2026 00:52
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

ghost_heap change in new_atom affects all code paths

Medium Severity

Adding self.ghost_heap += v.len() in new_atom for small atoms changes heap_size() and the heap limit check (start as usize + self.ghost_heap + v.len() > self.heap_limit) for ALL callers unconditionally, not just when ENABLE_GC is set. Previously new_atom did not track ghost_heap for small atoms while new_small_number did — this fixes that inconsistency, but it changes the observable heap_size() and makes the OOM check slightly more restrictive for every program. Since heap limits are consensus-critical (especially under LIMIT_HEAP / MEMPOOL_MODE), programs near the limit could now fail where they previously succeeded, potentially causing a consensus divergence between nodes running old vs. new code.
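
The effect described here can be sketched with toy numbers. The function below only mirrors the shape of the check quoted above; `exceeds_limit` and the values are assumptions for illustration, not code from the PR.

```rust
/// Same shape as the quoted check: start + ghost_heap + len > heap_limit.
fn exceeds_limit(start: u32, ghost_heap: usize, len: usize, heap_limit: usize) -> bool {
    start as usize + ghost_heap + len > heap_limit
}

fn main() {
    let heap_limit = 100; // toy limit
    let start = 90u32; // bytes already on the real heap
    let small_atom_bytes = 8; // bytes of small atoms created so far
    let len = 10; // size of the next allocation

    // If small atoms are not added to ghost_heap, the allocation fits.
    assert!(!exceeds_limit(start, 0, len, heap_limit));

    // If they are added to ghost_heap, the same allocation is rejected.
    assert!(exceeds_limit(start, small_atom_bytes, len, heap_limit));
}
```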

Contributor Author

LIMIT_HEAP and MEMPOOL_MODE are, by definition, not consensus mode.

arvidn added 4 commits March 16, 2026 11:15
…ply invocation results in a simple atom being returned. Everything allocated by the invocation can be freed
@arvidn arvidn force-pushed the garbage-collection branch from a2c5124 to 57366f9 Compare March 16, 2026 10:15