bls_ops: use global validation cache instead of deferred validation#733

Open
richardkiss wants to merge 6 commits into main from bls-g1-negate-cache

Conversation

@richardkiss (Contributor) commented Mar 6, 2026

Summary

Alternative to #731 for mitigating the g1/g2_negate_strict DoS attack.

Instead of deferring validation to end-of-run (which adds state to the Allocator
and has subtle checkpoint/softfork interaction), use a process-global validation
cache in bls_ops.rs only.

Key observations:

  • A G1/G2 point's validity is context-free and deterministic — same bytes are
    always valid or always invalid
  • In BLS12-381 compressed form, negation flips one bit (bytes[0] ^= 0x20),
    and -p is valid iff p is valid
  • So we can cache p and -p together with one validation call
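The observations above can be sketched in std-only Rust. This is a minimal illustration, not the PR's code: `expensive_validate` is a stand-in for the real `G1Element::from_bytes(...).is_ok()` check, and the blob in `main` is arbitrary bytes, not a real curve point.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real `G1Element::from_bytes(...).is_ok()`
// check, which lives in the BLS library.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

/// Flip the sign of a compressed G1 point. In BLS12-381 compressed form,
/// bit 5 of the first byte (0x20) encodes the sign of y, so negation is a
/// single XOR -- unless the blob is the infinity encoding (0xc0 prefix),
/// which is its own negation. Returns (negated_blob, is_infinity).
fn negate_g1_bytes(blob: &[u8; 48]) -> ([u8; 48], bool) {
    let mut neg = *blob;
    let is_inf = (neg[0] & 0xe0) == 0xc0;
    if !is_inf {
        neg[0] ^= 0x20;
    }
    (neg, is_inf)
}

/// One validation call populates the cache for both p and -p,
/// since -p is valid iff p is valid.
fn check_valid(cache: &mut HashMap<[u8; 48], bool>, blob: &[u8; 48]) -> bool {
    if let Some(&v) = cache.get(blob) {
        return v;
    }
    let valid = expensive_validate(blob);
    cache.insert(*blob, valid);
    let (neg, is_inf) = negate_g1_bytes(blob);
    if !is_inf {
        cache.insert(neg, valid);
    }
    valid
}

fn main() {
    let mut cache = HashMap::new();
    let p = [0x80u8; 48]; // arbitrary illustration bytes, not a real point
    assert!(check_valid(&mut cache, &p));
    // Both p and -p are now cached from the single validation call.
    let (neg, _) = negate_g1_bytes(&p);
    assert_eq!(cache.len(), 2);
    assert_eq!(cache.get(&neg), Some(&true));
}
```

This joint insertion is what makes the "-p insight" hold: an attacker alternating between p and -p still triggers only one validation per pair.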

Properties:

  • Zero changes to Allocator — no new fields, no checkpoint interaction
  • Zero changes to run_program — no deferred validation at end-of-run
  • Fails fast: invalid point → immediate error (no deferred state)
  • Cache persists across transactions and blocks — warms up over time
  • Thread-safe: RwLock with read-path for common case (already-cached points)
  • The -p insight halves effective cache misses for negation workloads

Trade-off vs #731:

  • This approach: O(1) per repeat call (any prior tx in process lifetime), fails fast
  • #731 (optimize some bls operations): O(1) per repeat call within a single run, deferred error at end-of-run
  • This is explicitly temporary until HF3 fixes the cost model, at which point
    the two statics and their helper functions can simply be deleted

Note

Medium Risk
Touches consensus-adjacent BLS operator argument validation and expands Allocator state; while semantics should be unchanged, subtle differences in validation paths or memory growth could affect runtime behavior under load.

Overview
Improves performance/DoS resistance for g1_negate_strict/g2_negate_strict by introducing per-allocator caches of compressed G1/G2 validity (Allocator::g1_is_valid/g2_is_valid) and consulting them instead of re-running from_bytes on repeated inputs.

Refactors negate implementations to convert atoms into fixed-size arrays up front, reuse the cached validity result in strict mode, and adds unit tests that call the strict negate ops twice on invalid points to exercise the caching behavior.

Written by Cursor Bugbot for commit 755f6e8.

Add process-global validation cache for G1/G2 point validation in
strict negate operations. This approach has several advantages over
deferred validation:

- No changes to Allocator (no new fields, no checkpoint interaction)
- No changes to run_program (no deferred validation at end-of-run)
- Fails fast: invalid point returns immediate error
- Cache persists across transactions and blocks
- Thread-safe using RwLock with read-path for cached lookups
- Exploits BLS12-381 property: negation flips bit 5, so -p has same
  validity as p, allowing us to cache both with one validation

The cache is explicitly temporary until HF3 fixes the cost model.
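The process-global cache described above follows a standard pattern: a lazily initialized static guarded by an `RwLock`, with a read-lock fast path for already-cached points. A minimal sketch, again with a stubbed validation function standing in for `G1Element::from_bytes`:

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// Process-global cache: lives for the lifetime of the process, so it
// persists across program runs, transactions, and blocks.
static G1_CACHE: OnceLock<RwLock<HashMap<[u8; 48], bool>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<[u8; 48], bool>> {
    G1_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

// Placeholder for the real curve-point validation.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

fn g1_check_valid(blob: &[u8; 48]) -> bool {
    // Fast path: a shared read lock suffices for points seen before,
    // so the common (already-cached) case never blocks other readers.
    if let Some(&v) = cache().read().unwrap().get(blob) {
        return v;
    }
    // Slow path: validate outside any lock, then record the result.
    let valid = expensive_validate(blob);
    cache().write().unwrap().insert(*blob, valid);
    valid
}

fn main() {
    let p = [0x80u8; 48]; // arbitrary illustration bytes
    assert!(g1_check_valid(&p)); // miss: validates and caches
    assert!(g1_check_valid(&p)); // hit: read lock only
}
```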

Alternative to #731 for mitigating g1/g2_negate_strict DoS attack.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 6, 2026 08:52

Copilot AI left a comment


Pull request overview

This PR introduces a process-global validation cache for BLS G1/G2 point validation in bls_ops.rs as a targeted mitigation for the g1_negate_strict/g2_negate_strict DoS attack (alternative to #731). By caching point validity (and its negation together) in OnceLock<RwLock<HashMap<...>>> statics, repeated calls with the same invalid point only perform the expensive G1Element::from_bytes/G2Element::from_bytes validation once per unique point. The PR is described as explicitly temporary, until HF3 fixes the cost model.

Changes:

  • Adds two process-global validation caches (G1_CACHE, G2_CACHE) plus helper functions for byte-level negation and cached validity checking
  • Refactors op_bls_g1_negate_impl and op_bls_g2_negate_impl to perform the 48/96-byte size check unconditionally and use the new cache for strict-mode validation
  • Adds unit tests verifying invalid point rejection is maintained on both first (cache miss) and second (cache hit) calls
Comments suppressed due to low confidence (1)

src/bls_ops.rs:432

  • The tests only verify that calls with invalid points return errors (both the first call and the second "cached" call). They do not test: (1) that a valid point succeeds on both the first call (cache miss path) and the second call (cache hit path), or (2) that the actual negated output bytes are correct. A test that calls op_bls_g1_negate_strict with a known valid point twice and asserts the same correct negated result would better verify the cache hit path for the success case.
    let signature = a.g2(first(a, args)?)?;

    // followed by a variable number of (G1, msg)-pairs (as a flat list)
    args = rest(a, args)?;

    let mut items = Vec::<(PublicKey, Atom)>::new();
    while !nilp(a, args) {
        let pk = a.g1(first(a, args)?)?;
        args = rest(a, args)?;
        let msg = atom(a, first(a, args)?, "bls_verify message")?;
        args = rest(a, args)?;



src/bls_ops.rs Outdated
Comment on lines +52 to +54
return v;
}
}

Copilot AI Mar 6, 2026


Between dropping the read lock (after cache miss) and acquiring the write lock, there is a window where another thread can compute and insert the same key. As a result, G1Element::from_bytes(blob) may be called redundantly by multiple threads for the same input, and the write lock section inserts unconditionally without re-checking. While correctness is maintained (the result is deterministic), using w.entry(*blob).or_insert(valid) for the first insert would avoid the unnecessary redundant write on the main key when another thread has already populated it.
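The suggested fix is the double-checked pattern with `HashMap::entry`: a redundant validation can still happen in the race window, but the write-lock section no longer overwrites a value another thread already inserted. A sketch of the suggestion (stubbed validation, hypothetical names):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Placeholder for the real `G1Element::from_bytes` validation.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

fn check_valid(cache: &RwLock<HashMap<[u8; 48], bool>>, blob: &[u8; 48]) -> bool {
    if let Some(&v) = cache.read().unwrap().get(blob) {
        return v;
    }
    // Another thread may insert the same key between dropping the read
    // lock and acquiring the write lock. `entry().or_insert()` keeps the
    // existing value and skips the redundant write in that case; since
    // validity is deterministic, either thread's result is correct.
    let valid = expensive_validate(blob);
    *cache.write().unwrap().entry(*blob).or_insert(valid)
}

fn main() {
    let cache = RwLock::new(HashMap::new());
    let p = [1u8; 48]; // arbitrary illustration bytes
    assert!(check_valid(&cache, &p)); // miss path
    assert!(check_valid(&cache, &p)); // hit path
}
```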

Comment on lines 434 to 445
cost += msg.as_ref().len() as Cost * BLS_MAP_TO_G2_COST_PER_BYTE;
cost += DST_G2.len() as Cost * BLS_MAP_TO_G2_COST_PER_DST_BYTE;
check_cost(cost, max_cost)?;

items.push((pk, msg));
}

if !aggregate_verify(&signature, items) {
Err(EvalErr::BLSVerifyFailed(input))?
} else {
Ok(Reduction(cost, a.nil()))
}

Copilot AI Mar 6, 2026


Similarly to the G1 test, this only verifies invalid point rejection, not that a valid G2 point succeeds on both the cache-miss path and the cache-hit path. A test that also exercises the valid-then-cached success case would give stronger coverage of the caching logic.

src/bls_ops.rs Outdated
let mut neg = *blob;
let is_inf = (neg[0] & 0xe0) == 0xc0;
if !is_inf {
neg[0] ^= 0x20;

Copilot AI Mar 6, 2026


negate_g2_bytes is missing a doc comment while the analogous negate_g1_bytes directly above it has one (/// Returns (negated_blob, is_infinity)). For consistency, negate_g2_bytes should have a matching doc comment.

src/bls_ops.rs Outdated

let valid = G1Element::from_bytes(blob).is_ok();
let mut w = cache.write().unwrap();
if w.len() >= MAX_CACHE_ENTRIES {

Copilot AI Mar 6, 2026


g2_check_valid is missing a doc comment while the analogous g1_check_valid directly above has a two-line doc comment explaining what the function does and its caching behavior. For consistency, g2_check_valid should have a matching doc comment.

src/bls_ops.rs Outdated
Comment on lines +16 to +17
static G1_CACHE: OnceLock<RwLock<HashMap<[u8; 48], bool>>> = OnceLock::new();
static G2_CACHE: OnceLock<RwLock<HashMap<[u8; 96], bool>>> = OnceLock::new();

Copilot AI Mar 6, 2026


The caches G1_CACHE and G2_CACHE are process-global and never evicted. An adversary can submit many transactions with distinct invalid BLS points (each costing only BLS_G1_NEGATE_BASE_COST / BLS_G2_NEGATE_BASE_COST) to grow the caches without bound, potentially causing unbounded memory consumption. Each unique invalid G1 point costs approximately 100 bytes (key + negated key + HashMap overhead), and G2 approximately 200 bytes. Over a long-running process, this could accumulate into a meaningful memory leak. Consider adding a maximum cache size (e.g., evict entries when size exceeds a threshold, or use an LRU cache).

coveralls-official bot commented Mar 6, 2026

Pull Request Test Coverage Report for Build 22758129034

Details

  • 66 of 66 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 88.33%

Totals Coverage Status
Change from base Build 22681542686: 0.07%
Covered Lines: 7024
Relevant Lines: 7952

💛 - Coveralls

@richardkiss (Contributor, Author)

Comparison with #731

#731 deduplicates point validation within a single program run (per-Allocator sets). This PR deduplicates globally across all runs via a process-wide cache, so repeated points across multiple transactions also benefit.

Tradeoffs vs. #731:

  • ✅ Cross-run dedup (cache persists across program runs)
  • ✅ No Allocator changes, no interaction with restore_checkpoint
  • ⚠️ Global RwLock adds cross-thread contention on cache misses that #731 avoids (Allocator is single-threaded per run)
  • ⚠️ Cache is bounded (MAX_CACHE_ENTRIES = 65536) and evicts by full clear — a two-generation scheme would be cleaner

Neither approach changes the CLVM cost model; both are mitigations rather than a complete fix.
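The two-generation scheme mentioned above can be sketched as follows. This is not code from the PR (which evicts by clearing the whole map); it illustrates the alternative: when the current generation fills up, it is retired rather than dropped, and entries hit in the old generation are re-promoted, so the working set is never discarded all at once.

```rust
use std::collections::HashMap;

/// Two-generation cache: `new` holds the current generation, `old` the
/// previous one. When `new` fills up, it becomes `old` and the prior
/// `old` is dropped, bounding total size at 2 * max_per_gen entries.
struct TwoGenCache {
    new: HashMap<[u8; 48], bool>,
    old: HashMap<[u8; 48], bool>,
    max_per_gen: usize,
}

impl TwoGenCache {
    fn new(max_per_gen: usize) -> Self {
        Self { new: HashMap::new(), old: HashMap::new(), max_per_gen }
    }

    fn get(&mut self, key: &[u8; 48]) -> Option<bool> {
        if let Some(&v) = self.new.get(key) {
            return Some(v);
        }
        // Promote hits in the old generation so hot entries survive
        // the next rotation.
        if let Some(v) = self.old.remove(key) {
            self.insert(*key, v);
            return Some(v);
        }
        None
    }

    fn insert(&mut self, key: [u8; 48], value: bool) {
        if self.new.len() >= self.max_per_gen {
            // Rotate: current generation becomes old; previous old is dropped.
            self.old = std::mem::take(&mut self.new);
        }
        self.new.insert(key, value);
    }
}

fn main() {
    let mut c = TwoGenCache::new(2);
    c.insert([1u8; 48], true);
    c.insert([2u8; 48], false);
    c.insert([3u8; 48], true); // rotates: {1,2} retired to old
    assert_eq!(c.get(&[1u8; 48]), Some(true)); // promoted back from old
    assert_eq!(c.get(&[3u8; 48]), Some(true));
}
```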

src/bls_ops.rs Outdated
w.clear();
}
w.insert(*blob, valid);
valid


Claimed -p caching optimization is not implemented

Medium Severity

The PR description claims "The -p insight halves effective cache misses for negation workloads" as an implemented property, and states "we can cache p and -p together with one validation call." However, g1_check_valid and g2_check_valid only cache the single input blob — they never also insert the negated version (with blob[0] ^= 0x20). An attacker alternating between p and -p across different transactions would cause double the expected cache misses. Since this PR is specifically a DoS mitigation, the described property not being implemented may affect security analysis that relies on it.

Additional Locations (1)


@richardkiss (Contributor, Author)

Thread-local storage: destruction guarantees

Researched Arvid's concern about cleanup when threads die.

What Rust's thread_local! does: For types that implement Drop (like HashMap), Rust registers a destructor via the platform's TLS mechanism — pthread_key_create with a destructor callback on Linux/macOS, DllMain DLL_THREAD_DETACH on Windows. Verified empirically: drop() fires when a spawned thread exits.

The Windows caveat: The Rust docs explicitly note: "When the process exits on Windows systems, TLS destructors may only be run on the thread that causes the process to exit. This is because the other threads may be forcibly terminated."

Does it actually leak?

Scenario and result:

  • Thread exits normally (e.g. thread pool retirement): destructor runs, HashMap freed ✅
  • Process exits on Linux/macOS: destructors run for all threads ✅
  • Process exits on Windows: destructors may not run for non-exit threads ⚠️

For the Windows process-exit case: the memory is reclaimed by the OS when the process dies regardless of whether the destructor runs. It's not a true leak — no long-lived process retains memory after shutdown.

Conclusion: Not a leak in practice. The Windows destructor gap only applies at process exit, at which point the OS reclaims everything. During normal operation (thread retirement), destructors fire correctly on all platforms.
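The normal-thread-exit case above can be demonstrated with a small std-only program (the `Cache` wrapper and 48-byte key type are illustrative, not the PR's types): a `Drop` impl on the thread-local value observably fires when the spawned thread exits.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};

static DROPPED: AtomicBool = AtomicBool::new(false);

// Wrapper whose Drop lets us observe TLS destruction.
struct Cache(HashMap<[u8; 48], bool>);

impl Drop for Cache {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

thread_local! {
    // For types that implement Drop, Rust registers a TLS destructor via
    // the platform mechanism (pthread_key_create on Linux/macOS,
    // DLL_THREAD_DETACH on Windows).
    static CACHE: RefCell<Cache> = RefCell::new(Cache(HashMap::new()));
}

fn main() {
    let handle = std::thread::spawn(|| {
        CACHE.with(|c| {
            c.borrow_mut().0.insert([0u8; 48], true);
        });
        // Thread exits here; the TLS destructor drops the Cache.
    });
    handle.join().unwrap();
    assert!(DROPPED.load(Ordering::SeqCst));
}
```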


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



// per-run validation caches; not part of consensus state, not checkpointed
g1_cache: HashMap<[u8; 48], bool>,
g2_cache: HashMap<[u8; 96], bool>,


Cache is per-Allocator, not process-global as designed

Medium Severity

The g1_cache and g2_cache are added as HashMap fields on Allocator, making them per-instance (per-run) caches. The PR's stated design is a process-global cache using statics and RwLock, persisting across transactions and blocks. Since Allocator is created fresh for each program run, these caches provide no cross-run deduplication — the main claimed advantage over the alternative PR #731. The PR description also states "Zero changes to Allocator — no new fields" which is directly contradicted.

Additional Locations (2)

