bls_ops: use global validation cache instead of deferred validation#733

Open
richardkiss wants to merge 6 commits into main from bls-g1-negate-cache

Conversation

@richardkiss (Contributor) commented Mar 6, 2026

Summary

Alternative to #731 for mitigating the g1/g2_negate_strict DoS attack.

Instead of deferring validation to end-of-run (which adds state to the Allocator
and has subtle checkpoint/softfork interaction), use a process-global validation
cache in bls_ops.rs only.

Key observations:

  • A G1/G2 point's validity is context-free and deterministic — same bytes are
    always valid or always invalid
  • In BLS12-381 compressed form, negation flips one bit (bytes[0] ^= 0x20),
    and -p is valid iff p is valid
  • So we can cache p and -p together with one validation call
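The observations above can be sketched in std-only Rust. This is a minimal illustration, not the PR's code: `expensive_validate` is a stand-in for the real `G1Element::from_bytes(...).is_ok()` check, and the blob in `main` is arbitrary bytes, not a real curve point.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real `G1Element::from_bytes(...).is_ok()`
// check, which lives in the BLS library.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

/// Flip the sign of a compressed G1 point. In BLS12-381 compressed form,
/// bit 5 of the first byte (0x20) encodes the sign of y, so negation is a
/// single XOR -- unless the blob is the infinity encoding (0xc0 prefix),
/// which is its own negation. Returns (negated_blob, is_infinity).
fn negate_g1_bytes(blob: &[u8; 48]) -> ([u8; 48], bool) {
    let mut neg = *blob;
    let is_inf = (neg[0] & 0xe0) == 0xc0;
    if !is_inf {
        neg[0] ^= 0x20;
    }
    (neg, is_inf)
}

/// One validation call populates the cache for both p and -p,
/// since -p is valid iff p is valid.
fn check_valid(cache: &mut HashMap<[u8; 48], bool>, blob: &[u8; 48]) -> bool {
    if let Some(&v) = cache.get(blob) {
        return v;
    }
    let valid = expensive_validate(blob);
    cache.insert(*blob, valid);
    let (neg, is_inf) = negate_g1_bytes(blob);
    if !is_inf {
        cache.insert(neg, valid);
    }
    valid
}

fn main() {
    let mut cache = HashMap::new();
    let p = [0x80u8; 48]; // arbitrary illustration bytes, not a real point
    assert!(check_valid(&mut cache, &p));
    // Both p and -p are now cached from the single validation call.
    let (neg, _) = negate_g1_bytes(&p);
    assert_eq!(cache.len(), 2);
    assert_eq!(cache.get(&neg), Some(&true));
}
```

This joint insertion is what makes the "-p insight" hold: an attacker alternating between p and -p still triggers only one validation per pair.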

Properties:

  • Zero changes to Allocator — no new fields, no checkpoint interaction
  • Zero changes to run_program — no deferred validation at end-of-run
  • Fails fast: invalid point → immediate error (no deferred state)
  • Cache persists across transactions and blocks — warms up over time
  • Thread-safe: RwLock with read-path for common case (already-cached points)
  • The -p insight halves effective cache misses for negation workloads

Trade-off vs #731:

  • This approach: O(1) per repeat call (any prior tx in process lifetime), fails fast
  • #731 (optimize some bls operations): O(1) per repeat call within a single run, deferred error at end-of-run
  • This is explicitly temporary until HF3 fixes the cost model, at which point
    the two statics and their helper functions can simply be deleted

Note

Medium Risk
Touches consensus-adjacent BLS operator argument validation and expands Allocator state; while semantics should be unchanged, subtle differences in validation paths or memory growth could affect runtime behavior under load.

Overview
Improves performance/DoS resistance for g1_negate_strict/g2_negate_strict by introducing per-allocator caches of compressed G1/G2 validity (Allocator::g1_is_valid/g2_is_valid) and consulting them instead of re-running from_bytes on repeated inputs.

Refactors negate implementations to convert atoms into fixed-size arrays up front, reuse the cached validity result in strict mode, and adds unit tests that call the strict negate ops twice on invalid points to exercise the caching behavior.

Written by Cursor Bugbot for commit 755f6e8.

Add process-global validation cache for G1/G2 point validation in
strict negate operations. This approach has several advantages over
deferred validation:

- No changes to Allocator (no new fields, no checkpoint interaction)
- No changes to run_program (no deferred validation at end-of-run)
- Fails fast: invalid point returns immediate error
- Cache persists across transactions and blocks
- Thread-safe using RwLock with read-path for cached lookups
- Exploits BLS12-381 property: negation flips bit 5, so -p has same
  validity as p, allowing us to cache both with one validation

The cache is explicitly temporary until HF3 fixes the cost model.
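The process-global cache described above follows a standard pattern: a lazily initialized static guarded by an `RwLock`, with a read-lock fast path for already-cached points. A minimal sketch, again with a stubbed validation function standing in for `G1Element::from_bytes`:

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// Process-global cache: lives for the lifetime of the process, so it
// persists across program runs, transactions, and blocks.
static G1_CACHE: OnceLock<RwLock<HashMap<[u8; 48], bool>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<[u8; 48], bool>> {
    G1_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

// Placeholder for the real curve-point validation.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

fn g1_check_valid(blob: &[u8; 48]) -> bool {
    // Fast path: a shared read lock suffices for points seen before,
    // so the common (already-cached) case never blocks other readers.
    if let Some(&v) = cache().read().unwrap().get(blob) {
        return v;
    }
    // Slow path: validate outside any lock, then record the result.
    let valid = expensive_validate(blob);
    cache().write().unwrap().insert(*blob, valid);
    valid
}

fn main() {
    let p = [0x80u8; 48]; // arbitrary illustration bytes
    assert!(g1_check_valid(&p)); // miss: validates and caches
    assert!(g1_check_valid(&p)); // hit: read lock only
}
```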

Alternative to #731 for mitigating g1/g2_negate_strict DoS attack.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 6, 2026 08:52

Copilot AI left a comment


Pull request overview

This PR introduces a process-global validation cache for BLS G1/G2 point validation in bls_ops.rs as a targeted mitigation for the g1_negate_strict/g2_negate_strict DoS attack (alternative to #731). By caching point validity (and its negation together) in OnceLock<RwLock<HashMap<...>>> statics, repeated calls with the same invalid point only perform the expensive G1Element::from_bytes/G2Element::from_bytes validation once per unique point. The PR is described as explicitly temporary, until HF3 fixes the cost model.

Changes:

  • Adds two process-global validation caches (G1_CACHE, G2_CACHE) plus helper functions for byte-level negation and cached validity checking
  • Refactors op_bls_g1_negate_impl and op_bls_g2_negate_impl to perform the 48/96-byte size check unconditionally and use the new cache for strict-mode validation
  • Adds unit tests verifying invalid point rejection is maintained on both first (cache miss) and second (cache hit) calls
Comments suppressed due to low confidence (1)

src/bls_ops.rs:432

  • The tests only verify that calls with invalid points return errors (both the first call and the second "cached" call). They do not test: (1) that a valid point succeeds on both the first call (cache miss path) and the second call (cache hit path), or (2) that the actual negated output bytes are correct. A test that calls op_bls_g1_negate_strict with a known valid point twice and asserts the same correct negated result would better verify the cache hit path for the success case.
    let signature = a.g2(first(a, args)?)?;

    // followed by a variable number of (G1, msg)-pairs (as a flat list)
    args = rest(a, args)?;

    let mut items = Vec::<(PublicKey, Atom)>::new();
    while !nilp(a, args) {
        let pk = a.g1(first(a, args)?)?;
        args = rest(a, args)?;
        let msg = atom(a, first(a, args)?, "bls_verify message")?;
        args = rest(a, args)?;



src/bls_ops.rs Outdated
Comment on lines +52 to +54
return v;
}
}

Copilot AI Mar 6, 2026


Between dropping the read lock (after cache miss) and acquiring the write lock, there is a window where another thread can compute and insert the same key. As a result, G1Element::from_bytes(blob) may be called redundantly by multiple threads for the same input, and the write lock section inserts unconditionally without re-checking. While correctness is maintained (the result is deterministic), using w.entry(*blob).or_insert(valid) for the first insert would avoid the unnecessary redundant write on the main key when another thread has already populated it.
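The suggested fix is the double-checked pattern with `HashMap::entry`: a redundant validation can still happen in the race window, but the write-lock section no longer overwrites a value another thread already inserted. A sketch of the suggestion (stubbed validation, hypothetical names):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Placeholder for the real `G1Element::from_bytes` validation.
fn expensive_validate(_blob: &[u8; 48]) -> bool {
    true
}

fn check_valid(cache: &RwLock<HashMap<[u8; 48], bool>>, blob: &[u8; 48]) -> bool {
    if let Some(&v) = cache.read().unwrap().get(blob) {
        return v;
    }
    // Another thread may insert the same key between dropping the read
    // lock and acquiring the write lock. `entry().or_insert()` keeps the
    // existing value and skips the redundant write in that case; since
    // validity is deterministic, either thread's result is correct.
    let valid = expensive_validate(blob);
    *cache.write().unwrap().entry(*blob).or_insert(valid)
}

fn main() {
    let cache = RwLock::new(HashMap::new());
    let p = [1u8; 48]; // arbitrary illustration bytes
    assert!(check_valid(&cache, &p)); // miss path
    assert!(check_valid(&cache, &p)); // hit path
}
```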

Comment on lines 434 to 445
cost += msg.as_ref().len() as Cost * BLS_MAP_TO_G2_COST_PER_BYTE;
cost += DST_G2.len() as Cost * BLS_MAP_TO_G2_COST_PER_DST_BYTE;
check_cost(cost, max_cost)?;

items.push((pk, msg));
}

if !aggregate_verify(&signature, items) {
Err(EvalErr::BLSVerifyFailed(input))?
} else {
Ok(Reduction(cost, a.nil()))
}

Copilot AI Mar 6, 2026


Similarly to the G1 test, this only verifies invalid point rejection, not that a valid G2 point succeeds on both the cache-miss path and the cache-hit path. A test that also exercises the valid-then-cached success case would give stronger coverage of the caching logic.

src/bls_ops.rs Outdated
let mut neg = *blob;
let is_inf = (neg[0] & 0xe0) == 0xc0;
if !is_inf {
neg[0] ^= 0x20;

Copilot AI Mar 6, 2026


negate_g2_bytes is missing a doc comment while the analogous negate_g1_bytes directly above it has one (/// Returns (negated_blob, is_infinity)). For consistency, negate_g2_bytes should have a matching doc comment.

src/bls_ops.rs Outdated

let valid = G1Element::from_bytes(blob).is_ok();
let mut w = cache.write().unwrap();
if w.len() >= MAX_CACHE_ENTRIES {

Copilot AI Mar 6, 2026


g2_check_valid is missing a doc comment while the analogous g1_check_valid directly above has a two-line doc comment explaining what the function does and its caching behavior. For consistency, g2_check_valid should have a matching doc comment.

src/bls_ops.rs Outdated
Comment on lines +16 to +17
static G1_CACHE: OnceLock<RwLock<HashMap<[u8; 48], bool>>> = OnceLock::new();
static G2_CACHE: OnceLock<RwLock<HashMap<[u8; 96], bool>>> = OnceLock::new();

Copilot AI Mar 6, 2026


The caches G1_CACHE and G2_CACHE are process-global and never evicted. An adversary can submit many transactions with distinct invalid BLS points (each costing only BLS_G1_NEGATE_BASE_COST / BLS_G2_NEGATE_BASE_COST) to grow the caches without bound, potentially causing unbounded memory consumption. Each unique invalid G1 point costs approximately 100 bytes (key + negated key + HashMap overhead), and G2 approximately 200 bytes. Over a long-running process, this could accumulate into a meaningful memory leak. Consider adding a maximum cache size (e.g., evict entries when size exceeds a threshold, or use an LRU cache).

coveralls-official bot commented Mar 6, 2026

Pull Request Test Coverage Report for Build 22758129034

Details

  • 66 of 66 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 88.33%

Totals Coverage Status
Change from base Build 22681542686: 0.07%
Covered Lines: 7024
Relevant Lines: 7952

💛 - Coveralls

@richardkiss (Contributor, Author)

Comparison with #731

#731 deduplicates point validation within a single program run (per-Allocator sets). This PR deduplicates globally across all runs via a process-wide cache, so repeated points across multiple transactions also benefit.

Tradeoffs vs. #731:

  • ✅ Cross-run dedup (cache persists across program runs)
  • ✅ No Allocator changes, no interaction with restore_checkpoint
  • ⚠️ Global RwLock adds cross-thread contention on cache misses that #731 avoids (Allocator is single-threaded per run)
  • ⚠️ Cache is bounded (MAX_CACHE_ENTRIES = 65536) and evicts by full clear — a two-generation scheme would be cleaner

Neither approach changes the CLVM cost model; both are mitigations rather than a complete fix.
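The two-generation scheme mentioned above can be sketched as follows. This is not code from the PR (which evicts by clearing the whole map); it illustrates the alternative: when the current generation fills up, it is retired rather than dropped, and entries hit in the old generation are re-promoted, so the working set is never discarded all at once.

```rust
use std::collections::HashMap;

/// Two-generation cache: `new` holds the current generation, `old` the
/// previous one. When `new` fills up, it becomes `old` and the prior
/// `old` is dropped, bounding total size at 2 * max_per_gen entries.
struct TwoGenCache {
    new: HashMap<[u8; 48], bool>,
    old: HashMap<[u8; 48], bool>,
    max_per_gen: usize,
}

impl TwoGenCache {
    fn new(max_per_gen: usize) -> Self {
        Self { new: HashMap::new(), old: HashMap::new(), max_per_gen }
    }

    fn get(&mut self, key: &[u8; 48]) -> Option<bool> {
        if let Some(&v) = self.new.get(key) {
            return Some(v);
        }
        // Promote hits in the old generation so hot entries survive
        // the next rotation.
        if let Some(v) = self.old.remove(key) {
            self.insert(*key, v);
            return Some(v);
        }
        None
    }

    fn insert(&mut self, key: [u8; 48], value: bool) {
        if self.new.len() >= self.max_per_gen {
            // Rotate: current generation becomes old; previous old is dropped.
            self.old = std::mem::take(&mut self.new);
        }
        self.new.insert(key, value);
    }
}

fn main() {
    let mut c = TwoGenCache::new(2);
    c.insert([1u8; 48], true);
    c.insert([2u8; 48], false);
    c.insert([3u8; 48], true); // rotates: {1,2} retired to old
    assert_eq!(c.get(&[1u8; 48]), Some(true)); // promoted back from old
    assert_eq!(c.get(&[3u8; 48]), Some(true));
}
```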

src/bls_ops.rs Outdated
w.clear();
}
w.insert(*blob, valid);
valid


Claimed -p caching optimization is not implemented

Medium Severity

The PR description claims "The -p insight halves effective cache misses for negation workloads" as an implemented property, and states "we can cache p and -p together with one validation call." However, g1_check_valid and g2_check_valid only cache the single input blob — they never also insert the negated version (with blob[0] ^= 0x20). An attacker alternating between p and -p across different transactions would cause double the expected cache misses. Since this PR is specifically a DoS mitigation, the described property not being implemented may affect security analysis that relies on it.

Additional Locations (1)


@richardkiss (Contributor, Author)

Thread-local storage: destruction guarantees

Researched Arvid's concern about cleanup when threads die.

What Rust's thread_local! does: For types that implement Drop (like HashMap), Rust registers a destructor via the platform's TLS mechanism — pthread_key_create with a destructor callback on Linux/macOS, DllMain DLL_THREAD_DETACH on Windows. Verified empirically: drop() fires when a spawned thread exits.

The Windows caveat: The Rust docs explicitly note: "When the process exits on Windows systems, TLS destructors may only be run on the thread that causes the process to exit. This is because the other threads may be forcibly terminated."

Does it actually leak?

Scenario and result:

  • Thread exits normally (e.g. thread pool retirement): destructor runs, HashMap freed ✅
  • Process exits on Linux/macOS: destructors run for all threads ✅
  • Process exits on Windows: destructors may not run for non-exit threads ⚠️

For the Windows process-exit case: the memory is reclaimed by the OS when the process dies regardless of whether the destructor runs. It's not a true leak — no long-lived process retains memory after shutdown.

Conclusion: Not a leak in practice. The Windows destructor gap only applies at process exit, at which point the OS reclaims everything. During normal operation (thread retirement), destructors fire correctly on all platforms.
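The normal-thread-exit case above can be demonstrated with a small std-only program (the `Cache` wrapper and 48-byte key type are illustrative, not the PR's types): a `Drop` impl on the thread-local value observably fires when the spawned thread exits.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};

static DROPPED: AtomicBool = AtomicBool::new(false);

// Wrapper whose Drop lets us observe TLS destruction.
struct Cache(HashMap<[u8; 48], bool>);

impl Drop for Cache {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

thread_local! {
    // For types that implement Drop, Rust registers a TLS destructor via
    // the platform mechanism (pthread_key_create on Linux/macOS,
    // DLL_THREAD_DETACH on Windows).
    static CACHE: RefCell<Cache> = RefCell::new(Cache(HashMap::new()));
}

fn main() {
    let handle = std::thread::spawn(|| {
        CACHE.with(|c| {
            c.borrow_mut().0.insert([0u8; 48], true);
        });
        // Thread exits here; the TLS destructor drops the Cache.
    });
    handle.join().unwrap();
    assert!(DROPPED.load(Ordering::SeqCst));
}
```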


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



// per-run validation caches; not part of consensus state, not checkpointed
g1_cache: HashMap<[u8; 48], bool>,
g2_cache: HashMap<[u8; 96], bool>,


Cache is per-Allocator, not process-global as designed

Medium Severity

The g1_cache and g2_cache are added as HashMap fields on Allocator, making them per-instance (per-run) caches. The PR's stated design is a process-global cache using statics and RwLock, persisting across transactions and blocks. Since Allocator is created fresh for each program run, these caches provide no cross-run deduplication — the main claimed advantage over the alternative PR #731. The PR description also states "Zero changes to Allocator — no new fields" which is directly contradicted.

Additional Locations (2)

