bls_ops: use global validation cache instead of deferred validation#733
bls_ops: use global validation cache instead of deferred validation#733richardkiss wants to merge 6 commits intomainfrom
Conversation
Add process-global validation cache for G1/G2 point validation in strict negate operations. This approach has several advantages over deferred validation: - No changes to Allocator (no new fields, no checkpoint interaction) - No changes to run_program (no deferred validation at end-of-run) - Fails fast: invalid point returns immediate error - Cache persists across transactions and blocks - Thread-safe using RwLock with read-path for cached lookups - Exploits BLS12-381 property: negation flips bit 5, so -p has same validity as p, allowing us to cache both with one validation The cache is explicitly temporary until HF3 fixes the cost model. Alternative to #731 for mitigating g1/g2_negate_strict DoS attack. Made-with: Cursor
Made-with: Cursor
There was a problem hiding this comment.
Pull request overview
This PR introduces a process-global validation cache for BLS G1/G2 point validation in bls_ops.rs as a targeted mitigation for the g1_negate_strict/g2_negate_strict DoS attack (alternative to #731). By caching point validity (and its negation together) in OnceLock<RwLock<HashMap<...>>> statics, repeated calls with the same invalid point only perform the expensive G1Element::from_bytes/G2Element::from_bytes validation once per unique point. The PR is described as explicitly temporary, until HF3 fixes the cost model.
Changes:
- Adds two process-global validation caches (
G1_CACHE,G2_CACHE) plus helper functions for byte-level negation and cached validity checking - Refactors
op_bls_g1_negate_implandop_bls_g2_negate_implto perform the 48/96-byte size check unconditionally and use the new cache for strict-mode validation - Adds unit tests verifying invalid point rejection is maintained on both first (cache miss) and second (cache hit) calls
Comments suppressed due to low confidence (1)
src/bls_ops.rs:432
- The tests only verify that calls with invalid points return errors (both the first call and the second "cached" call). They do not test: (1) that a valid point succeeds on both the first call (cache miss path) and the second call (cache hit path), or (2) that the actual negated output bytes are correct. A test that calls
op_bls_g1_negate_strictwith a known valid point twice and asserts the same correct negated result would better verify the cache hit path for the success case.
let signature = a.g2(first(a, args)?)?;
// followed by a variable number of (G1, msg)-pairs (as a flat list)
args = rest(a, args)?;
let mut items = Vec::<(PublicKey, Atom)>::new();
while !nilp(a, args) {
let pk = a.g1(first(a, args)?)?;
args = rest(a, args)?;
let msg = atom(a, first(a, args)?, "bls_verify message")?;
args = rest(a, args)?;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/bls_ops.rs
Outdated
| return v; | ||
| } | ||
| } |
There was a problem hiding this comment.
Between dropping the read lock (after cache miss) and acquiring the write lock, there is a window where another thread can compute and insert the same key. As a result, G1Element::from_bytes(blob) may be called redundantly by multiple threads for the same input, and the write lock section inserts unconditionally without re-checking. While correctness is maintained (the result is deterministic), using w.entry(*blob).or_insert(valid) for the first insert would avoid the unnecessary redundant write on the main key when another thread has already populated it.
| cost += msg.as_ref().len() as Cost * BLS_MAP_TO_G2_COST_PER_BYTE; | ||
| cost += DST_G2.len() as Cost * BLS_MAP_TO_G2_COST_PER_DST_BYTE; | ||
| check_cost(cost, max_cost)?; | ||
|
|
||
| items.push((pk, msg)); | ||
| } | ||
|
|
||
| if !aggregate_verify(&signature, items) { | ||
| Err(EvalErr::BLSVerifyFailed(input))? | ||
| } else { | ||
| Ok(Reduction(cost, a.nil())) | ||
| } |
There was a problem hiding this comment.
Similarly to the G1 test, this only verifies invalid point rejection, not that a valid G2 point succeeds on both the cache-miss path and the cache-hit path. A test that also exercises the valid-then-cached success case would give stronger coverage of the caching logic.
src/bls_ops.rs
Outdated
| let mut neg = *blob; | ||
| let is_inf = (neg[0] & 0xe0) == 0xc0; | ||
| if !is_inf { | ||
| neg[0] ^= 0x20; |
There was a problem hiding this comment.
negate_g2_bytes is missing a doc comment while the analogous negate_g1_bytes directly above it has one (/// Returns (negated_blob, is_infinity)). For consistency, negate_g2_bytes should have a matching doc comment.
src/bls_ops.rs
Outdated
|
|
||
| let valid = G1Element::from_bytes(blob).is_ok(); | ||
| let mut w = cache.write().unwrap(); | ||
| if w.len() >= MAX_CACHE_ENTRIES { |
There was a problem hiding this comment.
g2_check_valid is missing a doc comment while the analogous g1_check_valid directly above has a two-line doc comment explaining what the function does and its caching behavior. For consistency, g2_check_valid should have a matching doc comment.
src/bls_ops.rs
Outdated
| static G1_CACHE: OnceLock<RwLock<HashMap<[u8; 48], bool>>> = OnceLock::new(); | ||
| static G2_CACHE: OnceLock<RwLock<HashMap<[u8; 96], bool>>> = OnceLock::new(); |
There was a problem hiding this comment.
The caches G1_CACHE and G2_CACHE are process-global and never evicted. An adversary can submit many transactions with distinct invalid BLS points (each costing only BLS_G1_NEGATE_BASE_COST / BLS_G2_NEGATE_BASE_COST) to grow the caches without bound, potentially causing unbounded memory consumption. Each unique invalid G1 point costs approximately 100 bytes (key + negated key + HashMap overhead), and G2 approximately 200 bytes. Over a long-running process, this could accumulate into a meaningful memory leak. Consider adding a maximum cache size (e.g., evict entries when size exceeds a threshold, or use an LRU cache).
Pull Request Test Coverage Report for Build 22758129034Details
💛 - Coveralls |
Comparison with #731#731 deduplicates point validation within a single program run (per-Allocator sets). This PR deduplicates globally across all runs via a process-wide cache, so repeated points across multiple transactions also benefit. Tradeoffs vs. #731:
Neither approach changes the CLVM cost model; both are mitigations rather than a complete fix. |
Made-with: Cursor
src/bls_ops.rs
Outdated
| w.clear(); | ||
| } | ||
| w.insert(*blob, valid); | ||
| valid |
There was a problem hiding this comment.
Claimed -p caching optimization is not implemented
Medium Severity
The PR description claims "The -p insight halves effective cache misses for negation workloads" as an implemented property, and states "we can cache p and -p together with one validation call." However, g1_check_valid and g2_check_valid only cache the single input blob — they never also insert the negated version (with blob[0] ^= 0x20). An attacker alternating between p and -p across different transactions would cause double the expected cache misses. Since this PR is specifically a DoS mitigation, the described property not being implemented may affect security analysis that relies on it.
Additional Locations (1)
Made-with: Cursor
Thread-local storage: destruction guaranteesResearched Arvid's concern about cleanup when threads die. What Rust's The Windows caveat: The Rust docs explicitly note: "When the process exits on Windows systems, TLS destructors may only be run on the thread that causes the process to exit. This is because the other threads may be forcibly terminated." Does it actually leak?
For the Windows process-exit case: the memory is reclaimed by the OS when the process dies regardless of whether the destructor runs. It's not a true leak — no long-lived process retains memory after shutdown. Conclusion: Not a leak in practice. The Windows destructor gap only applies at process exit, at which point the OS reclaims everything. During normal operation (thread retirement), destructors fire correctly on all platforms. |
Made-with: Cursor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
|
|
||
| // per-run validation caches; not part of consensus state, not checkpointed | ||
| g1_cache: HashMap<[u8; 48], bool>, | ||
| g2_cache: HashMap<[u8; 96], bool>, |
There was a problem hiding this comment.
Cache is per-Allocator, not process-global as designed
Medium Severity
The g1_cache and g2_cache are added as HashMap fields on Allocator, making them per-instance (per-run) caches. The PR's stated design is a process-global cache using statics and RwLock, persisting across transactions and blocks. Since Allocator is created fresh for each program run, these caches provide no cross-run deduplication — the main claimed advantage over the alternative PR #731. The PR description also states "Zero changes to Allocator — no new fields" which is directly contradicted.


Summary
Alternative to #731 for mitigating the g1/g2_negate_strict DoS attack.
Instead of deferring validation to end-of-run (which adds state to the Allocator
and has subtle checkpoint/softfork interaction), use a process-global validation
cache in
bls_ops.rsonly.Key observations:
always valid or always invalid
bytes[0] ^= 0x20),and
-pis valid iffpis validpand-ptogether with one validation callProperties:
Allocator— no new fields, no checkpoint interactionrun_program— no deferred validation at end-of-runRwLockwith read-path for common case (already-cached points)-pinsight halves effective cache misses for negation workloadsTrade-off vs #731:
the two statics and their helper functions can simply be deleted
Note
Medium Risk
Touches consensus-adjacent BLS operator argument validation and expands
Allocatorstate; while semantics should be unchanged, subtle differences in validation paths or memory growth could affect runtime behavior under load.Overview
Improves performance/DoS resistance for
g1_negate_strict/g2_negate_strictby introducing per-allocator caches of compressed G1/G2 validity (Allocator::g1_is_valid/g2_is_valid) and consulting them instead of re-runningfrom_byteson repeated inputs.Refactors negate implementations to convert atoms into fixed-size arrays up front, reuse the cached validity result in strict mode, and adds unit tests that call the strict negate ops twice on invalid points to exercise the caching behavior.
Written by Cursor Bugbot for commit 755f6e8. This will update automatically on new commits. Configure here.