
feat: partial sig with fallback #890

Open
shane-moore wants to merge 14 commits into sigp:unstable from shane-moore:feat/partial-sig-with-fallback

Conversation

Member

@shane-moore commented Mar 17, 2026

Problem, Evidence, and Context

Change Overview

  • After Lagrange reconstruction, verify the combined signature against the validator's master BLS pubkey
  • On verification failure, fall back to per-share verification using operator share pubkeys from DB, evict invalid shares, and continue waiting for replacement shares to restore quorum
  • Add resolveDuplicateSignature handling when conflicting partial sigs arrive from the same operator
  • Add metrics for signature verification failures and operator evictions
  • ~300 lines of production code, ~700 lines of tests
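
The verify-then-fallback flow described above can be sketched in toy form. This is a minimal sketch only: `CombineOutcome`, `try_combine_and_verify`, and `find_invalid_shares` here are simplified stand-ins using integer "shares" and a fake validity rule, not the PR's actual BLS code or signatures.

```rust
use std::collections::BTreeMap;

// Hypothetical outcome type mirroring the three cases described above.
#[derive(Debug, PartialEq)]
enum CombineOutcome {
    Success(u64), // stand-in for the reconstructed BLS signature
    CombineFailed(&'static str),
    VerificationFailed,
}

// Toy "share verification": a share is valid iff it is ten times its operator id.
fn share_is_valid(operator_id: u64, share: u64) -> bool {
    share == operator_id * 10
}

// Combine = sum here; verification against the "master pubkey" succeeds
// only when every contributing share is individually valid.
fn try_combine_and_verify(shares: &BTreeMap<u64, u64>) -> CombineOutcome {
    if shares.is_empty() {
        return CombineOutcome::CombineFailed("no shares");
    }
    if shares.iter().all(|(op, s)| share_is_valid(*op, *s)) {
        CombineOutcome::Success(shares.values().sum())
    } else {
        CombineOutcome::VerificationFailed
    }
}

// Fallback: per-share verification names the operators to evict.
fn find_invalid_shares(shares: &BTreeMap<u64, u64>) -> Vec<u64> {
    shares
        .iter()
        .filter(|(op, s)| !share_is_valid(**op, **s))
        .map(|(op, _)| *op)
        .collect()
}

fn main() {
    // Operator 3 sends a bad share; combined verification fails.
    let mut shares = BTreeMap::from([(1u64, 10u64), (2, 20), (3, 999)]);
    if try_combine_and_verify(&shares) == CombineOutcome::VerificationFailed {
        for op in find_invalid_shares(&shares) {
            shares.remove(&op); // evict; keep waiting for a replacement share
        }
    }
    println!("remaining operators: {:?}", shares.keys().collect::<Vec<_>>());
}
```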

Risks, Trade-offs, and Mitigations

  • The fallback path queries the DB via spawn_blocking. This only fires when reconstruction verification fails, which should be rare in normal operation
  • Duplicate resolution also hits the DB, but the message validator's per-operator rate limit (MAX_MESSAGES_PER_ROUND = 1) bounds this to at most one query per operator per collector instance
  • No changes to the happy path -> reconstruction success follows the same code path as before

Validation

  • 8 unit tests covering verify_reconstructed_signature, verify_partial_signature, find_invalid_shares, resolve_duplicate_signature, and try_combine_and_verify
  • 4 integration tests: happy path, fallback eviction + recovery, duplicate resolution, and evicted operator rejection

Rollback

  • Clean revert, no data/schema changes. Reverting removes verification and returns to the prior behavior of trusting all partial signatures.

Additional Info / Next Steps

  • Tests can be split to a separate PR if preferred; kept here because they validate the core logic.

@shane-moore changed the title from "Feat/partial sig with fallback" to "feat: partial sig with fallback" Mar 17, 2026
Comment on lines +593 to +604
warn!(?operator_id, "Conflicting signature from operator");
match fetch_share_pubkeys_by_validator_index(&database, validator_index).await {
    Ok(pubkeys) => {
        resolve_duplicate_signature(
            &mut signature_share,
            operator_id,
            &signature,
            signing_root,
            &pubkeys,
        );
    }
    Err(err) => {


Performance / DoS concern: Every conflicting signature from the same operator triggers a spawn_blocking + DB query. A malicious operator flooding distinct signatures for the same operator_id will hit this path on every message, causing unbounded DB queries.

Consider caching the share_pubkeys lookup result (e.g., in a local variable that persists across loop iterations), so the DB is only queried once per operator conflict:

Suggested change

warn!(?operator_id, "Conflicting signature from operator");
match fetch_share_pubkeys_by_validator_index(&database, validator_index).await {
    Ok(pubkeys) => {
        resolve_duplicate_signature(
            &mut signature_share,
            operator_id,
            &signature,
            signing_root,
            &pubkeys,
        );
    }
    Err(err) => {

if received_different_share {
    warn!(?operator_id, "Conflicting signature from operator");
    // TODO: consider caching share_pubkeys across loop iterations to avoid
    // repeated DB queries from a malicious operator sending many distinct sigs
    match fetch_share_pubkeys_by_validator_index(&database, validator_index).await {
        Ok(pubkeys) => {
            resolve_duplicate_signature(
                &mut signature_share,
                operator_id,
                &signature,
                signing_root,
                &pubkeys,
            );
        }
        Err(err) => {
            error!(
                ?err,
                ?validator_index,
                "DB lookup failed for duplicate resolution"
            );
        }
    }
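
A minimal sketch of the caching pattern the comment suggests, using an `Option` that persists across loop iterations. `fetch_share_pubkeys` here is an illustrative stand-in for the real spawn_blocking + DB lookup, not the PR's actual API:

```rust
// Stand-in for the real spawn_blocking + DB query (hypothetical).
fn fetch_share_pubkeys(validator_index: u64) -> Result<Vec<String>, String> {
    Ok(vec![format!("pubkey-for-{validator_index}")])
}

fn main() {
    let validator_index = 7;
    let mut cached_pubkeys: Option<Vec<String>> = None; // persists across iterations
    let mut db_queries = 0;

    // Five conflicting messages for the same operator arrive; the backing
    // store is only queried on the first, and the result is reused after.
    for _conflict in 0..5 {
        if cached_pubkeys.is_none() {
            db_queries += 1;
            match fetch_share_pubkeys(validator_index) {
                Ok(pubkeys) => cached_pubkeys = Some(pubkeys),
                Err(err) => {
                    eprintln!("lookup failed: {err}");
                    continue;
                }
            }
        }
        let _pubkeys = cached_pubkeys.as_ref().unwrap();
        // ... resolve_duplicate_signature(_pubkeys) would run here ...
    }
    println!("db queries: {db_queries}"); // 1, not 5
}
```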

Member Author

@shane-moore Mar 17, 2026


The message validator enforces MAX_MESSAGES_PER_ROUND = 1 per (operator, slot, kind), so at most ~2 messages from the same operator can reach the collector per slot (accounting for a small race window in the validation queue). This means the duplicate resolution path fires at most once per operator per collector instance -> bounded by committee size (4-13), not by attacker volume. A cache would save a handful of DB calls on an already-rare path so not worth the added complexity.

@diegomrsantos
Member

Could we please use the current PR template for the PRs?

@shane-moore marked this pull request as draft March 17, 2026 00:44
@shane-moore marked this pull request as ready for review March 17, 2026 00:44
@diegomrsantos
Member

Could we please use the current PR template for the PRs?

It seems we also need to get #870 merged into stable

Comment on lines +616 to +647
if let Some(threshold) = threshold
    && signature_share.len() as u64 >= threshold
{
    let signature = match combine_signatures(mem::take(&mut signature_share)) {
        Ok(signature) => Arc::new(signature),
        Err(err) => {
            let Some(validator_pk) = &validator_pubkey else {
                error!("No validator pubkey available for verification");
                return;
            };

            match try_combine_and_verify(&signature_share, validator_pk, signing_root) {
                CombineOutcome::Success(signature) => {
                    trace!(?signature, "Successfully recovered signature");
                    for notifier in mem::take(&mut notifiers) {
                        if notifier.send(Arc::clone(&signature)).is_err() {
                            warn!("Callback dropped since signature is no longer relevant");
                        }
                    }
                    full_signature = Some(signature);
                }
                CombineOutcome::CombineFailed(err) => {
                    error!(?err, "Failed to recover signature");
                    return;
                }
            };

            trace!(?signature, "Successfully recovered signature");

            for notifier in mem::take(&mut notifiers) {
                if notifier.send(Arc::clone(&signature)).is_err() {
                    warn!("Callback dropped - signature is no longer relevant");
                CombineOutcome::VerificationFailed => {
                    metrics::inc_counter(&metrics::SIGNATURE_VERIFICATION_FAILURES_TOTAL);
                    warn!("Reconstructed signature failed verification so run fallback");
                    let share_pubkeys = match fetch_share_pubkeys(&database, validator_pk).await {
                        Ok(pubkeys) => pubkeys,
                        Err(err) => {
                            error!(?err, "Failed to look up share pubkeys");
                            return;
                        }
                    };


Bug: fallback can leave valid shares stranded when count remains >= threshold

After find_invalid_shares removes bad shares, if signature_share.len() >= threshold still holds (e.g. threshold=3, had 5 shares, 1 bad removed → 4 remain), the code falls through and blocks on rx.recv().await without re-attempting combination. If no further messages arrive, the collector is stuck forever holding enough valid shares to succeed.

This can happen in practice when more than threshold partial signatures arrive before the first combine attempt, and only a minority are invalid.

Fix: wrap the threshold check in a loop so it re-attempts after eviction:

Suggested change

if let Some(threshold) = threshold
    && signature_share.len() as u64 >= threshold
{
    let signature = match combine_signatures(mem::take(&mut signature_share)) {
        Ok(signature) => Arc::new(signature),
        Err(err) => {
            let Some(validator_pk) = &validator_pubkey else {
                error!("No validator pubkey available for verification");
                return;
            };
            match try_combine_and_verify(&signature_share, validator_pk, signing_root) {
                CombineOutcome::Success(signature) => {
                    trace!(?signature, "Successfully recovered signature");
                    for notifier in mem::take(&mut notifiers) {
                        if notifier.send(Arc::clone(&signature)).is_err() {
                            warn!("Callback dropped since signature is no longer relevant");
                        }
                    }
                    full_signature = Some(signature);
                }
                CombineOutcome::CombineFailed(err) => {
                    error!(?err, "Failed to recover signature");
                    return;
                }
            };
            trace!(?signature, "Successfully recovered signature");
            for notifier in mem::take(&mut notifiers) {
                if notifier.send(Arc::clone(&signature)).is_err() {
                    warn!("Callback dropped - signature is no longer relevant");
                CombineOutcome::VerificationFailed => {
                    metrics::inc_counter(&metrics::SIGNATURE_VERIFICATION_FAILURES_TOTAL);
                    warn!("Reconstructed signature failed verification so run fallback");
                    let share_pubkeys = match fetch_share_pubkeys(&database, validator_pk).await {
                        Ok(pubkeys) => pubkeys,
                        Err(err) => {
                            error!(?err, "Failed to look up share pubkeys");
                            return;
                        }
                    };

// Re-check threshold after potential eviction of invalid shares.
while let Some(threshold) = threshold {
    if (signature_share.len() as u64) < threshold {
        break;
    }
    let Some(validator_pk) = &validator_pubkey else {
        error!("No validator pubkey available for verification");
        return;
    };
    match try_combine_and_verify(&signature_share, validator_pk, signing_root) {
        CombineOutcome::Success(signature) => {
            trace!(?signature, "Successfully recovered signature");
            for notifier in mem::take(&mut notifiers) {
                if notifier.send(Arc::clone(&signature)).is_err() {
                    warn!("Callback dropped since signature is no longer relevant");
                }
            }
            full_signature = Some(signature);
            break;
        }
        CombineOutcome::CombineFailed(err) => {
            error!(?err, "Failed to recover signature");
            return;
        }
        CombineOutcome::VerificationFailed => {
            metrics::inc_counter(&metrics::SIGNATURE_VERIFICATION_FAILURES_TOTAL);
            warn!("Reconstructed signature failed verification so run fallback");
            let share_pubkeys = match fetch_share_pubkeys(&database, validator_pk).await {
                Ok(pubkeys) => pubkeys,
                Err(err) => {
                    error!(?err, "Failed to look up share pubkeys");
                    return;
                }
            };
            let invalid_operators =
                find_invalid_shares(&signature_share, signing_root, &share_pubkeys);
            if invalid_operators.is_empty() {
                let operators: Vec<_> = signature_share.keys().copied().collect();
                error!(
                    ?signing_root,
                    ?operators,
                    "Verification failed but no individual share was invalid"
                );
                return;
            }
            warn!(?invalid_operators, "Removing invalid shares");
            for op in &invalid_operators {
                signature_share.remove(op);
            }
            // Loop back to re-check threshold with remaining shares
        }
    }
}

Member Author


Valid point: when network partials arrive before RegisterNotifier sets the threshold, they accumulate unchecked. The first combine then runs with len >> threshold, and after eviction len can still be >= threshold with all valid shares remaining. The if let didn't re-check after eviction, so it fell through to rx.recv(), waiting for a message that might never come.

Changed if let to while let so the threshold check re-runs immediately after eviction. Each iteration either succeeds, drops below threshold, or returns on fatal error -> no unnecessary wait.
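
The stranded-shares scenario and the while-loop fix can be walked through with a toy model. Integer "shares" and `is_valid` are hypothetical stand-ins for the real BLS shares and verification; `evict_until_combined` only illustrates the loop shape, not the PR's actual code:

```rust
use std::collections::BTreeMap;

// Toy stand-in: a share is "valid" iff it equals its operator id.
fn is_valid(op: u64, share: u64) -> bool {
    op == share
}

// Evict invalid shares and retry until a combine succeeds or the share
// count drops below threshold. Returns (combine attempts, combined value).
fn evict_until_combined(
    mut shares: BTreeMap<u64, u64>,
    threshold: u64,
) -> (u32, Option<u64>) {
    let mut attempts = 0;
    // `while` re-checks the threshold after each eviction; an `if` here
    // would leave the remaining valid shares stranded while waiting on
    // the channel for messages that may never come.
    while shares.len() as u64 >= threshold {
        attempts += 1;
        let invalid: Vec<u64> = shares
            .iter()
            .filter(|(op, s)| !is_valid(**op, **s))
            .map(|(op, _)| *op)
            .collect();
        if invalid.is_empty() {
            // Stand-in for a successful combine + verification.
            return (attempts, Some(shares.values().sum()));
        }
        for op in invalid {
            shares.remove(&op); // evict and loop back immediately
        }
    }
    (attempts, None) // below threshold: wait for replacement shares
}

fn main() {
    // Scenario from the review: threshold 3, five shares, operator 5 invalid.
    // One eviction pass leaves 4 valid shares, still >= threshold, so the
    // second iteration combines successfully without waiting for messages.
    let shares = BTreeMap::from([(1u64, 1u64), (2, 2), (3, 3), (4, 4), (5, 99)]);
    let (attempts, combined) = evict_until_combined(shares, 3);
    println!("attempts={attempts}, combined={combined:?}");
}
```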

@shane-moore
Member Author

Could we please use the current PR template for the PRs?

It seems we also need to get #870 merged into stable

true true! I can make a pr for this tomorrow

@shane-moore added the claude-recheck label (triggers claude review workflow to re-run) Mar 17, 2026
@github-actions bot removed the claude-recheck label (triggers claude review workflow to re-run) Mar 17, 2026
