Skip to content

fix: add 30-second timeout to SPV quorum public key lookup#621

Open
thepastaclaw wants to merge 2 commits intodashpay:v1.0-devfrom
thepastaclaw:fix/spv-quorum-timeout
Open

fix: add 30-second timeout to SPV quorum public key lookup#621
thepastaclaw wants to merge 2 commits intodashpay:v1.0-devfrom
thepastaclaw:fix/spv-quorum-timeout

Conversation

@thepastaclaw
Copy link
Collaborator

@thepastaclaw thepastaclaw commented Feb 22, 2026

Wrap the SPV quorum public key lookup in a 30-second timeout to prevent indefinite hangs.

Problem

get_quorum_public_key calls get_quorum_by_height on the SPV client interface with no timeout. If the SPV client is unresponsive or the network request stalls, this blocks the calling thread indefinitely (it runs inside block_in_place/block_on).

Fix

Wrap the async lookup in tokio::time::timeout(Duration::from_secs(30), ...). On timeout, a descriptive warning is logged and a clear error is returned with the quorum type, hash, and height for debugging.

30 seconds is generous for a quorum lookup — normal responses arrive in under a second.

Cherry-picked from ralph/improvements (commit 9c17988).

Validation

What was tested:

  • cargo clippy --all-features --all-targets -- -D warnings — lint check
  • cargo test --all-features --workspace — full workspace test suite

Results:

  • All local commands passed
  • Clippy CI check — pass (5m49s)
  • Test Suite CI check — pass (7m52s)

Environment: Local macOS arm64; GitHub Actions CI (ubuntu-latest)

Summary by CodeRabbit

  • Bug Fixes
    • Added timeout handling for quorum lookup operations to prevent indefinite hangs. Operations exceeding 30 seconds now return a clear error message instead of waiting indefinitely, improving system reliability and responsiveness.

Wrap the quorum public key lookup in a 30-second timeout to prevent
indefinite hangs when the SPV client is unresponsive or the network
request stalls. Returns a clear timeout error message with context.

Cherry-picked from ralph/improvements (commit 9c17988).
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b71f02 and f5993b8.

📒 Files selected for processing (1)
  • src/spv/manager.rs

📝 Walkthrough

Walkthrough

The change adds a 30-second timeout mechanism to the SPV quorum lookup call in get_quorum_public_key. If the quorum retrieval exceeds this duration, a timeout-specific error is returned; otherwise, existing success and error handling paths remain unaffected.

Changes

Cohort / File(s) Summary
SPV Quorum Lookup Timeout
src/spv/manager.rs
Added 30-second timeout wrapper around quorum lookup call in get_quorum_public_key, returning timeout-specific error on expiry while preserving existing success and error paths.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A timeout's hop, a quorum's skip,
Thirty seconds—don't let queries slip!
When the SPV lookup starts to lag,
A friendly error won't let hopes sag. ⏱️✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a 30-second timeout to the SPV quorum public key lookup, which is the primary modification in src/spv/manager.rs.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@thepastaclaw
Copy link
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine

@thepastaclaw
Copy link
Collaborator Author

Addressed Lukasz's feedback by removing the unnecessary format! usage in the warning path (using tracing::warn! formatting directly) in commit f5993b8. I also resolved the review thread now that the change is in place.

Validation re-run on current branch:

  • cargo check -q

Copy link

@xdustinface xdustinface left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been an issue? I wonder why this would be needed. There is no network request involved here its just a lookup in the masternode list.

@xdustinface
Copy link

Also btw, you don't seem to be on a recent commit of rust-dashcore? The client interface doesn't exist anymore. The DashSpvClient is now clonable so you can probably simplify quite few things in DET in this regard.

@thepastaclaw
Copy link
Collaborator Author

Good point — the current v1.0-dev branch pins rust-dashcore at a05d256f which is pretty old. Upgrading to pick up the clonable DashSpvClient and the removed client interface would be a nice follow-up, but it's a bigger change than this timeout fix and would need its own PR.

I'll open a tracker issue to bump the rust-dashcore dependency separately. Thanks for the heads up!

@thepastaclaw
Copy link
Collaborator Author

You're right that the underlying operation is a local masternode list lookup — no network round-trip. But the call goes through DashSpvClientInterface's command channel (command_tx → SPV client task → response), not a direct function call. If the SPV client task is busy, deadlocked, or its event loop is blocked, the channel wait hangs indefinitely inside block_in_place/block_on, which freezes the calling thread with no feedback.

The timeout is defensive against the command channel hanging, not against network latency. 30 seconds is generous enough to never trigger during normal operation, but prevents a silent hang from becoming a frozen UI.

That said, if you think the command channel pattern itself is robust enough that this can't happen in practice, happy to hear that — you know the SPV internals better than I do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants