rpcclient: make shutdown interrupt in-flight POSTs #2451

Closed
wydengyre wants to merge 4 commits into btcsuite:master from FiveBellsSettlement:btc-client-respect-context

Conversation

@wydengyre
Contributor

@wydengyre wydengyre commented Oct 30, 2025

Change Description

Modify the rpcclient http POST call to ensure that a shutdown immediately interrupts in-flight requests, which otherwise would have to wait until timeout.

This PR includes the change in #2450 as it is necessary to ensure interruption during Dial.

Steps to Test

The test suite has an added test to ensure this works.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

@coveralls

coveralls commented Oct 30, 2025

Pull Request Test Coverage Report for Build 21215011456

Details

  • 21 of 24 (87.5%) changed or added relevant lines in 1 file are covered.
  • 85 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.02%) to 54.972%

Changes Missing Coverage        Covered Lines   Changed/Added Lines   %
rpcclient/infrastructure.go     21              24                    87.5%

Files with Coverage Reduction   New Missed Lines   %
btcutil/gcs/gcs.go              1                  80.95%
rpcclient/infrastructure.go     84                 48.49%

Totals
Change from base Build 20942501138: 0.02%
Covered Lines: 31217
Relevant Lines: 56787

💛 - Coveralls

Member

@jcvernaleo jcvernaleo left a comment

Looks good.

OK

httpReq, err = http.NewRequestWithContext(ctx, "POST", httpURL, bodyReader)
if err != nil {
	// We must observe the contract that shutdown returns ErrClientShutdown.
	if errors.Is(err, context.Canceled) && errors.Is(context.Cause(ctx), ErrClientShutdown) {
Collaborator

This check is unneeded, since http.NewRequestWithContext will never return an error that matches context.Canceled.

Contributor Author

Removed in f87d914

@kcalvinalvin
Collaborator

Checked:

1: The added TestHTTPPostShutdownInterruptsPendingRequest test is correct.
2: Sanity checked the changes in infrastructure.go.

One nitpicky gripe is maybe the test could be commented better.

Looks good overall except for that one unneeded check.

Drop unreachable context-canceled mapping after request creation.

Clarify the HTTP POST shutdown test flow with brief comments.
@wydengyre
Contributor Author

> Checked:
>
> 1: The added TestHTTPPostShutdownInterruptsPendingRequest test is correct.
> 2: Sanity checked the changes in infrastructure.go.
>
> One nitpicky gripe is maybe the test could be commented better.
>
> Looks good overall except for that one unneeded check.

@kcalvinalvin thanks for your review.

I've addressed your comments in f87d914

Can you re-review?

@seeforschauer

Hey @wydengyre, nice work on this — we hit the same pain point in production.

We run an HAProxy health checker that polls BTC nodes via GetBlockCount(). Each probe spawns a goroutine; when a node hangs, GetBlockCount blocks for ~600s (the retry loop in handleSendPostMessage makes 10 attempts, each with a 60s HTTP timeout) while new probes keep arriving. We observed 117 leaked goroutines on staging.

We searched the issue tracker and PRs for existing context support work and found #2450 (dial timeout, merged), this PR (shutdown interrupt), and #1323 (connection reuse), but nothing addressing per-request context.Context for RPC methods. The gap is clear: callers currently have no way to cancel or time out an individual GetBlockCount/SendRawTransaction/etc. call.

Your PR fixes shutdown propagation, which is great. The transport plumbing is almost there with http.NewRequestWithContext. Would you or maintainers be open to extending this to per-request context support? Something like:

func (c *Client) GetBlockCountWithContext(ctx context.Context) (int64, error)

Minimal changes: ctx field on jsonRequest, ctx.Done() check in the retry loop, SendCmdCtx variant. Non-breaking — existing methods keep working. ~80-100 lines.

Happy to open an issue + PR for this if there's interest. We worked around it with a singleflight pattern at our layer, but the proper fix belongs here.

@wydengyre
Contributor Author

> Would you or maintainers be open to extending this to per-request context support?

@seeforschauer I had the same problem as you, and actually have a private library that fixes it in a backward-compatible way. But I'm not affiliated with this project in any official way, so sadly the decision to support such a feature is outside my influence. I would certainly vote in favor of it, though, as it's both pragmatic/necessary and more "Go-like".

@seeforschauer

Thanks for the support, and good to know you've solved it independently — validates the approach.

I went ahead and opened #2499 to formalize the request for per-request context.Context support. If you'd be open to sharing your backward-compatible approach (or collaborating on a PR), happy to coordinate there.

In the meantime, pinging @kcalvinalvin — any thoughts on merging this PR as-is and tracking context support separately in #2499?

@saubyk saubyk added this to the v0.25.1 milestone Mar 20, 2026
Contributor

@starius starius left a comment

The code looks good! I found a couple of remaining cases of returning non-remapped errors and also gaps in test coverage.

Can you consolidate the error remapping in one place, so that no non-remapped error can leak from handleSendPostMessage (for example, if we later add more return statements in the middle of it)?

Could you also rebase the PR so it is easier to review, please?

httpResponse, err = c.httpClient.Do(httpReq)

// Quit the retry loop on success or if we can't retry anymore.
if err == nil || i == tries-1 {
Contributor

If errors.Is(err, context.Canceled) && errors.Is(context.Cause(ctx), ErrClientShutdown) && i == tries-1, the added error remapping logic is skipped. The code returns context.Canceled in this case, but it is expected to return ErrClientShutdown instead.

}

// Read the raw bytes and close the response.
respBytes, err := io.ReadAll(httpResponse.Body)
Contributor

If ctx is cancelled while io.ReadAll is running, we return fmt.Errorf("error reading json reply: ..."), but I think we should return ErrClientShutdown in this case as well, right?

result, err := future.Receive()
require.Zero(t, result)
require.ErrorContains(t, err, ErrClientShutdown.Error())
}
Contributor

I propose adding more test coverage for the edge cases:

  • Cancellation while waiting in retry backoff path (ctx.Done() select branch).
  • Cancellation on final retry attempt (i == tries-1).
  • Cancellation during response body read after a successful Do.

@starius
Contributor

starius commented Mar 24, 2026

I'm going to copy the commits from this PR on top of #2500 and resolve the remaining comments.

@wydengyre If you already addressed some of them, please push, so I have the latest version of your commits.

@wydengyre
Contributor Author

> @wydengyre If you already addressed some of them, please push, so I have the latest version of your commits.

I have not, sorry. Please go ahead.

@starius
Contributor

starius commented Mar 25, 2026

This PR was integrated into #2500

@Roasbeef Roasbeef closed this Mar 25, 2026
8 participants