rpcclient: fix several synchronization bugs #2500

Open · starius wants to merge 4 commits into btcsuite:master from starius:rpcclient-fixes

Conversation

@starius (Contributor) commented Mar 20, 2026

Change Description

Incorporated #2451:

Modify the rpcclient http POST call to ensure that a shutdown immediately interrupts in-flight requests, which otherwise would have to wait until timeout.

Fix 3 other problems in rpcclient:

  • HTTP POST shutdown can deadlock WaitForShutdown due to double-response race
  • Batch-mode Send() error path leaves queued per-request futures unresolved
  • NewBatch starts HTTP POST handlers twice, increasing concurrency surface

Steps to Test

go test ./rpcclient -count=1

Each test is a regression test. It fails if the patch is reverted.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error) paths are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

@saubyk (Collaborator) commented Mar 23, 2026

hello @wydengyre @seeforschauer @jcvernaleo would you consider reviewing this one?

@seeforschauer left a comment

LGTM — all three bugs are real and the fixes are correct. I've been working in rpcclient/infrastructure.go recently (#2506, #2505), so I have direct context on these paths.

Commit structure is clean (one bug, one test, one commit). Each test fails when the fix is reverted — solid regression coverage.

One suggestion on sendPostRequest (inline) for tighter shutdown determinism using the priority-select pattern already in addRequest. Two minor notes on failBatchRequests.

Comment on lines 931 to 942
// Atomically either queue the request or fail it due to shutdown.
//
// This avoids delivering two terminal responses to the same request, which
// can otherwise block shutdown cleanup on the second send.
select {
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
default:
}

select {
case c.sendPostChan <- jReq:
	log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)

case <-c.shutdown:
	return
}

suggestion (non-blocking): Consider the priority-select pattern here.

When shutdown is already closed (its permanent state after Shutdown()), the single select randomly picks between responding and enqueueing (~50/50 per Go spec). If sendPostHandler has already exited cleanup, the enqueued request's future is never resolved.

A non-blocking shutdown guard first makes the common post-shutdown path deterministic — addRequest (line 213) already uses this exact pattern for the same reason. The remaining race (shutdown closing between the two selects) has a much narrower window.

Suggested change

// Prefer shutdown: if already closed, fail the request immediately.
// This avoids a random race between shutdown and enqueue when both
// channels are ready, consistent with the guard in addRequest.
select {
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
	return
default:
}

// Normal path: enqueue or fail on shutdown. Exactly one outcome.
select {
case c.sendPostChan <- jReq:
	log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
}

@Roasbeef (Member)

This is effectively how the code already was.

@starius isn't it better to prioritize the shutdown path?

@seeforschauer

Right, the structure is two selects like the original — the critical difference is the return after the first shutdown case. The original fell through into the second select even after sending ErrClientShutdown, which allowed the double-resolve.

With the return, it becomes the standard priority-select: if shutdown is already closed, respond deterministically and exit. The second select is only reached when shutdown wasn't closed at check time.

@starius (Author)

Fixed! Now it prioritizes the shutdown path:

// sendPostRequest sends the passed HTTP request to the RPC server using the
// HTTP client associated with the client.  It is backed by a buffered channel,
// so it will not block until the send channel is full.
func (c *Client) sendPostRequest(jReq *jsonRequest) {
	// Prefer shutdown when it is already closed so this path is
	// deterministic. This mirrors addRequest and avoids post-shutdown
	// enqueueing.
	select {
	case <-c.shutdown:
		jReq.responseChan <- &Response{
			result: nil,
			err:    ErrClientShutdown,
		}

		return

	default:
	}

	// Normal path: either enqueue, or fail if shutdown closes in the race
	// window after the guard above.
	select {
	case c.sendPostChan <- jReq:
		log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)

	case <-c.shutdown:
		jReq.responseChan <- &Response{
			result: nil,
			err:    ErrClientShutdown,
		}
	}
}


// Resolve all pending futures on the first batch-level failure so
// callers waiting on Receive don't block indefinitely.
req.responseChan <- &Response{err: err}
@seeforschauer

nit: This send is safe because in batch mode sendRequest only calls addRequest (never sendPostRequest), so individual responseChan buffers (size 1) are guaranteed unwritten at this point. A brief comment documenting this invariant would help future readers — e.g.:

// Safe: batch-mode responseChan buffers are unwritten here,
// so this send won't block while locks are held.
req.responseChan <- &Response{err: err}

@starius (Author)

Added a comment:

		// Resolve all pending futures on the first batch-level failure
		// so callers waiting on Receive don't block indefinitely.
		// Safe: batch-mode responseChan buffers are unwritten here,
		// so this send won't block while locks are held. Batch-mode
		// requests only use addRequest (not sendPostRequest), so each
		// responseChan buffer is still empty.
		req.responseChan <- &Response{err: err}

}

c.requestMap = make(map[uint64]*list.Element)
c.requestList.Init()
@seeforschauer

question: In batch mode, addRequest pushes to batchList, never requestList, so this should already be empty. Intentional defensive reset, or leftover? If intentional, a quick comment would clarify.

@starius (Author)

It is a defensive reset. Added a comment:

	// Batch-mode requests are tracked in batchList, so requestList should
	// already be empty. Keep this defensive reset for invariants and future
	// call paths.
	c.requestList.Init()

@saubyk added this to the v0.25.1 milestone Mar 23, 2026
@Roasbeef (Member) left a comment

Change looks good, only comment is why we'd move away from the pattern that prioritizes a cancel path.


wydengyre and others added 4 commits March 24, 2026 22:48
Use a shutdown-aware context for HTTP POST handling so shutdown can
interrupt in-flight requests.

Centralize shutdown error remapping in handleSendPostMessage so all
error exits consistently return ErrClientShutdown when shutdown causes
a context cancellation. Move the retrying HTTP calling code to a free
function handleSendPostMessageWithRetry and cover it with tests.

When shutdown races with sendPostRequest, a request could be marked
as ErrClientShutdown and still be enqueued. The sendPostHandler cleanup
loop would then try to send a second terminal response and could block
forever on a full response channel.

Fix this by prioritizing the shutdown path. First check shutdown with a
non-blocking select and return immediately when it is already closed.
Then use a second select to choose between enqueue and shutdown for the
remaining race window.

A regression test verifies a shutdown request is failed immediately and
never enqueued.

Batch requests were only clearing batchList on Send() errors. The
per-request futures remained unresolved, so callers waiting on Receive
could block forever after a failed batch round trip.

Add failBatchRequests to fan out the Send() error to every queued batch
request and clear tracking state in one place. A regression test now
verifies queued futures complete with the same error returned by Send().

NewBatch called New() and then called start() again. In HTTP POST mode that
created a second sendPostHandler and another shutdown-cancel goroutine, which
broke the expected single-flight serialization of POST sends.

Keep NewBatch as a semantic toggle only: rely on New() to start handlers
once, then set batch=true. A regression test now checks that batch POST
requests stay serialized through one active transport call.
@starius (Author) commented Mar 25, 2026

I addressed the remaining comments in #2451 and added the updated version of it here as the commit "rpcclient: support canceling in-flight http requests". Functionally it is the same, but refactored. I simplified the code by making a function that returns ([]byte, error); error remapping and channel sending are done at the call site.

CC @Roasbeef @seeforschauer @wydengyre
