rpcclient: fix several synchronization bugs #2500
starius wants to merge 4 commits into btcsuite:master
Conversation
hello @wydengyre @seeforschauer @jcvernaleo would you consider reviewing this one?
seeforschauer
left a comment
LGTM — all three bugs are real and the fixes are correct. I've been working in rpcclient/infrastructure.go recently (#2506, #2505), so I have direct context on these paths.
Commit structure is clean (one bug, one test, one commit). Each test fails when the fix is reverted — solid regression coverage.
One suggestion on sendPostRequest (inline) for tighter shutdown determinism using the priority-select pattern already in addRequest. Two minor notes on failBatchRequests.
rpcclient/infrastructure.go
Outdated
// Atomically either queue the request or fail it due to shutdown.
//
// This avoids delivering two terminal responses to the same request, which
// can otherwise block shutdown cleanup on the second send.
select {
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
default:
}
return

select {
case c.sendPostChan <- jReq:
	log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)

case <-c.shutdown:
	return
}
suggestion (non-blocking): Consider the priority-select pattern here.
When shutdown is already closed (its permanent state after Shutdown()), the single select randomly picks between responding and enqueueing (~50/50 per Go spec). If sendPostHandler has already exited cleanup, the enqueued request's future is never resolved.
A non-blocking shutdown guard first makes the common post-shutdown path deterministic — addRequest (line 213) already uses this exact pattern for the same reason. The remaining race (shutdown closing between the two selects) has a much narrower window.
Suggested change — current:

// Atomically either queue the request or fail it due to shutdown.
//
// This avoids delivering two terminal responses to the same request, which
// can otherwise block shutdown cleanup on the second send.
select {
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
default:
}
return
select {
case c.sendPostChan <- jReq:
	log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)
case <-c.shutdown:
	return
}

Suggested:

// Prefer shutdown: if already closed, fail the request immediately.
// This avoids a random race between shutdown and enqueue when both
// channels are ready, consistent with the guard in addRequest.
select {
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
	return
default:
}
// Normal path: enqueue or fail on shutdown. Exactly one outcome.
select {
case c.sendPostChan <- jReq:
	log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)
case <-c.shutdown:
	jReq.responseChan <- &Response{result: nil, err: ErrClientShutdown}
}
This is effectively how the code already was.
@starius isn't it better to prioritize the shutdown path?
Right, the structure is two selects like the original — the critical difference is the return after the first shutdown case. The original fell through into the second select even after sending ErrClientShutdown, which allowed the double-resolve.
With the return, it becomes the standard priority-select: if shutdown is already closed, respond deterministically and exit. The second select is only reached when shutdown wasn't closed at check time.
Fixed! Now it prioritizes the shutdown path:
// sendPostRequest sends the passed HTTP request to the RPC server using the
// HTTP client associated with the client. It is backed by a buffered channel,
// so it will not block until the send channel is full.
func (c *Client) sendPostRequest(jReq *jsonRequest) {
	// Prefer shutdown when it is already closed so this path is
	// deterministic. This mirrors addRequest and avoids post-shutdown
	// enqueueing.
	select {
	case <-c.shutdown:
		jReq.responseChan <- &Response{
			result: nil,
			err:    ErrClientShutdown,
		}
		return
	default:
	}

	// Normal path: either enqueue, or fail if shutdown closes in the race
	// window after the guard above.
	select {
	case c.sendPostChan <- jReq:
		log.Tracef("Sent command [%s] with id %d", jReq.method, jReq.id)
	case <-c.shutdown:
		jReq.responseChan <- &Response{
			result: nil,
			err:    ErrClientShutdown,
		}
	}
}
// Resolve all pending futures on the first batch-level failure so
// callers waiting on Receive don't block indefinitely.
req.responseChan <- &Response{err: err}
nit: This send is safe because in batch mode sendRequest only calls addRequest (never sendPostRequest), so individual responseChan buffers (size 1) are guaranteed unwritten at this point. A brief comment documenting this invariant would help future readers — e.g.:
// Safe: batch-mode responseChan buffers are unwritten here,
// so this send won't block while locks are held.
req.responseChan <- &Response{err: err}
Added a comment:
// Resolve all pending futures on the first batch-level failure
// so callers waiting on Receive don't block indefinitely.
// Safe: batch-mode responseChan buffers are unwritten here,
// so this send won't block while locks are held. Batch-mode
// requests only use addRequest (not sendPostRequest), so each
// responseChan buffer is still empty.
req.responseChan <- &Response{err: err}
rpcclient/infrastructure.go
Outdated
}

c.requestMap = make(map[uint64]*list.Element)
c.requestList.Init()
question: In batch mode, addRequest pushes to batchList, never requestList, so this should already be empty. Intentional defensive reset, or leftover? If intentional, a quick comment would clarify.
It is a defensive reset. Added a comment:
// Batch-mode requests are tracked in batchList, so requestList should
// already be empty. Keep this defensive reset for invariants and future
// call paths.
c.requestList.Init()
Roasbeef
left a comment
Change looks good, only comment is why we'd move away from the pattern that prioritizes a cancel path.
Force-pushed 264b120 to 5da6fa2, then 5da6fa2 to 1acd9a8.
Use a shutdown-aware context for HTTP POST handling so shutdown can interrupt in-flight requests. Centralize shutdown error remapping in handleSendPostMessage so all error exits consistently return ErrClientShutdown when shutdown causes a context cancellation. Move the retrying HTTP calling code to a free function handleSendPostMessageWithRetry and cover it with tests.
When shutdown races with sendPostRequest, a request could be marked as ErrClientShutdown and still be enqueued. The sendPostHandler cleanup loop would then try to send a second terminal response and could block forever on a full response channel. Fix this by prioritizing the shutdown path. First check shutdown with a non-blocking select and return immediately when it is already closed. Then use a second select to choose between enqueue and shutdown for the remaining race window. A regression test verifies a shutdown request is failed immediately and never enqueued.
Batch requests were only clearing batchList on Send() errors. The per-request futures remained unresolved, so callers waiting on Receive could block forever after a failed batch round trip. Add failBatchRequests to fan out the Send() error to every queued batch request and clear tracking state in one place. A regression test now verifies queued futures complete with the same error returned by Send().
NewBatch called New() and then called start() again. In HTTP POST mode that created a second sendPostHandler and another shutdown-cancel goroutine, which broke the expected single-flight serialization of POST sends. Keep NewBatch as a semantic toggle only: rely on New() to start handlers once, then set batch=true. A regression test now checks that batch POST requests stay serialized through one active transport call.
Force-pushed 1acd9a8 to 54be134.
I addressed the remaining comments in #2451 and added the updated version of it here as the commit "rpcclient: support canceling in-flight http requests". Functionally it is the same, but refactored. I simplified the code by making a function that returns
Change Description
Incorporated #2451 :
Modify the rpcclient http POST call to ensure that a shutdown immediately interrupts in-flight requests, which otherwise would have to wait until timeout.
Fix 3 other problems in rpcclient:
Steps to Test
Each test is a regression test. It fails if the patch is reverted.