fix: add node health check to incr/decr before mutate operation#186
AustinWheel wants to merge 2 commits into master from
Conversation
incr/decr operations bypassed the ensureWriteQueueSize check that other write paths (set, add, touch) use, causing them to block on unreachable or stalled nodes for the full mutateOperationTimeout duration. A single unhealthy node could cause all incr/decr calls to block, exhausting the caller's thread pool. This adds the same ensureWriteQueueSize gate to incr/decr. If the target node is inactive or its write queue is full, the operation returns -1 immediately, consistent with the documented API contract. The existing reconciliation logic in EVCacheImpl.incr()/decr() already handles -1 as a zone failure and will sync the value from healthy zones on the next successful operation.
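A self-contained toy model of the gate described above: a `Node` with an active flag and a bounded write queue stands in for EVCache's `MemcachedNode`, and the names (`WriteGateSketch`, `MAX_WRITE_QUEUE`) are invented for this sketch, not the actual EVCache implementation.

```java
// Illustrative model of the fail-fast gate added to incr/decr.
// All names here are hypothetical stand-ins for EVCache internals.
public class WriteGateSketch {
    static final int MAX_WRITE_QUEUE = 4; // toy limit; EVCache's is configurable

    static class Node {
        final boolean active;
        final int writeQueueSize;
        Node(boolean active, int writeQueueSize) {
            this.active = active;
            this.writeQueueSize = writeQueueSize;
        }
    }

    // Mirrors the ensureWriteQueueSize contract: reject writes when the
    // node is inactive or its write queue is already full.
    static boolean ensureWriteQueueSize(Node node) {
        return node.active && node.writeQueueSize < MAX_WRITE_QUEUE;
    }

    // Mirrors the patched incr path: gate first, return -1 on failure
    // instead of blocking for the full mutate timeout.
    static long incr(Node node, long current, long by) {
        if (!ensureWriteQueueSize(node)) return -1; // fail fast per API contract
        return current + by; // stands in for the real mutate round-trip
    }
}
```

The -1 return is what EVCacheImpl's reconciliation logic already treats as a zone failure, so the gate composes with the existing recovery path rather than introducing a new error mode.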
```java
public long incr(String key, long by, long defaultVal, int timeToLive) throws EVCacheException {
    final MemcachedNode node = evcacheMemcachedClient.getEVCacheNode(key);
    if (!ensureWriteQueueSize(node, key, Call.INCR)) {
```
+1 to adding the check here. On the other hand, I am thinking we probably should do the evcNode.isAvailable(call) check first, as in validateNodeForRead, to fail fast instead of going through the retry/wait loop when the node is already inactive. I couldn't think of a reason why we did this only for the read APIs; the same should apply to the write APIs, IIUC. Any thoughts?
I think the goal is to not lose writes: the write queue is considerably larger than the read queue, and if a node drops out of discovery for a short time while its write queue isn't full, we can still enqueue operations that it can process after it returns.
That's a good point. However, I wonder whether that is better than just fast-failing those writes. EVCache writes go to multiple zones/replicas, so a failed write to one node doesn't mean data loss. The failed result will be reflected in the latch count, and the client's LatchPolicy will check whether it can tolerate one or more failures.
And for a high-throughput workload, there can be many request threads retrying/waiting (blocked on sleep) before the connection can be rebuilt (there is also a backoff interval between reconnect attempts, which I think makes the wait even longer).
Okay, yeah, I think it could be useful for incr/decr operations, but should such a change be added to all write ops?
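The trade-off being debated above can be sketched as a toy model: checking availability before the retry/sleep loop means an inactive node costs zero wait iterations instead of the full retry budget. The class and method names here are invented for illustration; EVCache's actual ensureWriteQueueSize differs in detail.

```java
// Toy model of the proposed fast-fail ordering (hypothetical names).
public class FastFailSketch {
    static final int MAX_RETRIES = 3; // EVCache retries a few times with sleeps

    // Returns how many sleep iterations a caller spends before the gate
    // resolves, given the node's state.
    static int gateIterations(boolean nodeAvailable, boolean queueFull) {
        if (!nodeAvailable) return 0;      // fast fail: no waiting at all
        int iterations = 0;
        while (queueFull && iterations < MAX_RETRIES) {
            iterations++;                  // stands in for Thread.sleep(...)
        }
        return iterations;
    }
}
```

Without the availability check, an inactive node with a full queue would burn all MAX_RETRIES sleeps per request thread; with it, those threads return immediately and the multi-zone latch handles the failure.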
… write path
- EVCacheImpl.decr: fix reconciliation to pick the minimum non-(-1) value across nodes instead of the maximum. For decr, the most up-to-date node has the lowest value (most decremented), so the old max-pick logic would overwrite correctly decremented nodes with stale higher values.
- EVCacheClient.ensureWriteQueueSize: add isAvailable() check before entering the retry/sleep loop. This fast-fails writes to inactive nodes instead of blocking request threads through 3 retry iterations. Affects all write operations that go through ensureWriteQueueSize (incr, decr, set, delete, etc.).
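The min-pick reconciliation described in the commit message above can be shown in a few lines: for decr, the most up-to-date replica holds the lowest value, and -1 marks a failed zone to be skipped. The class and method names are illustrative, not EVCache's actual internals.

```java
// Sketch of min-pick reconciliation for decr across zone replicas.
// -1 is the zone-failure sentinel, consistent with the PR's API contract.
public class DecrReconcile {
    static long reconcileDecr(long[] zoneValues) {
        long min = Long.MAX_VALUE;
        for (long v : zoneValues) {
            if (v == -1) continue;   // failed zone: ignore its value
            if (v < min) min = v;    // lowest value = most decremented = freshest
        }
        return (min == Long.MAX_VALUE) ? -1 : min; // all zones failed
    }
}
```

Under the old max-pick logic, a stale zone returning 5 would win over a correctly decremented zone at 3; min-pick makes the freshest replica authoritative.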