DRIVERS-3427 - Finalize client backpressure implementation for phase 1 rollout #1919
NoahStapp wants to merge 9 commits into mongodb:master
Conversation
> The sum of 2 backoffs is 0.3 seconds. There is a 0.3-second window to account for potential variance between the
> two runs.
Following the logic used in the previous version of this test, wouldn't this make the comparison value in the assertion 0? (i.e. backoff time minus buffer window)
Good catch, correct.
I'm concerned that this test will be flaky because of the small time values. WDYT about restoring the original logic and just setting maxAdaptiveRetries=5? This would also give us better confirmation that the proper values for backoff are being calculated.
We could add another test to explicitly verify proper backoff calculation that configures maxAdaptiveRetries. The purpose of this test is to verify that the backoff is used, which is independent of the actual time values. We could add a larger buffer window if flakiness is a concern.
Discussed over slack, we decided to update the prose test to assert on a window of values rather than a lower bound.
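As a rough illustration of the "assert on a window of values" approach, here is a minimal Python sketch. The backoff schedule (two backoffs summing to 0.3 seconds) comes from the quoted test text; the helper name, the `run_operation` callback, and the size of the variance buffer are all assumptions for illustration:

```python
import time

# From the quoted test: the sum of the first two backoffs is 0.3 seconds.
EXPECTED_BACKOFF_SUM = 0.1 + 0.2
# Hypothetical upper buffer to absorb timing variance between runs.
VARIANCE_BUFFER = 0.3

def assert_backoff_window(run_operation):
    """Assert elapsed time falls inside a window, rather than only above
    a lower bound (which, minus the buffer, could degenerate to 0)."""
    start = time.monotonic()
    run_operation()  # expected to back off twice before succeeding
    elapsed = time.monotonic() - start
    assert EXPECTED_BACKOFF_SUM <= elapsed <= EXPECTED_BACKOFF_SUM + VARIANCE_BUFFER, elapsed
```

A window assertion like this both confirms the backoff was used (lower bound) and bounds flakiness exposure (upper bound), at the cost of still being timing-sensitive on slow machines.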
Co-authored-by: Isabel Atkinson <isabelatkinson@gmail.com>
> 4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
>    client MUST add the previously used server's address to the list of deprioritized server addresses for
>    [server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
>    configurable boolean option that defaults to `false`.
Given the no knobs mantra, we need a very good reason to add enableOverloadRetargeting.
The specification gives no justification for the `enableOverloadRetargeting` knob and its default value of `false`. I think the justification should be added, if it is eventually agreed that the knob should exist. The only justification I found is in Decision: Driver Backpressure Retry Behavior Defaults:
a)
A primaryPreferred readPreference currently only reads from a secondary node if the primary is unavailable. Making an overloaded primary count as “unavailable” changes this semantically, but is a net benefit for users unless they truly do not want any secondary reads unless the primary is actually down.
b)
The same is true for secondaryPreferred reads and primary routing. As a result, users must explicitly opt-in to this feature.
The justification does not seem correct to me, and I don't think we should add the enableOverloadRetargeting option.
a)
The alleged interpretation of `primaryPreferred` as a guarantee that no secondary read happens "unless the primary is actually down" does not seem to make sense.
- Formally, there is no such thing as "actually down". A driver may mark the primary as `Unknown` simply as a result of a temporary network issue, even as a result of a network timeout if it causes authentication to fail (see https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md#why-mark-a-server-unknown-after-an-auth-error). When this happens, secondary reads may happen, but I am guessing that no one will say that the primary is "actually down".
- The documentation of this read preference is: "In most situations, operations read from the primary member of the set. However, if the primary is unavailable, as is the case during failover situations, operations read from secondary members that satisfy...". Note how it neither defines "unavailable", nor limits the situations strictly to failover.
- Even if "actually down" were formally a thing (for example, one may define "actually down" as a situation when the driver's cluster view does not have an `RSPrimary`), unless there is a way for an application to determine that a secondary read was executed while the primary was not "actually down", it can't tell whether the alleged semantics of `primaryPreferred` is violated. And there is no such way, not even via the cluster/server listeners, because there is no specified order between the cluster/server events and the execution of operations. Therefore, this alleged semantics is meaningless.
Any application that uses `primaryPreferred` must be prepared to deal with secondary reads, and our deprioritizing the primary does not violate any formal guarantees of the `primaryPreferred` read preference.
b)
The b) part is even weaker:
- `secondaryPreferred`, similarly to `primaryPreferred`, means that the application is ready to deal with both primary and secondary reads.
- On top of that, there can be no consistency guarantees the application expects from secondary reads that could be violated by primary reads.
The current contract we've made with users utilizing primaryPreferred is that reads will only go to a secondary if the primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result in stale data, enabling overload retargeting increases the chance that users of primaryPreferred will get stale data when they did not previously. This is a semantic change, and so we've made the decision to turn it off by default and add a knob to enable it.
Overload retargeting significantly increases availability during overload, but it does increase the risk of getting stale data when used with primaryPreferred. We may find that users of primaryPreferred prefer that behavior and we turn on retargeting by default in the future, but that's not in the scope of this initial release.
secondaryPreferred does not have this same staleness issue, but it still materially changes what the preference means from "almost always secondary" to "sometimes primary".
The larger discussion here is what exactly primaryPreferred and secondaryPreferred should mean (likely a broader definition of "unavailable"), but that's outside of the scope of this project.
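To make the retargeting mechanics being debated concrete, here is a hedged Python sketch of step 4 from the quoted spec text. The topology class, `OverloadError`, and every name here are illustrative toys under my own assumptions, not any driver's actual API:

```python
class OverloadError(Exception):
    """Stand-in for a retryable overload error returned by a server."""

class ToyTopology:
    """Toy server selection: pick the first address not deprioritized."""
    def __init__(self, addresses):
        self.addresses = addresses

    def select_server(self, deprioritized=()):
        for addr in self.addresses:
            if addr not in deprioritized:
                return addr
        # If every server is deprioritized, fall back to normal selection.
        return self.addresses[0]

def execute(topology, operation, enable_overload_retargeting=False,
            max_adaptive_retries=2):
    deprioritized = []
    for attempt in range(max_adaptive_retries + 1):
        server = topology.select_server(deprioritized)
        try:
            return operation(server)
        except OverloadError:
            if attempt == max_adaptive_retries:
                raise  # retries exhausted
            if enable_overload_retargeting:
                # Step 4: deprioritize the server that returned the
                # overload error for the next server selection.
                deprioritized.append(server)
```

With retargeting disabled (the default), every retry may land on the same overloaded server; enabling it steers the retry elsewhere, which is exactly the semantic change for `primaryPreferred` and `secondaryPreferred` discussed above.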
> - The default value of `MAX_RETRIES` is 2. Drivers MUST expose `maxAdaptiveRetries` as a configurable option for
>   this maximum.
Given the no knobs mantra, we need a very good reason to add maxAdaptiveRetries.
The specification gives no justification for the `maxAdaptiveRetries` knob and its default value. I think the justification should be added. The only justifications I found are in Decision: Driver Backpressure Retry Behavior Defaults1:
allows users to disable adaptive retries entirely by setting the maximum number of retries to 0
...
Configurable retries affect the amount of load offered to the system for a given operation.
- The aforementioned mantra says "if a typical user would have no idea how to choose a correct value, we pick a good default instead of adding a knob". How will a typical user choose a value better than our default? If the answer is that they will have to do the same analysis we did (here and here) to either choose a better value or decide that the default is fine, then we should not add the knob.
- If it is the disabling/enabling of adaptive retries that we mostly care about, because we are afraid that such retries will be harmful and applications will have to disable them, then maybe we should not introduce them at all.
- Maybe we should apply the same approach that is applied to the token bucket idea: do not do it now, gather actual data, then analyze and decide? (When I was suggesting it in the summer of 2025, I was the weirdo, but here we are.)
- If we introduce the option now, it may end up being a useless option that pollutes the API and confuses users.
1 If the document has other justifications and I missed them, I apologize; I haven't read it fully. If there are other justifications and you know of them, could you please state them explicitly?
We know from our modelling and from the underpinning theory that the n-retries approach (retry up to N times on overload errors, no token bucket) can introduce retry storms as overload increases. However, the specifics of the workload and the cluster serving that workload significantly impact the threshold at which retry volume becomes an additional burden rather than a throughput improvement. Some applications and clusters may be very tolerant of many additional retries, while others may want to break out of the loop much earlier.
The selection of 2 as a default attempts to broadly pick a sensible default for most users that will on average be a benefit rather than a negative during overload. However, savvy users, the users we expect to be most affected by overload and have the most insight into the specifics of their workload and cluster, will likely find that tweaking this value on a per-workload basis produces better results. Additionally, there are situations where disabling overload retries entirely is optimal. Without a knob, those situations will cause users to either have a strictly worse experience with a new driver, or force them to downgrade to an older driver to avoid the issue. These are two strong motivations to add a knob for maxAdaptiveRetries.
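The load arithmetic behind choosing 2 over 5 can be sketched numerically. Under the simplifying, assumed worst case where every attempt fails with an overload error, each logical operation offers the initial attempt plus one request per retry:

```python
def worst_case_load_multiplier(max_adaptive_retries):
    """Requests offered per logical operation if every attempt hits an
    overload error: the initial attempt plus each adaptive retry."""
    return 1 + max_adaptive_retries

# Lowering the default from 5 to 2 cuts worst-case offered load per
# operation from 6x to 3x; setting it to 0 disables adaptive retries.
```

For example, `worst_case_load_multiplier(2)` is 3 versus 6 for the old default of 5; this is a crude bound that ignores backoff spreading the retries out in time.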
Let's not lose these motivations in the PR. Can you add to the Q&A?
savvy users, the users we expect to be most affected by overload
- Could you please clarify why savvy users are expected to be most affected by overload?
- What do you mean by "affected by overload"? For example, do you mean their servers are expected to be overloaded more severely / for longer intervals?
savvy users...will likely find that tweaking this value on a per-workload basis produces better results
You explicitly say that the proposed option is for "savvy users", who you separate from "most users". We discussed this proposed API change within the Java driver team, and this seems as clear of a case for applying the aforementioned no knobs mantra, as there can be - it explicitly instructs not to add a knob if a typical user would have no idea how to choose a correct value. If the mantra is not applicable in this case, then in what case could it possibly be applicable?
Granted, some options may be needed even if their usage requires a user to be advanced, like readConcernLevel, because the functionality they expose is inherently advanced. But the proposed option is not one of those, as it is about performance tuning.
Additionally, there are situations where disabling overload retries entirely is optimal.
Could you please elaborate on what those situations are? Please keep in mind that an application can't change its decision about retries caused by retryable overload errors at runtime. An application will have to either choose to always have them disabled, or always have them enabled. The benefit has to be quite significant to justify us introducing a performance knob. But even then, I would argue that the decision about such a performance knob should be based on real data and/or explicit user needs.
TL;DR
- The proposed `maxAdaptiveRetries` option clearly violates the no knobs mantra.
- The proposed option is for performance tuning. Similarly to the idea of token buckets, which previously were claimed to be very needed but are no longer considered as such, it has to be based on real data and/or explicit user needs, not guesses.
Rust second implementation: mongodb/mongo-rust-driver#1654
This PR updates the client backpressure specification and related specifications to align with the finalized backpressure retry policy.
A summary of these changes with brief reasonings:
- `MAX_RETRIES` now defaults to 2 rather than 5. Without token buckets, retry storms occur much more easily. Lowering the default number of maximum retries reduces this risk while still alleviating server overload.
- Overload retargeting is now opt-in: retargeting with `primaryPreferred` and `secondaryPreferred` causes semantic differences that we want users to explicitly opt-in to.
- New options: `maxAdaptiveRetries`, which configures `MAX_RETRIES`, and `enableOverloadRetargeting`, which enables overload retargeting for non-sharded servers. Both are configurable as both client and URI options.

Please complete the following before merging:
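Since both new options are exposed as client and URI options, a toy Python parser can show the shapes and defaults involved. The option names and defaults come from this PR's summary; the parsing helper itself is purely illustrative and not any driver's real connection-string handling:

```python
from urllib.parse import parse_qs, urlparse

def parse_backpressure_options(uri):
    """Toy extraction of the two new options from a connection string,
    applying this PR's defaults (2 retries, retargeting off)."""
    query = parse_qs(urlparse(uri).query)
    return {
        "maxAdaptiveRetries": int(query.get("maxAdaptiveRetries", ["2"])[0]),
        "enableOverloadRetargeting":
            query.get("enableOverloadRetargeting", ["false"])[0].lower() == "true",
    }
```

For example, `mongodb://localhost:27017/?maxAdaptiveRetries=0` would disable adaptive retries entirely, matching the "set the maximum number of retries to 0" escape hatch discussed above.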