
DRIVERS-3427 - Finalize client backpressure implementation for phase 1 rollout#1919

Open
NoahStapp wants to merge 9 commits into mongodb:master from NoahStapp:DRIVERS-3427

Conversation

@NoahStapp
Contributor

@NoahStapp NoahStapp commented Mar 30, 2026

This PR updates the client backpressure specification and related specifications to align with the finalized backpressure retry policy.

A summary of these changes with brief reasonings:

  • Token buckets are removed entirely. Jitter + exponential backoff is now the sole backpressure retry strategy. Token buckets may be added in the future pending additional analysis and telemetry.
  • MAX_RETRIES now defaults to 2 rather than 5. Without token buckets, retry storms occur much more easily. Lowering the default number of maximum retries reduces this risk while still alleviating server overload.
  • Server deprioritization (renamed to overload retargeting) is disabled by default. Edge cases with primaryPreferred and secondaryPreferred cause semantic differences that we want users to explicitly opt-in to.
  • Two new client options have been added: maxAdaptiveRetries, which configures MAX_RETRIES, and enableOverloadRetargeting, which enables overload retargeting for non-sharded servers. Both are configurable as both client and URI options.
  • All tests have been updated to reflect the above, and new tests have been added.
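The retry strategy described above (jitter + exponential backoff, up to `MAX_RETRIES` attempts) can be sketched as follows. This is an illustrative sketch, not the spec's pseudocode: the `BASE_BACKOFF`/`MAX_BACKOFF` constants and the `OverloadError` type are stand-ins, and only `MAX_RETRIES = 2` comes from the PR itself.

```python
import random
import time

MAX_RETRIES = 2        # new default; configurable via maxAdaptiveRetries

BASE_BACKOFF = 0.1     # illustrative base delay in seconds, not from the spec
MAX_BACKOFF = 10.0     # illustrative cap, not from the spec


class OverloadError(Exception):
    """Stand-in for a retryable server overload error."""


def backoff_delay(attempt: int) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(MAX_BACKOFF, BASE_BACKOFF * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def run_with_backpressure_retries(operation):
    """Run operation, retrying up to MAX_RETRIES times on overload errors."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return operation()
        except OverloadError:
            if attempt == MAX_RETRIES:
                raise            # retries exhausted: surface the error
            time.sleep(backoff_delay(attempt))
```

With the lowered default, a single operation makes at most three attempts (one initial plus two retries), which is what bounds retry-storm amplification without a token bucket.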

Please complete the following before merging:

  • Is the relevant DRIVERS ticket in the PR title?
  • Update changelog.
  • Test changes in at least one language driver. Python.
  • Test these changes against all server versions and topologies (including standalone, replica set, and sharded
    clusters).

@NoahStapp NoahStapp changed the title Drivers 3427 - Finalize client backpressure implementation for phase 1 rollout DRIVERS 3427 - Finalize client backpressure implementation for phase 1 rollout Mar 30, 2026
@NoahStapp NoahStapp changed the title DRIVERS 3427 - Finalize client backpressure implementation for phase 1 rollout DRIVERS-3427 - Finalize client backpressure implementation for phase 1 rollout Mar 30, 2026
@NoahStapp NoahStapp marked this pull request as ready for review March 30, 2026 20:44
@NoahStapp NoahStapp requested review from a team as code owners March 30, 2026 20:44
@NoahStapp NoahStapp requested review from JamesKovacs, alcaeus, dariakp, stIncMale and tadjik1 and removed request for a team and JamesKovacs March 30, 2026 20:44
@NoahStapp NoahStapp requested a review from a team as a code owner March 31, 2026 14:58
@NoahStapp NoahStapp requested review from isabelatkinson and removed request for a team March 31, 2026 14:58
Comment on lines +58 to +59
The sum of 2 backoffs is 0.3 seconds. There is a 0.3-second window to account for potential variance between the
two runs.
Contributor

Following the logic used in the previous version of this test, wouldn't this make the comparison value in the assertion 0? (i.e. backoff time minus buffer window)

Contributor Author

Good catch, correct.

Contributor

I'm concerned that this test will be flaky because of the small time values. WDYT about restoring the original logic and just setting maxAdaptiveRetries=5? This would also give us better confirmation that the proper values for backoff are being calculated.

Contributor Author

We could add another test to explicitly verify proper backoff calculation that configures maxAdaptiveRetries. The purpose of this test is to verify that the backoff is used, which is independent of the actual time values. We could add a larger buffer window if flakiness is a concern.

Contributor

Discussed over slack, we decided to update the prose test to assert on a window of values rather than a lower bound.
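The window-based assertion agreed on above might look like the following in a prose test. The 0.3-second expected backoff and 0.3-second buffer come from the quoted test text; the helper name and the shape of the timing inputs are illustrative.

```python
EXPECTED_BACKOFF = 0.3   # sum of 2 backoffs, per the test description
BUFFER = 0.3             # window for variance between the two runs


def assert_backoff_within_window(duration_with_backoff: float,
                                 duration_without_backoff: float) -> None:
    """Assert the extra time spent is inside [expected - buffer, expected + buffer],
    rather than only checking a lower bound (which would degenerate to 0 here)."""
    extra = duration_with_backoff - duration_without_backoff
    low = max(0.0, EXPECTED_BACKOFF - BUFFER)
    high = EXPECTED_BACKOFF + BUFFER
    assert low <= extra <= high, (
        f"backoff delta {extra:.3f}s outside [{low:.3f}, {high:.3f}]")
```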

NoahStapp and others added 2 commits March 31, 2026 15:48
Co-authored-by: Isabel Atkinson <isabelatkinson@gmail.com>
Co-authored-by: Isabel Atkinson <isabelatkinson@gmail.com>
@NoahStapp NoahStapp removed request for a team, alcaeus, stIncMale and tadjik1 April 1, 2026 16:51
Comment on lines +143 to +146
4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
client MUST add the previously used server's address to the list of deprioritized server addresses for
[server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
configurable boolean option that defaults to `false`.
Member

@stIncMale stIncMale Apr 2, 2026

Given the no knobs mantra, we need a very good reason to add enableOverloadRetargeting.

The specification gives no justification behind the enableOverloadRetargeting knob and its default value of false. I think the justification should be added, if it is eventually agreed that the option exists. The only justification I found is in Decision: Driver Backpressure Retry Behavior Defaults:

a)

A primaryPreferred readPreference currently only reads from a secondary node if the primary is unavailable. Making an overloaded primary count as “unavailable” changes this semantically, but is a net benefit for users unless they truly do not want any secondary reads unless the primary is actually down.

b)

The same is true for secondaryPreferred reads and primary routing. As a result, users must explicitly opt-in to this feature.

The justification does not seem correct to me, and I don't think we should add the enableOverloadRetargeting option.

a)

The alleged interpretation of primaryPreferred as a guarantee that no secondary read happens "unless the primary is actually down" does not seem to make sense.

  1. Formally, there is no such thing as "actually down". A driver may mark the primary as Unknown simply as a result of a temporary network issue, even as a result of a network timeout if it causes authentication to fail (see https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.md#why-mark-a-server-unknown-after-an-auth-error). When this happens, secondary reads may happen, but I am guessing that no one will say that the primary is "actually down".
    • The documentation of this read preference is: "In most situations, operations read from the primary member of the set. However, if the primary is unavailable, as is the case during failover situations, operations read from secondary members that satisfy...". Note how it neither defines "unavailable", nor limits the situations strictly to failover.
  2. Even if "actually down" were formally a thing (for example, one may define "actually down" as a situation when the driver's cluster view does not have RSPrimary), unless there is a way for an application to determine that a secondary read was executed while the primary was not being "actually down", it can't tell whether the alleged semantics of primaryPreferred is violated. And there is no such way, not even via the cluster/server listeners, because there is no specified order between the cluster/server events and the execution of operations. Therefore, this alleged semantics is meaningless.

Any application that uses primaryPreferred must be prepared to deal with secondary reads, and us deprioritizing the primary does not violate any formal guarantees of the primaryPreferred read preference.

b)

The b) part is even weaker:

  • secondaryPreferred, similarly to primaryPreferred, means that the application is ready to deal with both primary and secondary reads.
  • On top of that, there can be no consistency guarantees the application expects from secondary reads that could be violated by primary reads.

Contributor Author

The current contract we've made with users utilizing primaryPreferred is that reads will only go to a secondary if the primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result in stale data, enabling overload retargeting increases the chance that users of primaryPreferred will get stale data when they did not previously. This is a semantic change, and so we've made the decision to turn it off by default and add a knob to enable it.

Overload retargeting significantly increases availability during overload, but it does increase the risk of getting stale data when used with primaryPreferred. We may find that users of primaryPreferred prefer that behavior, in which case we could turn on retargeting by default in the future, but that's not in the scope of this initial release.

secondaryPreferred does not have this same staleness issue, but it still materially changes what the preference means from "almost always secondary" to "sometimes primary".

The larger discussion here is what exactly primaryPreferred and secondaryPreferred should mean (likely a broader definition of "unavailable"), but that's outside of the scope of this project.
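Step 4 of the quoted spec text amounts to something like the following during a retry. This is a sketch under assumptions: the class and method names are invented here, and real server selection is far more involved than picking from a list of address strings.

```python
class RetryState:
    """Tracks servers that returned overload errors during one operation's retries."""

    def __init__(self, enable_overload_retargeting: bool = False):
        # Per the spec text, overload retargeting defaults to off.
        self.enable_overload_retargeting = enable_overload_retargeting
        self.deprioritized: set[str] = set()

    def record_overload(self, server_address: str) -> None:
        # Only deprioritize the previously used server when the user opted in.
        if self.enable_overload_retargeting:
            self.deprioritized.add(server_address)

    def select(self, eligible: list[str]) -> str:
        # Prefer servers not previously deprioritized; fall back to the full
        # eligible list so deprioritization never makes selection impossible.
        preferred = [s for s in eligible if s not in self.deprioritized]
        return (preferred or eligible)[0]
```

This makes the primaryPreferred concern concrete: with retargeting on, an overloaded primary lands in `deprioritized` and the retry selects a secondary, even though the primary is still up.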

Contributor

Can you add to the Q&A?

Comment on lines +125 to +126
- The default value of `MAX_RETRIES` is 2. Drivers MUST expose `maxAdaptiveRetries` as a configurable option for
this maximum.
Member

Given the no knobs mantra, we need a very good reason to add maxAdaptiveRetries.

The specification gives no justification behind the maxAdaptiveRetries knob and its default value of 2. I think the justification should be added. The only justifications I found are in Decision: Driver Backpressure Retry Behavior Defaults1:

allows users to disable adaptive retries entirely by setting the maximum number of retries to 0
...
Configurable retries affect the amount of load offered to the system for a given operation.

  1. The aforementioned mantra says "if a typical user would have no idea how to choose a correct value, we pick a good default instead of adding a knob". How will a typical user choose a value better than our default? If the answer is that he will have to do the same analysis we did (here and here) to either choose a better value, or decide that the default is fine, then we should not add the knob.
    • If it is the disabling/enabling of adaptive retries that we mostly care about, because we are afraid, that such retries will be harmful, and applications will have to disable them, then maybe we should not introduce them at all.
  2. Maybe we should apply the same approach that is applied to the token bucket idea: do not do it now, gather actual data, then analyze and decide? (When I was suggesting it in the summer of 2025, I was the weirdo, but here we are.) If we introduce the option now, it may end up being a useless option that pollutes the API and confuses users.

1 If the document has other justifications and I missed them - I apologize, I haven't read it fully. If there are other justifications, and you know of them, could you please state them explicitly?

Contributor Author

We know from our modelling and from the underpinning theory that the n-retries approach (retry up to N times on overload errors, no token bucket) can introduce retry storms as overload increases. However, the specifics of the workload and cluster serving that workload significantly impacts the threshold at which retry volume becomes an additional burden rather than a throughput improvement. Some applications and clusters may be very tolerant of many additional retries, while others may want to break out of the loop much earlier.

The selection of 2 as a default attempts to broadly pick a sensible default for most users that will on average be a benefit rather than a negative during overload. However, savvy users, the users we expect to be most affected by overload and have the most insight into the specifics of their workload and cluster, will likely find that tweaking this value on a per-workload basis produces better results. Additionally, there are situations where disabling overload retries entirely is optimal. Without a knob, those situations will cause users to either have a strictly worse experience with a new driver, or force them to downgrade to an older driver to avoid the issue. These are two strong motivations to add a knob for maxAdaptiveRetries.
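For concreteness, the opt-out path discussed above (setting maxAdaptiveRetries to 0 to disable adaptive retries entirely) might surface in a driver's URI-option parsing like this. The function and its dict-based interface are hypothetical; only the option names and defaults (2 and false) come from the PR.

```python
def parse_backpressure_options(uri_options: dict) -> dict:
    """Parse the two proposed options from URI query parameters,
    applying the spec defaults (MAX_RETRIES=2, retargeting off)."""
    opts = {"maxAdaptiveRetries": 2, "enableOverloadRetargeting": False}
    if "maxAdaptiveRetries" in uri_options:
        value = int(uri_options["maxAdaptiveRetries"])
        if value < 0:
            raise ValueError("maxAdaptiveRetries must be >= 0")
        opts["maxAdaptiveRetries"] = value   # 0 disables adaptive retries
    if "enableOverloadRetargeting" in uri_options:
        opts["enableOverloadRetargeting"] = (
            uri_options["enableOverloadRetargeting"].lower() == "true")
    return opts
```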

Contributor

Let's not lose these motivations in the PR. Can you add to the Q&A?

Member

savvy users, the users we expect to be most affected by overload

  • Could you please clarify why savvy users are expected to be most affected by overload?
  • What do you mean by "affected by overload"? For example, do you mean their servers are expected to be overloaded more severely / for longer intervals?

savvy users...will likely find that tweaking this value on a per-workload basis produces better results

You explicitly say that the proposed option is for "savvy users", who you separate from "most users". We discussed this proposed API change within the Java driver team, and this seems as clear of a case for applying the aforementioned no knobs mantra, as there can be - it explicitly instructs not to add a knob if a typical user would have no idea how to choose a correct value. If the mantra is not applicable in this case, then in what case could it possibly be applicable?

Granted, some options may be needed even if their usage requires a user to be advanced, like readConcernLevel, because the functionality they expose is inherently advanced. But the proposed option is not one of those, as it is about performance tuning.

Additionally, there are situations where disabling overload retries entirely is optimal.

Could you please elaborate on what those situations are? Please keep in mind that an application can't change its decision about retries caused by retryable overload errors at runtime. An application will have to either choose to always have them disabled, or always have them enabled. The benefit has to be quite significant to justify us introducing a performance knob. But even then, I would argue that the decision about such a performance knob should be based on real data and/or explicit user needs.

TL;DR

  1. The proposed maxAdaptiveRetries option clearly violates the no knobs mantra.
  2. The proposed option is for performance tuning. Similarly to the idea of token buckets, which previously were claimed to be very needed but are now no longer considered as such, it has to be based on real data and/or explicit user needs, not guesses.

@isabelatkinson
Contributor

Rust second implementation: mongodb/mongo-rust-driver#1654

@NoahStapp NoahStapp requested a review from isabelatkinson April 2, 2026 15:25
@NoahStapp NoahStapp requested a review from stIncMale April 2, 2026 16:13
@dariakp dariakp requested review from connorsmacd and jyemin April 2, 2026 21:13
