JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1899
nhachicha wants to merge 61 commits into mongodb:backpressure
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())
…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
I manually copied it from mongodb#1899.
stIncMale left a comment:
The last reviewed commit is 43dab53.
Most, if not all, of the outstanding comments have reactions/replies suggesting that they were agreed with and addressed, but I did not find the corresponding changes. I suspect the changes were not pushed.
driver-core/src/test/unit/com/mongodb/internal/time/ExponentialBackoffTest.java
@@ -249,15 +257,26 @@ public <T> T withTransaction(final TransactionBody<T> transactionBody) {
    @Override
    public <T> T withTransaction(final TransactionBody<T> transactionBody, final TransactionOptions options) {
this is orthogonal to this PR
It is unrelated, indeed, but I noticed all that while trying to make our `withTransaction` method look like it follows the spec. The OpenTelemetry implementation we have is recent, as far as I know, and its state does not seem good. That is surprising, given that it's not some old code that went out of shape as a result of having been modified many times without ever having been refactored to make sense again.
1.
Our tracing layer uses Micrometer as the OTel reference implementation, and the APIs use different terminology. For example, Micrometer uses "stop" whereas OTel uses "end".
So we have
a) The two APIs mentioned use the terms "stop" and "end"
b) The drivers specification uses "finish", while the Java driver implementation uses "finalize".
I fail to see how b) reasonably follows from a).
It's worth aligning with the spec
How did we end up unaligned, when we authored both?
2, 3
The OpenTelemetry specification "defines requirements for drivers' OpenTelemetry integration and behavior". I am guessing that is to ensure that different drivers emit the same telemetry in the same way. However, how will other drivers know how to emit it for `withTransaction` when none of the behavior you described above is in the specification? (I still don't really know what the behavior is supposed to be.)
Testing
Many OpenTelemetry specification tests were skipped in the Java driver, with a reference to https://jira.mongodb.org/browse/JAVA-5991, and then the ticket was closed. But nothing in the ticket explains why they are skipped or whether that is supposed to change (@rozza reopened the ticket as a result). This is especially surprising given that we were the authors of the OpenTelemetry specification.
Assuming 👍 was meant to express that you agree with the comment and it is addressed, I can't find the corresponding requested comment in #1918. Could you please add it?
private static MongoException timeoutException(final boolean hasTimeoutMS, final Throwable cause) {
    return hasTimeoutMS
            ? createMongoTimeoutException(cause) // CSOT timeout exception
            : new MongoTimeoutException("Operation exceeded the timeout limit", cause); // Legacy timeout exception
}
I don't think that the change in d4bc4c7 is the proper way of addressing this. Most of what I wrote previously in this thread was not addressed.
The following thoughts won't add anything new to what I expressed above, but they will be clearer than before, because now there are fewer uncertainties and open questions:
- The spec changes made in DRIVERS-3391 need more work/fixes (such a new change requires a new DRIVERS ticket):
  - Note 1 should be changed so that it instructs drivers to add all the error labels from the wrapped error, regardless of what the wrapped error is [1].
  - The spec currently says "report a timeout error wrapping the last error", but then refers to the wrapped error as "underlying error". The spec should say "wrapped error" instead of introducing another term that is supposed to have the same meaning.
  - The "Note 1" in the Retry Timeout is Enforced prose tests should be updated to instruct the drivers to assert that the timeout error has the same labels as the error it wraps.
- It seems that the constructors of `MongoException` should be responsible for copying labels. However, `MongoException(@Nullable final String msg, @Nullable final Throwable t)` does not do that, while `MongoException(final int code, final String msg, final Throwable t)` does. We should figure out whether this was intentional, in which case copying labels in the former constructor would be a bug, or whether the current situation is a bug and the constructor should have been copying labels all along.

[1] Strictly speaking, we need to copy only the labels the driver exposes for applications to use (those are exposed via constants in `MongoException`). However, given that no driver, including ours, hides other labels, there is no reason to complicate the logic or the specification here.
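To make the label-copying request concrete, here is a minimal sketch of a timeout error that wraps the last error and carries over every label. It deliberately does not use the driver's `MongoException`: `LabeledException` and `TimeoutWrapping.wrapAsTimeout` are hypothetical stand-ins invented for illustration, and the real constructors may behave differently.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a driver exception that carries error labels. */
class LabeledException extends RuntimeException {
    private final Set<String> labels = new LinkedHashSet<>();

    LabeledException(final String message, final Throwable cause) {
        super(message, cause);
    }

    LabeledException addLabel(final String label) {
        labels.add(label);
        return this;
    }

    Set<String> getLabels() {
        return Collections.unmodifiableSet(labels);
    }
}

/** Hypothetical helper: wraps the last error in a timeout error and copies every label. */
final class TimeoutWrapping {
    private TimeoutWrapping() {
    }

    static LabeledException wrapAsTimeout(final Throwable cause) {
        LabeledException timeout = new LabeledException("Operation exceeded the timeout limit", cause);
        if (cause instanceof LabeledException) {
            // Copy all labels, regardless of which kind of labeled error is being wrapped.
            for (String label : ((LabeledException) cause).getLabels()) {
                timeout.addLabel(label);
            }
        }
        return timeout;
    }
}
```

A prose test along the lines suggested above would then assert that `wrapAsTimeout(cause).getLabels()` equals `cause.getLabels()`.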
…lBackoffTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>
- Add SYSTEM_OVERLOADED_ERROR_LABEL and RETRYABLE_ERROR_LABEL constants to MongoException
- Add backpressure:true to hello command in InternalStreamConnectionInitializer
- Make CommandOperationHelper and its error label constants public
- Replace hardcoded error label strings with constants in tests and examples
- Refactor ExponentialBackoff: make TRANSACTION_BASE_MS and TRANSACTION_GROWTH private, split testCustomJitter into two tests, minor Javadoc/assertion message fixes
- Remove redundant private constructor from TimeoutContext
- Convert block comments to Javadoc in WithTransactionProseTest, refactor testRetryBackoffIsEnforced
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
driver-sync/src/test/functional/com/mongodb/client/RetryableWritesProseTest.java
driver-sync/src/test/functional/com/mongodb/client/RetryableWritesProseTest.java
...-sync/src/test/functional/com/mongodb/client/MongoWriteConcernWithResponseExceptionTest.java
driver-core/src/main/com/mongodb/internal/operation/CommandOperationHelper.java
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java
driver-core/src/main/com/mongodb/internal/connection/InternalStreamConnectionInitializer.java
…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java
driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java
driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.
`UnknownTransactionCommitResult` is retriable in the commit loop as long as we don't exceed the timeout, so it makes sense to wrap it in a timeout error when we do exceed the timeout and want to throw and return (as described in section 10.1.1).
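A rough sketch of that commit-loop behavior, under stated assumptions: `UnknownCommitResultException`, `RetryTimeoutException`, and `CommitRetryLoop.commitWithRetry` are hypothetical names invented here, not the driver's types, and the clock is injected only so the loop can be exercised without real waiting. The actual `ClientSessionImpl` code is structured differently.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical marker for a commit whose outcome is unknown and may be retried. */
class UnknownCommitResultException extends RuntimeException {
    UnknownCommitResultException(final String message) {
        super(message);
    }
}

/** Hypothetical timeout error that wraps the last error seen before time ran out. */
class RetryTimeoutException extends RuntimeException {
    RetryTimeoutException(final String message, final Throwable cause) {
        super(message, cause);
    }
}

final class CommitRetryLoop {
    private CommitRetryLoop() {
    }

    /**
     * Retries {@code commit} while it fails with an unknown result and the timeout has not elapsed.
     * {@code nanoClock} is injected so tests can control the passage of time.
     */
    static <T> T commitWithRetry(final Supplier<T> commit, final LongSupplier nanoClock, final long timeoutNanos) {
        final long start = nanoClock.getAsLong();
        while (true) {
            try {
                return commit.get();
            } catch (UnknownCommitResultException e) {
                if (nanoClock.getAsLong() - start >= timeoutNanos) {
                    // Out of time: report a timeout error wrapping the last error.
                    throw new RetryTimeoutException("Commit retries exceeded the timeout", e);
                }
                // Still within the timeout: loop around and retry the commit.
            }
        }
    }
}
```

Wrapping keeps the original unknown-result error reachable through `getCause()`, so callers can still inspect what the last attempt reported.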
…re_convenient_api
The original PR, #1852, was accidentally closed; it has outstanding review comments for @stIncMale to go over when re-reviewing.
Relevant specification changes:
JAVA-5950, JAVA-6046, JAVA-6093, JAVA-6113 (only the part about `transactions-convenient-api`)
AI review
Review generated by Claude Opus 4.6 as of commit `4a3d1ae1` on 2026-04-01.
Findings Table: Diff-Only Review vs. PR Context
- `testJitterSupplier` static field: data race risk across threads. […] `volatile` at minimum.
- `Thread.sleep(backoffMs)` may overshoot remaining timeout. […] `shortenBy(...).onExpired(...)` checks before sleeping; next iteration fails fast if expired. Overshoot bounded by […]
- `clearTransactionContextOnError(e)` call removed. […] `commitTransaction` calls it internally; the outer call was redundant.
- […] `CommandOperationHelper` + constants […] `com.mongodb.internal` is internal by definition. Enables cross-package dedup.
- `@VisibleForTesting` on `timeoutOrAlternative` […] `60acf51d`).
- `TRANSACTION_MAX_MS` package-private; hardcoded expected values in test. […] `EXPECTED_BACKOFFS_MAX_VALUES.length` (adopted). Hardcoded array is an […]
- […] `backpressure` → `main` merge. Blocked on docs PR.
- `testRetryBackoffIsEnforced` wall-clock timing sensitivity. […] `setTestJitterSupplier`. 500ms tolerance. Spec-mandated prose test.
- `copyTimeoutContext()`: verify no other callers. […]

Summary: 4 false positives, 2 resolved, 1 valid+intentional, 1 tracked, 1 accepted, 1 minor nit.
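For the `testJitterSupplier` finding, a minimal sketch of what a `volatile` test hook could look like. The class name `JitterSource`, the `DoubleSupplier` type, and the `nextJitter` method are assumptions made for illustration; only the names `testJitterSupplier` and `setTestJitterSupplier` come from the review, and their real signatures in `ExponentialBackoff` may differ.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

/** Hypothetical holder for a jitter source that tests may override. */
final class JitterSource {
    // Default source: uniformly distributed in [0, 1).
    private static final DoubleSupplier DEFAULT = () -> ThreadLocalRandom.current().nextDouble();

    // volatile: a supplier installed by a test thread is visible to threads that read it later.
    private static volatile DoubleSupplier testJitterSupplier;

    private JitterSource() {
    }

    /** Installs a fixed jitter source for tests; pass null to restore the default. */
    static void setTestJitterSupplier(final DoubleSupplier supplier) {
        testJitterSupplier = supplier;
    }

    /** Returns the next jitter factor from the override if one is installed, else from the default. */
    static double nextJitter() {
        // Read the volatile field once so the null check and the call use the same reference.
        DoubleSupplier override = testJitterSupplier;
        return (override != null ? override : DEFAULT).getAsDouble();
    }
}
```

Without `volatile`, the Java memory model gives no guarantee that a thread calling `nextJitter()` ever observes a supplier written by another thread, which is the visibility risk the finding describes.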
Additional Findings from PR Reviewers (not caught in diff-only review)
- `calculateTransactionBackoffMs` Javadoc said 0-based but implementation is 1-based […]
- […] `MongoTimeoutException` […] `d4bc4c70`)
- `applyMajorityWriteConcernToTransactionOptions` called on outer retry […] `60acf51d`)
- `withTransaction` code structure doesn't follow spec algorithm ordering […] `60acf51d`)
- […] `main` updated separately
- `MongoTimeoutException` message inconsistency (missing period) […]
- `timeoutOrAlternative` removal + `timeoutMsConfigured` renaming […] `60acf51d`)

Remaining Issues to Address
| # | Issue | Severity | Location | Status | Ticket |
|---|---|---|---|---|---|
| 1 | `testJitterSupplier` should be `volatile`: mutable static read/written across threads without memory visibility guarantee | | `ExponentialBackoff.java` | | |
| 2 | Spec "Note 1" about error propagation needs rework: spec language around propagation/raising is inconsistent | Medium | Spec-side | Open: spec change needed | DRIVERS-3436 |
| 3 | `TODO-BACKPRESSURE` comments in production code: must be resolved before merging `backpressure` → `main` | Low | `MongoException.java`, `ExponentialBackoff.java` | Open: blocked on docs PR (10gen/docs-mongodb-internal#17281) | Implicit |
| 4 | Spec submodule not pointing to latest spec tests: new `transactions-convenient-api` JSON tests not included | Medium | `testing/resources/specifications` (submodule) | Open: `main` updated (`55e1861`) but not merged into `backpressure` | TODO-BACKPRESSURE |
| 5 | Verify spec test `withTransaction surfaces a timeout after exhausting transient transaction retries` is run | Medium | Test runner config | Open: reminder in PR comments | - |
| 6 | Verify prose tests assert all error labels are copied to wrapping exception | Medium | `WithTransactionProseTest.java` | Partially addressed: labels copied in code, test coverage completeness not confirmed | - |
| 7 | OTel tracing terminology inconsistency (`finalize` vs `finish` vs `stop`) | Low (orthogonal) | `ClientSessionImpl.java` tracing code | Open: no ticket yet | - |

Priority Summary
- […] `backpressure`: Items 5, 6
- […] `main`: Items 3 (TODOs), 4 (submodule)

What Looks Good
- `ClientSessionClock` singleton with `SystemNanoTime` + Mockito is a significant test infrastructure improvement
- `ExponentialBackoffTest` has good coverage of boundary conditions (jitter=0, jitter=1, cap enforcement)
- `backpressure: true` handshake flag is a clean protocol extension
- `abortIfInTransaction()` extraction reduces duplication and improves clarity
- […] `withTransaction` (`60acf51d`) now aligns the code with the spec algorithm, making correctness easier to verify
- […] `testRetryBackoffIsEnforced`, `testExponentialBackoffOnTransientError`) provide functional validation of backoff behavior
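For readers who have not opened `ExponentialBackoff.java`, the boundary conditions mentioned above (jitter=0, jitter=1, cap enforcement, 1-based attempts) can be illustrated with a small sketch. The class `BackoffSketch` and its constants (5 ms base, 1.5 growth factor, 500 ms cap) are assumptions picked for illustration; the PR's actual values and method signatures are not shown in this conversation.

```java
/** Illustrative exponential backoff with jitter; not the driver's ExponentialBackoff class. */
final class BackoffSketch {
    // Assumed values, chosen only to make the arithmetic concrete.
    static final double BASE_MS = 5.0;
    static final double GROWTH = 1.5;
    static final double MAX_MS = 500.0;

    private BackoffSketch() {
    }

    /**
     * Computes jitter * min(BASE_MS * GROWTH^(attempt - 1), MAX_MS), rounded to whole milliseconds.
     *
     * @param attempt 1-based retry attempt number
     * @param jitter  factor in the range [0, 1]
     */
    static long backoffMs(final int attempt, final double jitter) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt is 1-based, got " + attempt);
        }
        if (jitter < 0.0 || jitter > 1.0) {
            throw new IllegalArgumentException("jitter must be in [0, 1], got " + jitter);
        }
        double uncapped = BASE_MS * Math.pow(GROWTH, attempt - 1);
        // Apply the cap before the jitter so the result never exceeds MAX_MS.
        return Math.round(jitter * Math.min(uncapped, MAX_MS));
    }
}
```

With these assumed constants, jitter 0 always yields no delay, jitter 1 yields the full capped delay, and rejecting `attempt < 1` makes the 1-based convention explicit rather than leaving it to the Javadoc.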