@DavideD DavideD commented Oct 28, 2025

Fix #2518

This PR adds the following methods to the Mutiny and Stage APIs:

Session openSessionWithLazyConnection();
Session openSessionWithLazyConnection(String tenantId);
StatelessSession openStatelessSessionWithLazyConnection();
StatelessSession openStatelessSessionWithLazyConnection(String tenantId);

I've annotated them with @Incubating.
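
For readers unfamiliar with the pattern, here is a minimal sketch of what "lazy connection" means: the session exists before any connection does, and the first operation triggers the opening. Everything in it (LazySession, openSession, the string standing in for a connection) is hypothetical and is not Hibernate Reactive code. The only thing borrowed from this PR is the shape of the call, where operations go through connection() and continue in a callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names exist in Hibernate Reactive.
final class LazyConnectionSketch {

    static final class LazySession {
        private final Supplier<CompletionStage<String>> opener;
        private CompletionStage<String> connection; // stays null until first use
        private int openCount;

        LazySession(Supplier<CompletionStage<String>> opener) {
            this.opener = opener;
        }

        // Opens the connection on the first call and reuses it afterwards.
        // Not thread-safe: the sketch assumes one caller at a time.
        private CompletionStage<String> connection() {
            if (connection == null) {
                openCount++;
                connection = opener.get();
            }
            return connection;
        }

        // Every operation goes through connection(), so the first one opens it.
        CompletionStage<String> execute(String sql) {
            return connection().thenApply(conn -> conn + " ran: " + sql);
        }

        boolean isConnected() {
            return connection != null;
        }

        int openCount() {
            return openCount;
        }
    }

    // The string "conn-1" stands in for a real connection object.
    static LazySession openSession() {
        return new LazySession(() -> CompletableFuture.completedFuture("conn-1"));
    }

    public static void main(String[] args) {
        LazySession session = openSession();
        System.out.println("connected after open: " + session.isConnected());
        System.out.println(session.execute("select 1").toCompletableFuture().join());
        System.out.println("connections opened: " + session.openCount());
    }
}
```

A consequence of this design is that anything needing the connection itself has nothing to work with until the first operation has run.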

Overall, it seems to work fine. However, I added a test that mimics the id generation and, when I don't use transactions, it seems to get stuck (even with a small number of ids). I don't know why. The same test seems to work fine when the connection is not opened lazily. It's possible that there's something wrong with the test.
I've applied the changes that make the test fail on a different branch: b511efa

Basically, if I call persist and then flush instead of using withTransaction, the test gets stuck (the test is MultithreadedInsertionWithLazyConnectionTest).

openSessionWithLazyConnection seems a bit of a mouthful but I don't have strong opinions about it.
Originally, it was just called createSession.

Comment on lines +309 to +310
return connection().thenCompose( conn -> conn
.insertAndSelectIdentifier( sql, paramValues, idClass, idColumnName ) );

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation (Note)

Invoking ReactiveConnection.insertAndSelectIdentifier should be avoided because it has been deprecated.

Comment on lines +319 to +320
return connection().thenCompose( conn -> conn
.insertAndSelectIdentifierAsResultSet( sql, paramValues, idClass, idColumnName ) );

Check notice (Code scanning / CodeQL): Deprecated method or constructor invocation (Note)

Invoking ReactiveConnection.insertAndSelectIdentifierAsResultSet should be avoided because it has been deprecated.

@Override
public DatabaseMetadata getDatabaseMetadata() {
Objects.requireNonNull( connection, "Database metadata not available until the connection is opened" );
Member Author

Note that, for now, I'm throwing a NullPointerException here if getDatabaseMetadata is called before the connection is opened.

@DavideD DavideD force-pushed the 2518-Lazy-connection branch from ed416b8 to 0dee152 Compare October 30, 2025 14:59
@DavideD DavideD force-pushed the 2518-Lazy-connection branch from 0dee152 to 6ff2498 Compare October 31, 2025 16:22
@DavideD DavideD force-pushed the 2518-Lazy-connection branch from 6ff2498 to b1c5be0 Compare November 3, 2025 09:18
@DavideD
Member Author

DavideD commented Nov 3, 2025

After looking into it, I figured out that the test I added for the id generation fails because the event loop thread changes during the execution when using the lazy connection approach. The test was getting stuck because of an error in the session.close method.

@yrodiere what should we do? Merge these changes and fix the issues later? Or wait until we figure out how to solve this?
To summarize:

  • The original id generation test: the connection is opened as soon as the session is created; we insert entities using persist and flush operations (without opening a transaction);
  • The new test for this issue, which fails: the connection is opened lazily; it fails with HR000069: Detected use of the reactive Session from a different Thread than the one which was used to open the reactive Session; it works fine if we use transactions.

@tsegismont, maybe you can help with this?

@yrodiere
Member

yrodiere commented Nov 3, 2025

@yrodiere what should we do? Merge these changes and fix the issues later? Or wait until we figure out how to solve this?

I think we should investigate before merging, at least.

I have a few questions:

  1. Is the test code correct? I wonder why we're no longer using transactions in particular.
  2. Is the check correct? I.e. are we sure the thread we're expecting is the correct one?
  3. After connection opening, can we explicitly "force" the execution back on the thread/event loop that was initially used to open the session?

@yrodiere
Member

yrodiere commented Nov 3, 2025

Is the check correct? I.e. are we sure the thread we're expecting is the correct one?

Looking at the code... when opening the connection eagerly we do this:

return uni( () -> connection( getTenantIdentifier( options ) ) )
        .chain( reactiveConnection -> create(
                reactiveConnection,
                () -> new ReactiveSessionImpl( delegate, options, reactiveConnection )
        ) )
        .map( s -> new MutinySessionImpl( s, this ) );

... which means that we call new ReactiveSessionImpl (and determine the "expected thread") in a callback to the connection opening. So if opening the connection makes us move to another event loop, we won't detect it: we'll just check that all operations after the connection opening run on the same event loop.

In the lazy connection opening case, we first call new ReactiveSessionImpl (and determine the "expected thread"), then open the connection, then ultimately in a callback to that we check the thread is the same one.

So... I think you've uncovered a pre-existing bug, which is: we potentially switch to another event loop on connection opening. It's bad in both the lazy and eager connection opening cases, but it's only caught by our check with lazy connection opening.
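
The reasoning above can be illustrated with a small sketch. It is not a reproduction of the failing test and uses no Hibernate Reactive or Vert.x code: a plain daemon thread stands in for the event loop that completes the connection, and Session is a hypothetical class that records the constructing thread. It only shows why the moment at which the "expected thread" is captured decides whether a hop is noticed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: threads stand in for event loops, and Session is a
// hypothetical class, not ReactiveSessionImpl.
final class ExpectedThreadSketch {

    static final class Session {
        // The "expected thread" is whichever thread runs the constructor.
        private final Thread owner = Thread.currentThread();

        boolean onExpectedThread() {
            return Thread.currentThread() == owner;
        }
    }

    // A single daemon thread standing in for the loop that completes the connection.
    static ExecutorService newLoop(String name) {
        return Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, name);
            thread.setDaemon(true);
            return thread;
        });
    }

    // Eager: the session is created in the callback of the connection opening.
    static boolean eagerCheckPasses(ExecutorService connectionLoop) {
        CompletableFuture<String> connection = new CompletableFuture<>();
        CompletableFuture<Boolean> check = connection
                .thenApply(conn -> new Session())       // owner = connection loop thread
                .thenApply(Session::onExpectedThread);  // still on the connection loop
        // Callbacks are registered before completion, so they run on the completing thread.
        connectionLoop.execute(() -> connection.complete("conn"));
        return check.join();
    }

    // Lazy: the session exists before the connection is opened.
    static boolean lazyCheckPasses(ExecutorService connectionLoop) {
        Session session = new Session();                // owner = calling thread
        CompletableFuture<String> connection = new CompletableFuture<>();
        CompletableFuture<Boolean> check =
                connection.thenApply(conn -> session.onExpectedThread());
        connectionLoop.execute(() -> connection.complete("conn"));
        return check.join();
    }

    public static void main(String[] args) {
        ExecutorService connectionLoop = newLoop("connection-loop");
        try {
            System.out.println("eager check passes: " + eagerCheckPasses(connectionLoop));
            System.out.println("lazy check passes: " + lazyCheckPasses(connectionLoop));
        } finally {
            connectionLoop.shutdown();
        }
    }
}
```

In both variants the work after the connection opening runs on a thread other than the caller's. Only the lazy variant reports it, because only there was the owner recorded before the hop.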

@DavideD
Member Author

DavideD commented Nov 3, 2025

In the lazy connection opening case, we first create the session (and determine the "expected" thread), then open the connection, then in a callback to that we check the thread is the same one.

I'm not sure about this: the error happens when the session gets closed. If that were the reason for the failure, wouldn't it happen during the persist? But maybe there's something wrong when we close the session. In any case, I will look into it.

@yrodiere
Member

yrodiere commented Nov 3, 2025

I'm not sure about this: the error happens when the session gets closed. If that were the reason for the failure, wouldn't it happen during the persist?

Good point.

Perhaps related... I found this code:

<T> Uni<T> uni(Supplier<CompletionStage<T>> stageSupplier) {
    return Uni.createFrom().completionStage( stageSupplier ).runSubscriptionOn( context );
}

It's fine, but hints that something specific is needed to keep things on the same event loop, especially when dealing with completion stages.

So... I'm entirely unsure what happens when you're not using Mutiny and thus not using this code to make sure everything runs in the same event loop. Which is precisely the case in this test, since it relies on completion stages instead of Mutiny.
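
To make the completion-stage side of this concrete, here is a sketch using only java.util.concurrent. It assumes two plain daemon threads standing in for event loops (the names caller-loop and connection-loop are made up) and does not claim to show what Hibernate Reactive or Vert.x do internally. It shows that a non-async continuation stays on the thread that completed the stage, while the Async variant that takes an Executor moves it back explicitly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: threads stand in for event loops.
final class ResumeOnCallerSketch {

    static ExecutorService newLoop(String name) {
        return Executors.newSingleThreadExecutor(task -> {
            Thread thread = new Thread(task, name);
            thread.setDaemon(true);
            return thread;
        });
    }

    // Returns the name of the thread that ends up running the continuation.
    static String continuationThread(
            ExecutorService callerLoop, ExecutorService connectionLoop, boolean hopBack) {
        CompletableFuture<String> connection = new CompletableFuture<>();
        CompletableFuture<String> where;
        if (hopBack) {
            // Explicitly schedule the continuation on the caller's loop.
            where = connection.thenApplyAsync(
                    conn -> Thread.currentThread().getName(), callerLoop);
        } else {
            // Registered before completion, so it runs on the completing thread.
            where = connection.thenApply(conn -> Thread.currentThread().getName());
        }
        connectionLoop.execute(() -> connection.complete("conn"));
        return where.join();
    }

    public static void main(String[] args) {
        ExecutorService callerLoop = newLoop("caller-loop");
        ExecutorService connectionLoop = newLoop("connection-loop");
        try {
            System.out.println("plain thenApply runs on: "
                    + continuationThread(callerLoop, connectionLoop, false));
            System.out.println("thenApplyAsync runs on: "
                    + continuationThread(callerLoop, connectionLoop, true));
        } finally {
            callerLoop.shutdown();
            connectionLoop.shutdown();
        }
    }
}
```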

Did you try converting this test to use the Mutiny APIs instead, to see whether that solves the problem?

@tsegismont
Contributor

@tsegismont, maybe you can help with this?

What's the question exactly?

@DavideD
Member Author

DavideD commented Nov 4, 2025

What's the question exactly?

  • I'm a bit surprised that the test fails, do you see anything particularly wrong in the way I'm using CompletionStage with Verticles in this test class?
  • The current test works fine and everything seems to work as expected. But if I change this line to .persist( entity ).thenCompose( v -> s.flush() ), it fails when the session gets closed because the operation happens in a different event loop than the original one. We save the thread name here and the session.close operation happens here. A similar test works without issue.

We had some discussions in the past about removing the checks because they are too strict. I'm not sure why we didn't remove them then, or why we haven't had more issues related to this.

I think it's a Hibernate Reactive issue, but maybe you can let us know if there's something else that we are missing.

@DavideD
Member Author

DavideD commented Nov 4, 2025

I'm going to merge this because I think it solves the issue. And after #2494 is done, we will be able to remove the checks (I think).

@gavinking
Member

I doubt we can remove this sort of check.

We need to be sure that we're always being called with the same Vert.x duplicated context, otherwise we open ourselves up to very subtle bugs.

@DavideD DavideD changed the title from "Open connections lazily" to "[4.2] Open connections lazily" Nov 4, 2025
@DavideD DavideD added the 4.2 label Nov 4, 2025
@DavideD DavideD added this to the 4.2.0.Beta1 milestone Nov 4, 2025
@yrodiere
Member

yrodiere commented Nov 4, 2025

We need to be sure that we're always being called with the same Vert.x duplicated context, otherwise we open ourselves up to very subtle bugs.

But we're not checking we're called with the same Vert.x context, just that we're running in the same thread.

Which is no longer relevant after #2494... ?

@gavinking
Member

gavinking commented Nov 4, 2025

But we're not checking we're called with the same Vert.x context, just that we're running in the same thread.

I get that, but in the past (I believe) the only way we could be sure we were in the same duplicated context was to be on the same thread.

Which is no longer relevant after #2494... ?

Can you give me a summary of what changed there?

@yrodiere
Member

yrodiere commented Nov 4, 2025

But we're not checking we're called with the same Vert.x context, just that we're running in the same thread.

I get that, but in the past (I believe) the only way we could be sure we were in the same duplicated context was to be on the same thread.

AFAIK being on the same thread is no guarantee you're still using the same context. Threads can switch context back and forth.

But, more to the point... I don't know about before, but right now we are storing the current thread in a variable and later checking the current thread is still the same. We could just change that to store/check the current vert.x context instance, e.g. Vertx.currentContext()? I believe that check would pass.

Or am I missing some obvious reason we're checking the thread instead?

Which is no longer relevant after #2494... ?

Can you give me a summary of what changed there?

We would make sure that if you trigger two calls on the session, they will never interleave. E.g. here:

return sessionFactory.withSession(s -> {
   return Uni.combine().all().unis(s.doSomething(), s.doSomethingElse()).asTuple();
});

doSomething() and doSomethingElse() could each be broken down into multiple tasks/callbacks. Since there is no dependency between the two, those tasks/callbacks could end up interleaved. That's a problem because the Session is stateful.

If we make absolutely sure that doSomethingElse() will wait until doSomething() is completed before executing, the interleaving is gone.

That's the specific problem #2494 intends to fix, and it's mostly irrelevant to the thread check. But the solution mentioned in #2494 would help for another (worse) problem.

If we have the thread check, it means we assume callbacks could possibly be executed on different event loops (and given the failure Davide encountered, it's indeed possible). If that's possible, then the code above could result in tasks/callbacks being executed in parallel, not just interleaved, with all the concurrency problems this implies.

Fortunately, if this "serialization" of calls is enforced, then even when switching threads we can be sure we never execute operations in parallel. Hence the thread switching is no longer relevant.
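
As a rough illustration of what "serialization" of calls could look like, here is a sketch built on CompletableFuture alone. It is an assumption about the general idea, not the design proposed in #2494, and Serializer, submit and operation are hypothetical names. Each submitted operation is chained onto the previous one, so it does not start until its predecessor has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

// Hypothetical sketch; not the mechanism proposed in #2494.
final class SerializedCallsSketch {

    static final class Serializer {
        // Completes when the most recently submitted operation has finished.
        private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

        synchronized <T> CompletableFuture<T> submit(Supplier<CompletionStage<T>> op) {
            // The operation is only started once the previous one is done.
            CompletableFuture<T> result = tail.thenCompose(ignored -> op.get());
            // The next operation waits for this one, whether it succeeds or fails.
            tail = result.handle((value, error) -> null);
            return result;
        }
    }

    // An operation with two steps: it starts, then ends when its trigger completes.
    static Supplier<CompletionStage<String>> operation(
            String name, CompletableFuture<String> trigger, List<String> log) {
        return () -> {
            log.add("start " + name);
            return trigger.thenApply(value -> {
                log.add("end " + name);
                return value;
            });
        };
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Serializer serializer = new Serializer();
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();

        serializer.submit(operation("A", a, log));
        serializer.submit(operation("B", b, log));
        log.add("both submitted");

        b.complete("b"); // B's input is ready first, but B has not started yet
        a.complete("a"); // A ends, and only then does B start
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The sketch runs on a single thread, so it only demonstrates the ordering guarantee: the steps of B never slip in between the steps of A, even though B's input was available earlier.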

Now, the Vert.x context switching still very much is, I'll grant you that. See above :)

@gavinking
Member

AFAIK being on the same thread is no guarantee you're still using the same context.

Of course. That's certainly correct.

The implication was in the other direction: that Vert.x was supposed to guarantee thread affinity for callbacks. And so if it was a different thread, we knew something was wrong.

We could just change that to store/check the current vert.x context instance, e.g. Vertx.currentContext()?

That's just the Vert.x context. Not the duplicated context AFAIK.

I believe that check would pass.

Well it would, for sure, but it wouldn't tell us anything about the duplicated context.

@yrodiere
Member

yrodiere commented Nov 4, 2025

At the risk of stating the obvious, a duplicated context is a context, and if you're using the same context you're using the same duplicated context.

I feel like we're running in circles, so maybe let's have a talk to discuss this?

@gavinking
Member

At the risk of stating the obvious, a duplicated context is a context, and if you're using the same context you're using the same duplicated context.

If this is the case then something has changed in a fundamental way since I last looked at all this.

That definitely wasn't true when this code was written.

@yrodiere
Member

yrodiere commented Nov 4, 2025

I talked to @gavinking and removing the check after #2494 is probably too bold. So let's not.

I created #2736 to address the check issue one way or another; we can work on that as our next step.

Successfully merging this pull request may close these issues.

Lazy connection opening / transaction starting
