Skip to content

Conversation

@Gautam-aman
Copy link

testRestore assumed that a non-periodic snapshot task is always scheduled,
which is not guaranteed due to asynchronous scheduling. This caused
intermittent NoSuchElementException failures in CI.

This change makes snapshot triggering retry-safe and consistent with
other snapshot tests, eliminating flakiness.

Fixes #2276

Signed-off-by: Aman Gautam <amangautam2128@gmail.com>
@leekeiabstraction
Copy link
Contributor

Thank you for the PR! Left some comments

@Gautam-aman
Copy link
Author

Thanks for the suggestion. I agree centralizing this would be cleaner long-term.
I kept the handling local to this test to keep the fix minimal and avoid changing shared test utilities. Happy to follow up with a refactor PR if preferred.

Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the handling local to this test to keep the fix minimal and avoid changing shared test utilities. Happy to follow up with a refactor PR if preferred.

I had a quick look, it seems ReplicaTest.testRestore() is the only place that calls removeNonPeriodicScheduledTask(). I believe handling the exception within removeNonPeriodicScheduledTask() is cleaner. not removing when queue is already empty is cleaner.

@Gautam-aman
Copy link
Author

Thanks for the suggestion. I’ve removed the remaining try/catch and now rely on removeNonPeriodicScheduledTask() to safely handle the empty case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unstable test ReplicaTest.testRestore

2 participants