@rionmonster
Contributor

Purpose

Linked issue: close #1873

Per issue #1873, this pull request addresses a race condition that intermittently causes the ReplicaTest.testBrokenSnapshotRecovery test case to fail, particularly during CI builds.

Brief change log

This change introduces a static triggerSnapshotTaskWithRetry helper function that wraps the existing scheduledExecutorService instance and issues the trigger calls inside a retry loop with a short delay between attempts, which improves reliability. The explicit scheduledExecutorService.triggerNonPeriodicScheduledTask() calls in the affected test case have been replaced with this new function.
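For readers who have not opened the diff, the retry idea can be sketched roughly as follows. This is an illustrative sketch, not the code merged in this PR. The `ManualTrigger` interface, the `maxAttempts` and `delayMillis` parameters, and the assumption that triggering with an empty queue throws `NoSuchElementException` are all hypothetical stand-ins; only the names `triggerSnapshotTaskWithRetry` and `triggerNonPeriodicScheduledTask` come from the description above.

```java
import java.util.NoSuchElementException;

public class SnapshotRetrySketch {

    /** Hypothetical stand-in for the manually triggered executor used by the test. */
    public interface ManualTrigger {
        // Assumed behavior: runs the next queued one-shot task and
        // throws NoSuchElementException if nothing has been scheduled yet.
        void triggerNonPeriodicScheduledTask();
    }

    /**
     * Tries to trigger the next scheduled task, retrying with a short pause
     * when the task has not been enqueued yet by the asynchronous code path.
     */
    public static void triggerSnapshotTaskWithRetry(
            ManualTrigger executor, int maxAttempts, long delayMillis) {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                executor.triggerNonPeriodicScheduledTask();
                return; // the task ran, nothing more to do
            } catch (NoSuchElementException e) {
                lastFailure = e; // task not scheduled yet, wait and retry
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException(
                "No task was scheduled after " + maxAttempts + " attempts", lastFailure);
    }
}
```

Bounding the number of attempts keeps a genuinely broken test failing quickly with a clear message instead of hanging, while the short delay gives the asynchronous scheduling a chance to catch up.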

Tests

To reproduce the issue, ReplicaTest.testBrokenSnapshotRecovery was first updated to run iteratively (e.g., repeated 50 times), as suggested in the original issue. Once the failure was reliably reproducible, the proposed retry helper was introduced and the test was verified to pass on every iteration.

The only change to the test was the replacement of the explicit triggerNonPeriodicScheduledTask() calls with the newly added wrapper.

API and Format

N/A

Documentation

N/A

[server] Added Retry Handler to Address Snapshotting Test Asynchrony
@polyzos polyzos merged commit 37f46dd into apache:main Oct 30, 2025
5 checks passed
wuchong pushed a commit that referenced this pull request Nov 2, 2025
…1881)

[server] Added Retry Handler to Address Snapshotting Test Asynchrony

(cherry picked from commit 37f46dd)
Development

Successfully merging this pull request may close these issues.

Test ReplicaTest.testBrokenSnapshotRecovery(File) is unstable

2 participants