Implement per-job single retry of the failed tests #738

@asirko-soft

Background: Flaky tests are a known source of instability in our validation workflows, especially in the REPL and Darwin jobs. These transient failures (such as network errors or timeouts due to a slow runner) cause entire jobs to fail for reasons unrelated to the code changes, forcing developers to manually intervene. This wastes time and consumes unnecessary compute resources.

Proposal: Implement a selective, automated retry mechanism. Instead of failing the entire job on the first error, the workflow will capture the names of failed tests and re-run only those specific tests. The job will be marked as successful if the retry passes and will only fail if the same tests fail a second time. Flakiness data will be captured as a build artifact for long-term analysis.
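
To make the proposed pass/fail rule concrete, here is a minimal Python sketch of the retry decision. It is an illustration under stated assumptions, not the actual workflow implementation: `run_with_single_retry`, `RetryOutcome`, and `flakiness_report` are hypothetical names, and `run_tests` stands in for whatever command the REPL or Darwin job uses to execute a list of tests and report the ones that failed.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RetryOutcome:
    """Result of one job run with at most one retry (hypothetical structure)."""
    first_failures: List[str]  # tests that failed on the first attempt
    flaky: List[str]           # failed first, passed on retry
    persistent: List[str]      # failed on both attempts

    @property
    def job_passed(self) -> bool:
        # The job fails only if some test failed twice.
        return not self.persistent


def run_with_single_retry(
    tests: Iterable[str],
    run_tests: Callable[[List[str]], Iterable[str]],
) -> RetryOutcome:
    """Run all tests once, then re-run only the failed ones a single time.

    `run_tests` is an assumed callback: it takes test names and returns
    the names of the tests that failed.
    """
    first_failed = set(run_tests(list(tests)))
    if not first_failed:
        return RetryOutcome([], [], [])

    # Retry only the tests that failed; ignore any name the runner
    # reports that was not part of the retry set.
    second_failed = set(run_tests(sorted(first_failed))) & first_failed

    return RetryOutcome(
        first_failures=sorted(first_failed),
        flaky=sorted(first_failed - second_failed),
        persistent=sorted(second_failed),
    )


def flakiness_report(outcome: RetryOutcome) -> str:
    """Serialize the outcome as JSON, e.g. to upload as a build artifact."""
    return json.dumps(
        {
            "first_failures": outcome.first_failures,
            "flaky": outcome.flaky,
            "persistent": outcome.persistent,
            "job_passed": outcome.job_passed,
        },
        indent=2,
        sort_keys=True,
    )
```

In a workflow, the job step would exit non-zero only when `job_passed` is false, and the string returned by `flakiness_report` would be written to a file and uploaded as the artifact. The report format shown here is an assumption; the issue does not specify one.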

Status: In review
