[Ci] Improve integration checkout resilience on self-hosted runners #22

Merged
jiahy0825 merged 18 commits into SandAI-org:main from cennn:ci/integration-checkout-retry-mainbase
Apr 14, 2026

Collaborator

@cennn commented on Apr 13, 2026

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

This PR replaces the dual actions/checkout flow with a retry-based manual fetch that is more robust under unstable network/proxy conditions. It adds per-attempt timeouts, alternating strict/relaxed fetch modes (including direct fallback without proxy), stale lock-file cleanup, and a merge-base-free debug diff path to avoid shallow-fetch failures.

Replace dual actions/checkout steps with a manual git fetch strategy
that retries up to 15 times (20s interval) before failing, to reduce
transient network/proxy checkout flakes without changing low-speed
threshold settings.
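
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: the attempt count and interval come from the commit message, while `run_with_retry`, `ATTEMPT_TIMEOUT_S`, and the 120-second value are hypothetical names and numbers chosen for the example.

```python
import subprocess
import time
from typing import Callable, Sequence

MAX_ATTEMPTS = 15        # stated in the commit message
RETRY_INTERVAL_S = 20    # stated in the commit message (a later commit shortens it to 5s)
ATTEMPT_TIMEOUT_S = 120  # assumption: the PR does not state the per-attempt timeout


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = MAX_ATTEMPTS,
    interval_s: float = RETRY_INTERVAL_S,
    timeout_s: float = ATTEMPT_TIMEOUT_S,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0 and return the 1-based attempt that succeeded.

    Raises RuntimeError once every attempt has failed or timed out.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run raises TimeoutExpired if the child exceeds timeout_s.
            result = run(list(cmd), timeout=timeout_s)
            if result.returncode == 0:
                return attempt
            reason = f"exit code {result.returncode}"
        except subprocess.TimeoutExpired:
            reason = f"timed out after {timeout_s}s"
        print(f"attempt {attempt}/{attempts} failed: {reason}")
        if attempt < attempts:
            sleep(interval_s)  # no pause after the final attempt
    raise RuntimeError(f"command failed after {attempts} attempts: {' '.join(cmd)}")
```

A call such as `run_with_retry(["git", "fetch", "--depth=1", "origin", head_sha])` would then retry the fetch; the `--depth=1` argument and the `head_sha` variable are likewise assumptions made for illustration.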
@cennn force-pushed the ci/integration-checkout-retry-mainbase branch from 035e140 to 7f7f6b7 on April 13, 2026 08:33
cennn added 6 commits April 13, 2026 16:38
  • Handle pre-existing .git directories by reusing/updating the origin remote instead of blindly running `git remote add origin`.
  • Keep strict low-speed thresholds while using the proxy in early attempts, then fall back to a direct fetch with the proxy unset and relaxed low-speed limits for later retries.
  • Shorten retry backoff from 20s to 5s to speed up recovery when transient network errors clear quickly.
  • Remove common stale .git lock files before and after each retry attempt so leftover locks from previously interrupted runs don't block the fetch.
  • Use two-dot diff (`base..head`) instead of three-dot (`base...head`) to prevent `no merge base` failures when only shallow commit objects are fetched.
  • Switch checkout retries to an alternating strict/relaxed strategy (odd attempts strict via proxy, even attempts relaxed in direct mode).
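
The commits above could be sketched as a handful of small helpers. This is an illustrative sketch under stated assumptions, not the code from the PR: the function names, the list of lock files, the proxy variable names, and the concrete `http.lowSpeedLimit` / `http.lowSpeedTime` values are all hypothetical, since the PR text does not give them.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumption: the PR only says "common stale .git lock files"; these names are a guess.
LOCK_FILES = ("index.lock", "shallow.lock", "config.lock", "HEAD.lock")
# Assumption: proxy settings are passed through these environment variables.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


def origin_command(existing_remotes: List[str], url: str) -> List[str]:
    """Reuse an existing origin remote (set-url) instead of always adding it."""
    verb = "set-url" if "origin" in existing_remotes else "add"
    return ["git", "remote", verb, "origin", url]


def remove_stale_locks(repo_dir: str) -> List[str]:
    """Delete leftover lock files under .git and return the names removed."""
    removed = []
    for name in LOCK_FILES:
        lock = Path(repo_dir) / ".git" / name
        if lock.is_file():
            lock.unlink()
            removed.append(name)
    return removed


def fetch_plan(attempt: int, env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (git -c options, environment) for a 1-based attempt number."""
    if attempt % 2 == 1:
        # Odd attempt: strict low-speed thresholds, proxy left in place.
        # The threshold values are placeholders, not the PR's settings.
        opts = ["-c", "http.lowSpeedLimit=1000", "-c", "http.lowSpeedTime=30"]
        return opts, dict(env)
    # Even attempt: relaxed thresholds, proxy variables removed (direct mode).
    opts = ["-c", "http.lowSpeedLimit=1", "-c", "http.lowSpeedTime=120"]
    direct_env = {k: v for k, v in env.items() if k not in PROXY_VARS}
    return opts, direct_env


def debug_diff_command(base: str, head: str) -> List[str]:
    """Two-dot diff compares the two commits directly, so no merge base is needed."""
    return ["git", "diff", "--name-only", f"{base}..{head}"]
```

Each attempt would then run something like `["git", *opts, "fetch", "origin", sha]` with the returned environment, calling `remove_stale_locks` before and after. The three-dot form `base...head` diffs from the merge base of the two commits, which a shallow fetch may not contain; that is why the two-dot form is the safer choice here.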
@cennn changed the title from "[Ci] Integration test: Checkout PR head with retry" to "[Ci] Improve integration checkout resilience on self-hosted runners" on Apr 13, 2026
Comment thread .github/workflows/integration_test.yml Outdated
cennn added 11 commits April 13, 2026 20:28
  • Move the long retry-based PR checkout logic out of integration_test.yml into .github/scripts/checkout_pr.sh, and keep the workflow step minimal with only REPO_URL/BASE_SHA/HEAD_SHA inputs.
  • Inline the checkout retry logic back into integration_test.yml so the step does not depend on an external script file that may be unavailable in runner workspaces.
  • Download .github/scripts/checkout_pr.sh from the PR head SHA via the GitHub contents API before running checkout, so the workflow no longer depends on local script availability in stale workspaces.
  • git-fetch consistently times out on self-hosted runners due to network instability. Switch to downloading the HEAD tarball via the GitHub REST API (single HTTP request, curl-based) as the primary method, with git-fetch kept as a fallback.
  • After downloading the HEAD/BASE tarballs via the GitHub API, create a local git repo with two commits (base -> head) and git-replace refs so that `git rev-parse <sha>` and `git diff base..head` work correctly for downstream CI steps (check_chinese_chars, pre-commit, etc.).
  • Replace the bash checkout_pr.sh with checkout_pr.py. Strategy unchanged: git-fetch with retry first, tarball fallback with synthetic git history.
  • git-fetch consistently times out on the self-hosted runner; the tarball via the GitHub API is reliable. Swap the order: tarball first, git-fetch as fallback.
  • urllib honors the environment proxy, which may throttle large downloads. Switch to a curl subprocess with alternating proxy/direct attempts.
  • git-replace is unreliable for mapping real SHAs to synthetic commits. Instead, output local base_ref/head_ref via GITHUB_OUTPUT and use the step outputs in downstream steps (check_chinese_chars).
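
The tarball-first approach from the later commits might look roughly like the sketch below. It is a hedged illustration rather than the contents of checkout_pr.py: the helper names, the 300-second `--max-time`, the optional token handling, and the choice of odd attempts for the proxy are assumptions. The URL follows GitHub's REST endpoint for downloading a repository archive, which answers with a redirect, hence `--location`.

```python
import os
from typing import List, Optional


def tarball_url(owner: str, repo: str, sha: str) -> str:
    """GitHub REST API endpoint that redirects to a tarball of the given ref."""
    return f"https://api.github.com/repos/{owner}/{repo}/tarball/{sha}"


def curl_command(url: str, dest: str, attempt: int,
                 token: Optional[str] = None, max_time_s: int = 300) -> List[str]:
    """Build a curl invocation; even attempts bypass the proxy (assumed order)."""
    cmd = ["curl", "--fail", "--silent", "--show-error", "--location",
           "--max-time", str(max_time_s), "--output", dest]
    if attempt % 2 == 0:
        cmd += ["--noproxy", "*"]  # direct mode: ignore proxy environment variables
    if token:
        cmd += ["--header", f"Authorization: Bearer {token}"]
    return cmd + [url]


def write_step_outputs(base_ref: str, head_ref: str,
                       output_path: Optional[str] = None) -> None:
    """Append name=value pairs to the file GitHub Actions exposes as GITHUB_OUTPUT."""
    path = output_path or os.environ["GITHUB_OUTPUT"]
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"base_ref={base_ref}\n")
        fh.write(f"head_ref={head_ref}\n")
```

Downstream steps can then read the values as `steps.<step_id>.outputs.base_ref` and `steps.<step_id>.outputs.head_ref`, which avoids relying on the real SHAs resolving inside the synthetic two-commit repository.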
Collaborator

@jiahy0825 left a comment

LGTM

@jiahy0825 merged commit c966042 into SandAI-org:main on Apr 14, 2026
1 check passed