Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 28, 2025

Benchmark comparisons were using PR-specific cache as the baseline, masking cumulative performance degradation across multiple commits. A PR with commit 1 degrading by 5% and commit 2 degrading by another 5% would show as two separate 5% regressions instead of cumulative 10% degradation from main.

Changes

Cache Strategy

  • Before: Restore keys prioritized PR-specific cache, then fell back to main
  • After: Only restore from main branch baseline
  • All PR commits now compare against the same main baseline

Override Benchmark Threshold

  • Added override-benchmark-threshold PR label support
  • When present: benchmark runs and comments but doesn't fail PR
  • Enables intentional performance trade-offs with full visibility
  • Label check includes error handling with safe defaults (fails on error)
  • Only applies to PRs; main branch always fails on regression

Configuration

restore-keys: |
  ${{ format('{0}-benchmark-main-', runner.os) }}

fail-on-alert: ${{ github.event_name != 'pull_request' || (steps.check-label.outputs.has_label || 'false') != 'true' }}

Documentation

  • Updated benchmark README with comparison strategy and label usage
  • Added micro-benchmark section to main README
Original prompt

The cache key for benchmarks should always compare to the last successful benchmark in the main branch. Otherwise, we might see failures when committing several commits to a branch that still succeed when compared to the merged code. We should also have a way to accept degraded performance in CI.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits October 28, 2025 07:57
- Change benchmark cache restore keys to prioritize main branch
- Remove fallback to PR/branch-specific cache for comparisons
- Add support for accept-performance-degradation label to skip fail-on-alert
- Update documentation to explain new behavior
- Ensures cumulative performance regressions across multiple commits are detected

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
- Add error handling for gh pr view command failures
- Default to fail-on-alert if label check fails
- Use OR operator with default value in fail-on-alert expression
- Addresses code review feedback

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix benchmark cache key for CI comparisons fix(ci): benchmark comparisons always use main branch baseline Oct 28, 2025
Copilot finished work on behalf of LayZeeDK October 28, 2025 08:06
Copilot AI requested a review from LayZeeDK October 28, 2025 08:06
@LayZeeDK
Copy link
Member

@copilot Change the PR label to override-benchmark-threshold.

- Rename accept-performance-degradation to override-benchmark-threshold
- Update all references in workflow, README, and benchmark docs
- More descriptive name for bypassing benchmark failure

Co-authored-by: LayZeeDK <6364586+LayZeeDK@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Oct 28, 2025

@copilot Change the PR label to override-benchmark-threshold.

Done in 739baef. Renamed the label from accept-performance-degradation to override-benchmark-threshold across all files (workflow, README, and benchmark documentation).

Copilot finished work on behalf of LayZeeDK October 28, 2025 08:23
@LayZeeDK LayZeeDK marked this pull request as ready for review October 28, 2025 20:46
@LayZeeDK LayZeeDK changed the title fix(ci): benchmark comparisons always use main branch baseline vi: benchmark comparisons always use main branch baseline Oct 28, 2025
@LayZeeDK LayZeeDK changed the title vi: benchmark comparisons always use main branch baseline ci: benchmark comparisons always use main branch baseline Oct 28, 2025
@LayZeeDK LayZeeDK merged commit 64e2ffb into main Oct 28, 2025
13 checks passed
@LayZeeDK LayZeeDK deleted the copilot/fix-benchmark-cache-key branch October 28, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants