[CI] macOS runner scarcity is delaying rolling builds for hours and leading to cancellations #15615

@radical

Is there an existing issue for this?

I couldn't find one.

Describe the bug

Rolling builds on main are getting stretched out for hours because macos-latest jobs appear to be competing for a very limited runner pool. In some cases the workflow ends up getting cancelled after spending a large amount of time waiting for macOS capacity.

Example run:
https://github.com/microsoft/aspire/actions/runs/23576613463

A few concrete data points from that run:

  • The workflow started at 2026-03-26T03:51:39Z and was not fully updated until 2026-03-26T07:50:07Z, nearly four hours later.
  • There were 101 macOS jobs in the run.
  • The first macOS jobs did not start until about 51 minutes after the workflow started.
  • The median macOS job start delay was about 91 minutes.
  • 35 macOS jobs started more than 2 hours after the workflow began.
  • 10 macOS jobs started more than 3 hours after the workflow began.
  • The latest macOS job in this run (Tests / Templates-XUnit_V3MTP_NewUpAndBuildSupportProjectTemplatesTests / Templates-XUnit_V3MTP_NewUpAndBuildSupportProjectTemplatesTests (macos-latest)) did not start until about 233 minutes after the workflow started:
    https://github.com/microsoft/aspire/actions/runs/23576613463/job/68654796150
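
For anyone who wants to reproduce figures like these on another run, here is a minimal sketch of how they could be derived from a list of per-job start delays. The function name `summarize_delays` and the sample values are hypothetical; the sample list is made up for illustration and is not the data from the run above.

```python
from statistics import median


def summarize_delays(delays_min):
    """Summarize job start delays, given in minutes after the workflow start."""
    if not delays_min:
        raise ValueError("no delays to summarize")
    return {
        "jobs": len(delays_min),
        "first_start_min": min(delays_min),   # earliest job start
        "median_min": median(delays_min),     # typical queueing delay
        "over_2h": sum(1 for d in delays_min if d > 120),
        "over_3h": sum(1 for d in delays_min if d > 180),
        "last_start_min": max(delays_min),    # latest job start
    }


# Made-up delays (minutes), for illustration only.
sample = [10, 45, 95, 125, 190, 200]
print(summarize_delays(sample))
```

The thresholds of 120 and 180 minutes match the "more than 2 hours" and "more than 3 hours" buckets used in the list above.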

Even when the jobs eventually run, this level of queueing makes rolling validation much slower and seems to increase the odds that the overall workflow gets cancelled before everything is meaningfully complete.

Expected Behavior

Rolling builds should be able to start macOS jobs promptly enough that the workflow finishes in a predictable amount of time instead of spending hours waiting on runner availability.

Steps To Reproduce

  1. Open a rolling main workflow run with a large macOS matrix, such as:
    https://github.com/microsoft/aspire/actions/runs/23576613463
  2. Compare the workflow start time to the started_at timestamps of the macos-latest jobs.
  3. Observe that many macOS jobs do not begin for 1-4 hours, and that the overall run eventually ends with a cancelled conclusion.
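
The comparison in step 2 can be sketched roughly as follows. This assumes the job records have already been fetched and are dicts with `name` and `started_at` fields, as in the GitHub REST API job objects; fetching is left out. The helper names `parse_utc` and `macos_start_delays` and the sample job names are hypothetical. Only the workflow start time is taken from the run above.

```python
from datetime import datetime


def parse_utc(ts):
    """Parse a timestamp such as 2026-03-26T03:51:39Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def macos_start_delays(run_started_at, jobs, label="(macos-latest)"):
    """Return {job name: start delay in minutes} for jobs whose name contains label."""
    start = parse_utc(run_started_at)
    delays = {}
    for job in jobs:
        # Skip other platforms and jobs that have no start time recorded.
        if label not in job["name"] or not job.get("started_at"):
            continue
        elapsed = parse_utc(job["started_at"]) - start
        delays[job["name"]] = elapsed.total_seconds() / 60
    return delays


# Hypothetical job records, for illustration only.
jobs = [
    {"name": "Tests / Example (macos-latest)", "started_at": "2026-03-26T05:21:39Z"},
    {"name": "Tests / Example (ubuntu-latest)", "started_at": "2026-03-26T03:53:00Z"},
    {"name": "Tests / Queued (macos-latest)", "started_at": None},
]
print(macos_start_delays("2026-03-26T03:51:39Z", jobs))
```

Matching on the `(macos-latest)` suffix in the job name follows the naming seen in this run; filtering on runner labels instead may be more robust if job names vary.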

Exceptions (if any)

No response

.NET Version info

No response

Metadata

Assignees

No one assigned

    Labels

    area-engineering-systems (infrastructure helix infra engineering repo stuff)
