
Conversation

Member
@dejanzele dejanzele commented Dec 2, 2025

What type of PR is this?

Enhancement

What this PR does / why we need it

Previously, when the scheduler hit its timeout during the scheduling cycle, it would return an error and discard all work, even jobs that were successfully scheduled before the timeout.

This change implements the following approach: when a timeout occurs, stop considering new jobs but continue scheduling any evicted jobs that still need to be rescheduled.

The key change is in QueueScheduler.Schedule: instead of returning an error on context timeout, we call OnlyYieldEvicted() to switch the iterator to only yield evicted jobs, finish scheduling those, then return the partial results. The subsequent stages (oversubscription handling, optimiser, unbind) continue to run normally.
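To make the shape of that change concrete, here is a minimal, self-contained sketch of the control flow (not the actual Armada implementation: QueueScheduler.Schedule and OnlyYieldEvicted() are the names used in this PR, while the iterator layout, field names, and toy scheduling loop below are illustrative assumptions only):

package main

import (
    "context"
    "fmt"
    "time"
)

type job struct {
    id      string
    evicted bool // true if the job was evicted from a node earlier in the cycle
}

// jobIterator is a stand-in for the scheduler's job iterator.
type jobIterator struct {
    jobs        []job
    idx         int
    evictedOnly bool
}

// OnlyYieldEvicted switches the iterator so that only evicted jobs are
// yielded; fresh jobs from the queue are skipped from now on.
func (it *jobIterator) OnlyYieldEvicted() { it.evictedOnly = true }

func (it *jobIterator) Next() (job, bool) {
    for it.idx < len(it.jobs) {
        j := it.jobs[it.idx]
        it.idx++
        if it.evictedOnly && !j.evicted {
            continue
        }
        return j, true
    }
    return job{}, false
}

// schedule mirrors the behaviour described above: on context timeout it logs
// once, stops taking new jobs, keeps rescheduling evicted jobs, and returns
// the partial result instead of an error.
func schedule(ctx context.Context, it *jobIterator) []job {
    var scheduled []job
    timedOut := false
    for {
        if !timedOut && ctx.Err() != nil {
            fmt.Println("Timeout reached, switching to evicted-only mode")
            it.OnlyYieldEvicted()
            timedOut = true
        }
        j, ok := it.Next()
        if !ok {
            return scheduled // partial result, no error
        }
        scheduled = append(scheduled, j) // stand-in for binding the job to a node
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Millisecond)
    defer cancel()
    time.Sleep(2 * time.Millisecond) // let the deadline pass so the switch is visible
    it := &jobIterator{jobs: []job{
        {id: "evicted-1", evicted: true},
        {id: "new-1", evicted: false},
        {id: "evicted-2", evicted: true},
    }}
    fmt.Println(schedule(ctx, it)) // only the evicted jobs are scheduled
}

In the real change the same idea applies inside QueueScheduler.Schedule, with the later stages (oversubscription handling, optimiser, unbind) still running on the partial result as described above.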

Expected output

When a timeout happens, we should see log lines like the ones below:

  INFO Timeout reached for pool default, switching to evicted-only mode
  INFO Scheduling cycle interrupted by context deadline exceeded: scheduled 873 jobs for pool default
  INFO Scheduled on executor pool default in 19.983083ms with error <nil>

How to test

Check the section at the end called Additional Files for the test script and test Armada job.

  1. Configure the scheduler with a short timeout in _local/scheduler/config.yaml:

     maxSchedulingDuration: 200ms

  2. Configure the fake executor with enough capacity in _local/fakeexecutor/config.yaml:

     nodes:
       - name: "fake-node"
         count: 50
         allocatable:
           cpu: "64"
           memory: "256Gi"

  3. Start the local environment:

     goreman -f _local/procfiles/fake-executor.Procfile start

  4. Create two test queues:

     armadactl create queue queue-a
     armadactl create queue queue-b

  5. Run the following commands to generate jobs:

     ./scripts/submit-jobs.sh -c 5000 -q queue-a -j jobset-timeout-a example/fair-share-test.yaml
     ./scripts/submit-jobs.sh -c 5000 -q queue-b -j jobset-timeout-b example/fair-share-test.yaml

  6. Assert that the following logs appear in the scheduler output:

     INFO Timeout reached for pool default, switching to evicted-only mode
     INFO Scheduling cycle interrupted by context deadline exceeded: scheduled 873 jobs for pool default
     INFO Scheduled on executor pool default in 19.983083ms with error <nil>
Additional Files
# scripts/submit-jobs.sh

#!/bin/bash
set -e

COUNT=1
JOBSET="test-jobset"
QUEUE="test-queue"
JOB_TEMPLATE=""
MAX_PARALLEL=50

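# Parse flags; the last positional argument is the job template file.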
while [[ $# -gt 0 ]]; do
    case $1 in
        -c|--count) COUNT="$2"; shift 2 ;;
        -j|--jobset) JOBSET="$2"; shift 2 ;;
        -q|--queue) QUEUE="$2"; shift 2 ;;
        -p|--parallel) MAX_PARALLEL="$2"; shift 2 ;;
        -*) echo "Unknown option $1"; exit 1 ;;
        *) JOB_TEMPLATE="$1"; shift ;;
    esac
done

[[ -z "$JOB_TEMPLATE" ]] && JOB_TEMPLATE="example/fair-share-test.yaml"
[[ ! -f "$JOB_TEMPLATE" ]] && echo "Error: $JOB_TEMPLATE not found" && exit 1

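# Prefer a locally built ./armadactl, otherwise fall back to armadactl on PATH.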
ARMADACTL="./armadactl"
[[ ! -f "$ARMADACTL" ]] && ARMADACTL="armadactl"

TEMP_DIR=$(mktemp -d)
trap "rm -rf $TEMP_DIR" EXIT

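# Rewrite the template's jobSetId and queue fields to the requested values.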
JOB_FILE="$TEMP_DIR/job.yaml"
sed -e "s/^jobSetId:.*/jobSetId: $JOBSET/" -e "s/^queue:.*/queue: $QUEUE/" "$JOB_TEMPLATE" > "$JOB_FILE"

$ARMADACTL create queue "$QUEUE" 2>/dev/null || true

echo "Submitting $COUNT batches to queue '$QUEUE' jobset '$JOBSET'..."

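# Submit in parallel batches of up to MAX_PARALLEL, waiting for each batch to finish before starting the next.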
PIDS=()
for ((i=1; i<=COUNT; i++)); do
    $ARMADACTL submit "$JOB_FILE" >/dev/null 2>&1 &
    PIDS+=($!)
    if ((${#PIDS[@]} >= MAX_PARALLEL)) || ((i == COUNT)); then
        for pid in "${PIDS[@]}"; do wait $pid; done
        PIDS=()
        echo "Progress: $i/$COUNT"
    fi
done

echo "Done. Submitted $COUNT batches to queue '$QUEUE'"
# example/fair-share-test.yaml

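# Each submission creates 20 identical jobs; the queue and jobSetId below are placeholders that submit-jobs.sh rewrites.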
queue: test-queue
jobSetId: fair-share-test
jobs:
  - namespace: default
    priority: 1000
    podSpec: &podspec
      terminationGracePeriodSeconds: 0
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:latest
          command: ["sleep", "3600"]
          resources:
            limits:
              memory: 64Mi
              cpu: 50m
            requests:
              memory: 64Mi
              cpu: 50m
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec
  - namespace: default
    priority: 1000
    podSpec: *podspec

nikola-jokic previously approved these changes Dec 2, 2025
sctx.TerminationReason = ctx.Err().Error()
default:
}
if ctx.Err() != nil {
Collaborator

Are you sure this is safe? The scheduler has multiple stages (evict, reschedule, remove overfill) and I'm not sure that a timeout at an arbitrary point will result in a usable set of schedulable jobs. The test below handles the simple case where we are scheduling jobs onto an empty cluster with no preemption, but I suspect other cases will fail:

  • Fair-share preemption
  • Urgency-based preemption (particularly in the case of overfill)

Member Author

Good point, I'll work on covering them.

Collaborator

@d80tb7 d80tb7 left a comment

I'm not convinced this works in the case of preemption.

@dejanzele dejanzele force-pushed the feat/scheduler-graceful-shutdown branch 8 times, most recently from 1ff8fc9 to 27cd367 on December 16, 2025 at 07:22
… of rolling back all work

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele dejanzele force-pushed the feat/scheduler-graceful-shutdown branch from 27cd367 to fa91955 on December 18, 2025 at 00:04