fix: guard SLURM start-time polling behind a feature flag #469

Merged
ko3n1g merged 1 commit into main from ko3n1g/fix/slurm-poll-start-time-feature-flag
Mar 20, 2026

ko3n1g commented Mar 20, 2026

Summary

  • Adds `poll_estimated_start_time: bool = True` to `SlurmExecutor`, exposing start-time polling as an opt-out flag
  • The scheduler's `schedule()` method now checks the flag before starting the background thread introduced in #464 (feat: poll and print SLURM job estimated start time while pending)
  • Adds a test (`test_schedule_skips_polling_thread_when_disabled`) verifying that the thread is not started when the flag is `False`
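
The guard described in the summary can be sketched roughly as follows. This is a minimal illustration and not the nemo_run implementation: `FakeExecutor`, `FakeScheduler`, `_poll`, and `_pollers` are hypothetical names, and only the `poll_estimated_start_time` flag and its default of `True` come from this PR.

```python
import threading
from dataclasses import dataclass


@dataclass
class FakeExecutor:
    # Mirrors the flag added in this PR; the class itself is a hypothetical stand-in.
    poll_estimated_start_time: bool = True


class FakeScheduler:
    """Hypothetical scheduler that starts one polling thread per job."""

    def __init__(self):
        # Maps job_id -> (thread, stop_event) for every active poller.
        self._pollers = {}

    def _poll(self, job_id, stop):
        # Placeholder for a real polling loop; blocks until asked to stop.
        stop.wait()

    def schedule(self, executor, job_id):
        # The guard: skip the background thread entirely when the flag is off.
        if not executor.poll_estimated_start_time:
            return job_id
        stop = threading.Event()
        thread = threading.Thread(
            target=self._poll, args=(job_id, stop), daemon=True
        )
        self._pollers[job_id] = (thread, stop)
        thread.start()
        return job_id

    def close(self):
        # Signal every poller to stop, then wait briefly for it to exit.
        for thread, stop in self._pollers.values():
            stop.set()
            thread.join(timeout=1)
        self._pollers.clear()
```

Checking the flag before any thread object is created means that opting out leaves no polling state behind to clean up.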

Background

Commit f68f6f2 introduced a background daemon thread that polls `squeue --start` while a SLURM job is pending and prints its estimated start time. Some users reported issues with this behaviour. This fix keeps the feature enabled by default but lets affected users disable it via `SlurmExecutor(poll_estimated_start_time=False)`.
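
For readers unfamiliar with the polling feature, here is a rough sketch of what such a loop might look like. It is an assumption-laden illustration, not the code from f68f6f2: the function names, the 30-second interval, and the choice of `squeue` output format (`%S`, the expected start time) are all hypothetical, and the real thread presumably also stops once the job leaves the pending state. The command runner is injectable so the sketch can be exercised without a SLURM cluster.

```python
import subprocess
import threading


def query_start_time(job_id, run=subprocess.run):
    """Return squeue's estimated start time for job_id, or None if unknown."""
    result = run(
        ["squeue", "--start", "--noheader", "--format=%S", "--jobs", str(job_id)],
        capture_output=True,
        text=True,
        check=False,
    )
    value = result.stdout.strip()
    # squeue prints N/A when it has no estimate for the job yet.
    if result.returncode != 0 or not value or value == "N/A":
        return None
    return value


def poll_start_time(job_id, stop, interval=30.0, query=query_start_time, report=print):
    """Report the estimate whenever it changes, until the stop event is set."""
    last = None
    while not stop.is_set():
        estimate = query(job_id)
        if estimate is not None and estimate != last:
            report(f"Job {job_id} estimated start time: {estimate}")
            last = estimate
        # Event.wait doubles as an interruptible sleep.
        stop.wait(interval)
```

A caller would run `poll_start_time` in a daemon thread and set the `threading.Event` to stop it.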

Usage

import nemo_run as run

# Default: polling enabled — prints estimated start time while job is pending
executor = run.SlurmExecutor(
    account="my_account",
    partition="batch",
    nodes=1,
    ntasks_per_node=8,
    tunnel=run.SSHTunnel(host="cluster.example.com", user="me", job_dir="/scratch/jobs"),
)

# Opt out: disable the background polling thread
executor = run.SlurmExecutor(
    account="my_account",
    partition="batch",
    nodes=1,
    ntasks_per_node=8,
    tunnel=run.SSHTunnel(host="cluster.example.com", user="me", job_dir="/scratch/jobs"),
    poll_estimated_start_time=False,
)

Test plan

  • All existing start-time polling tests pass (`test_schedule_starts_start_time_polling_thread`, `test_schedule_stops_existing_thread_on_duplicate_job_id`, `test_close_stops_all_polling_threads`, `test_cancel_stops_polling_thread_for_job`)
  • New test `test_schedule_skips_polling_thread_when_disabled` passes
  • `uv run -- pytest test/run/torchx_backend/schedulers/test_slurm.py` — 34 passed
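
A test of this kind could be shaped roughly like the sketch below. It does not reproduce the test added in this PR: `schedule` here is a hypothetical stand-in for the scheduler method, and the executor is a bare `SimpleNamespace` rather than a real `SlurmExecutor`. The idea is to patch `threading.Thread` and assert whether it was constructed.

```python
import threading
from types import SimpleNamespace
from unittest import mock


def schedule(executor, job_id):
    """Hypothetical stand-in for the scheduler's schedule() method."""
    if executor.poll_estimated_start_time:
        threading.Thread(target=lambda: None, daemon=True).start()
    return job_id


def test_schedule_skips_polling_thread_when_disabled():
    executor = SimpleNamespace(poll_estimated_start_time=False)
    with mock.patch("threading.Thread") as thread_cls:
        schedule(executor, "123")
    # With the flag off, no Thread object should ever be created.
    thread_cls.assert_not_called()


def test_schedule_starts_polling_thread_by_default():
    executor = SimpleNamespace(poll_estimated_start_time=True)
    with mock.patch("threading.Thread") as thread_cls:
        schedule(executor, "123")
    # With the flag on, exactly one Thread is created and started.
    thread_cls.assert_called_once()
    thread_cls.return_value.start.assert_called_once()
```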

🤖 Generated with Claude Code

Add `poll_estimated_start_time: bool = True` to `SlurmExecutor` so
users who experience issues with the background polling thread introduced
in f68f6f2 can opt out by setting the flag to False, while keeping the
feature enabled by default for everyone else.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g merged commit 0ba375f into main on Mar 20, 2026
24 checks passed