Implement runner de-registration handling in RunnerProvisioner #51

owenmartin-toast · 2026-01-09T17:50:15Z

This change ensures that runners are de-registered from GitHub before their associated VMs are deleted. A new method, ensureRunnerDeregistered, waits for de-registration to complete, using a defined timeout and polling interval. If the runner does not de-register in time, it is force-deleted. This makes VM deletion more reliable and, most importantly, prevents a scenario where a GitHub Scale Set believes it still has an active runner and therefore does not provision new ones for incoming requests.

ispasov

Thank you for your contribution.
I have made a couple of comments. Let me know what you think.

Also, I have tested the proposed change and it is working great.

ispasov · 2026-01-19T12:19:18Z

pkg/runner-provisioner/provisioner.go

 )

+const (
+	runnerDeregistrationTimeout      = 30 * time.Second


I would suggest making these values part of the env config so that they can be easily changed.
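A minimal sketch of what env-driven configuration could look like. The variable names and defaults (30s timeout, 2s poll interval) are the ones listed in this PR's commit message; everything else is an assumption for illustration, including the `deregistrationConfig` type, the helper names, the injected `getenv` function, and the choice to accept Go duration strings such as `45s`. The merged code may parse its configuration differently.

```go
package main

import "time"

// deregistrationConfig is a hypothetical grouping of the two tunables.
type deregistrationConfig struct {
	Timeout      time.Duration
	PollInterval time.Duration
}

// durationOr parses raw as a Go duration string (for example "45s") and
// returns fallback when raw is empty, malformed, or not positive.
func durationOr(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

// loadDeregistrationConfig reads both settings through getenv so that
// callers can pass os.Getenv in production and a stub in tests.
func loadDeregistrationConfig(getenv func(string) string) deregistrationConfig {
	return deregistrationConfig{
		Timeout:      durationOr(getenv("RUNNER_DEREGISTRATION_TIMEOUT"), 30*time.Second),
		PollInterval: durationOr(getenv("RUNNER_DEREGISTRATION_POLL_INTERVAL"), 2*time.Second),
	}
}
```

In a real binary this would be called as `loadDeregistrationConfig(os.Getenv)`. Falling back to the default on a malformed value keeps the provisioner running; logging or failing fast would be an equally reasonable choice.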

ispasov · 2026-01-19T12:20:51Z

pkg/runner-provisioner/provisioner.go


 func (p *RunnerProvisioner) deleteVM(ctx context.Context, runnerName string) {
+	// Wait for runner to de-register from GitHub before deleting VM
+	p.ensureRunnerDeregistered(ctx, runnerName)


Why before deleting the VM?

Can we delete the VM first, freeing up space for other runners, and only then wait for the runner deregistration?

It is pretty much fire-and-forget anyway.

Also, the comment is not needed. Generally we add comments only to explain "why" if something is not obvious. Here we can see that we are waiting for the runner to be deregistered before we delete the VM.

owenmartin-toast · 2026-01-19

Deleting before the de-registration happens should be fine, but what we've seen (and what the original fix was intended to resolve) is that VMs were sometimes removed before the de-registration process had even fired, which left those 'ghost' instances.
But with the eventual cleanup that's now happening, we're probably fine to delete the VM, rely on "normal" de-registration in the majority of cases, and rely on the eventual cleanup to force de-registration in the cases where it doesn't happen via the VM.
I guess two things need answering:

1. I have defaulted to delete-first-then-wait, but added configuration to allow people to switch to a wait-then-delete flow if they prefer. Are you good with that?

2. Do we want to simplify this and always delete, then always try to deregister from this code, regardless of whether the de-registration succeeded when invoked from the VM?

ispasov · 2026-01-20

I would simplify it: delete the VM first, then try to deregister.
The reason is that a VM takes actual quota from the cluster, so there might be another VM waiting for this one to be deleted, while the runner does not take any quota.

owenmartin-toast · 2026-01-21

Cool, I've done that, and the other comments have been addressed too. Thanks for your speedy feedback!

ispasov · 2026-01-19T12:28:24Z

pkg/runner-provisioner/provisioner.go

+
+	deadline := time.Now().Add(runnerDeregistrationTimeout)
+	for time.Now().Before(deadline) {
+		runner, err := p.actionsClient.GetRunner(ctx, runnerName)


We should check whether the context has been cancelled before making this call.

A more idiomatic way would be to rewrite this to use a Ticker and a select statement.
Something like:

for {
	select {
	case <-ctx.Done():
		....
	case <-ticker.C:
		....
	}
}

This would also handle the context check. You can of course check it with the current implementation as well.

Add functionality to ensure GitHub Actions runners are properly deregistered, preventing ghost runners from remaining in GitHub after VM deletion.

Approach:
- Delete VM first to free infrastructure resources immediately
- Then ensure runner deregistration (no-op if already deregistered)
- Force-delete runners that don't deregister within timeout

Features:
- Configurable deregistration timeout (default: 30s)
- Configurable polling interval for status checks (default: 2s)
- Idiomatic Go implementation using context, Ticker, and select patterns
- Proper cleanup with defer statements for resource management

Configuration:
- RUNNER_DEREGISTRATION_TIMEOUT: Maximum time to wait for runner deregistration
- RUNNER_DEREGISTRATION_POLL_INTERVAL: Interval between status checks

Implementation details:
- Use context.WithTimeout() for timeout management
- Use time.NewTicker() for polling intervals
- Use select statement for clean control flow
- Support context cancellation for graceful shutdown

Testing:
- Add comprehensive test suite for RunnerProvisioner
- Test runner deregistration success scenarios
- Test force-deletion on timeout
- Test context cancellation handling
- Test transient error handling

Co-Authored-By: Claude <noreply@anthropic.com>

ispasov

Looks good.

Thank you for your contribution.

My final suggestion would be to remove all inline comments (those that are not func doc comments).
They do not really add any value, as they say what is done rather than why it is done, and that is already visible from the code.

owenmartin-toast · 2026-01-21T16:34:38Z

I removed the inline comments per your suggestion, thanks!

owenmartin-toast requested a review from a team as a code owner January 9, 2026 17:50

owenmartin-toast mentioned this pull request Jan 9, 2026

Intermittent Issues with Connector Hanging #50

Open

owenmartin-toast force-pushed the fix-deregistration-bug branch from ace558f to 3b1583c Compare January 12, 2026 18:36

ispasov reviewed Jan 19, 2026

owenmartin-toast force-pushed the fix-deregistration-bug branch from 3b1583c to 2bf0ddd Compare January 19, 2026 20:44

owenmartin-toast force-pushed the fix-deregistration-bug branch from 2bf0ddd to fc5b243 Compare January 20, 2026 23:23

ispasov approved these changes Jan 21, 2026

Remove comments

eb28634

ispasov merged commit 7836b01 into macstadium:main Jan 22, 2026
2 checks passed
