feat: implement node locking for NodeSet worker pods#130

Closed
giuliocalzo wants to merge 1 commit into SlinkyProject:main from giuliocalzo:sync
Conversation

@giuliocalzo
Contributor

@giuliocalzo giuliocalzo commented Feb 24, 2026

Summary

Add lockNodes and lockNodeLifetime fields to NodeSetSpec to pin worker pods to their assigned Kubernetes nodes. When enabled, the controller records each pod-to-node mapping in NodeSetStatus and injects a requiredDuringSchedulingIgnoredDuringExecution NodeAffinity on pod recreation so each worker always returns to the same physical node.

The lockNodeLifetime field controls how long the lock persists: 0 means permanent, and a positive value (in seconds) causes the lock to expire after the pod stops running, allowing it to reschedule freely. Running pods continuously refresh their assignment timestamp so the countdown only begins once the pod is no longer active on the node.

Breaking Changes

none

Testing Notes

local testing with kind

status:
  nodeAssignments:
    "0":
      node: node-gpu-1
      at: 1740384000
    "1":
      node: node-gpu-2
      at: 1740384000

Additional Context

@vivian-hafener vivian-hafener self-assigned this Mar 2, 2026
@vivian-hafener vivian-hafener self-requested a review March 2, 2026 22:44
@giuliocalzo
Contributor Author

Good morning @vivian-hafener, I've rebased and adjusted based on the latest pre-commit checks; feel free to review it.

@vivian-hafener vivian-hafener removed their request for review March 3, 2026 15:13
@vivian-hafener vivian-hafener removed their assignment Mar 3, 2026
@vivian-hafener
Contributor

Good afternoon @giuliocalzo,

Was this resolved by 0ce9ef7?

Best regards,
Vivian Hafener

@SkylerMalinowski SkylerMalinowski self-requested a review March 9, 2026 21:38
@giuliocalzo
Contributor Author

Hi @vivian-hafener, the DaemonSet natively implements 1:1 node/pod affinity, but the StatefulSet does not have this feature; this PR adds it.

@giuliocalzo
Contributor Author

@SkylerMalinowski I've just rebased and sorted out some conflicts.

@SkylerMalinowski SkylerMalinowski self-assigned this Apr 3, 2026
Contributor

@SkylerMalinowski SkylerMalinowski left a comment

Let's assume LockNodes=true, LockNodeLifetime=0s, ScalingMode=StatefulSet, and pod-0 is assigned to node-0. If I were to node autoscale my Kubernetes cluster down such that node-0 is scaled-in, the node lock on node-0 would never expire and the scaling logic would create pod-0 with strict affinity for node-0 which no longer exists, hence will never run.

With respect to the NodeAssignments map, is it better to clear a node assignment where the Kube node is NotFound, or have the scaling logic not create a pod with strict affinity for a node that is NotFound? I'm thinking the former case is best to avoid defunct NodeAssignments ever increasing the map and potentially going over etcd entry limits, but the latter case would best respect LockNodeLifetime=0s.

// Only used when lockNodes is true.
// +optional
// +default:=0
LockNodeLifetime int32 `json:"lockNodeLifetime,omitempty"`
Contributor

Use a Duration for rich value expressions.

Suggested change
LockNodeLifetime int32 `json:"lockNodeLifetime,omitempty"`
LockNodeLifetime metav1.Duration `json:"lockNodeLifetime,omitempty"`

Comment on lines +229 to +232
validOrdinals := make(map[string]struct{}, replicaCount)
for i := range replicaCount {
validOrdinals[strconv.Itoa(i)] = struct{}{}
}
Contributor

This is not correct. Unlike appsv1.StatefulSet, NodeSet does not strictly scale-in from highest ordinal to lowest in order. Gaps are allowed and frequent during scale-in.

Let's assume replicas=3 with an initial state of pod-{0,1,2}. Let's say you set replicas=2 and the NodeSet controller gracefully terminated pod-1. Your code would say that pod-2 is invalid despite being perfectly valid, and would be erroneously cleared from assignments map.

@SkylerMalinowski
Contributor

This functionality was merged in 34f08a2, but the implementation significantly deviated from your PR.


3 participants