Fix: Prevent Nil Pointer Panic When Node Zone Label Changes During Processing #2500
Yashika0724 wants to merge 1 commit into openyurtio:master
Add defensive checks in markNodeForTainting and markNodeAsReachable to handle cases where a node's zone label changes during processing. Signed-off-by: Yashika0724 <ssyashika1311@gmail.com>
Hi @zyjhtangtang, this PR fixes a controller panic caused by zone label changes during retries by
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2500      +/-   ##
==========================================
- Coverage   44.08%   44.06%   -0.03%
==========================================
  Files         399      399
  Lines       26560    26571     +11
==========================================
- Hits        11710    11709      -1
- Misses      13788    13797      +9
- Partials     1062     1065      +3
@Yashika0724 Please focus on the blocking issues in the code detection.
Thanks for pointing that out.



Summary
This PR fixes a nil pointer panic in the Node Lifecycle Controller that can occur when a node’s zone label changes between initial classification and retry-based reconciliation.
During retries, the controller may re-fetch the Node object from the API server with updated topology labels, but the new zone may not be registered in the eviction queues. This results in accessing a nil entry in zoneNoExecuteTainter, which causes the controller to panic and crash.
This fix adds defensive handling to safely register new zones when encountered and avoids invalid map access.
Root Cause
Zones are registered only during initial node classification via addPodEvictorForNewZone.
However, in updateNodeFunc, a retry can re-fetch the node from the API server after its zone label has changed. In this case, the new zone is not present in zoneNoExecuteTainter.
When markNodeForTainting or markNodeAsReachable executes:
    nc.zoneNoExecuteTainter[nodetopology.GetZoneKey(node)]
the map entry can be nil, leading to:
    panic: runtime error: invalid memory address or nil pointer dereference
which crashes the entire Node Lifecycle Controller.
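For reviewers less familiar with this failure mode, the following is a minimal, self-contained sketch of how an unguarded lookup in a map of pointers can panic. The evictionQueue type and the removeFromZone helper are hypothetical stand-ins and are not the controller's real types; the recover is there only so the sketch can print the panic instead of crashing.

```go
package main

import "fmt"

// evictionQueue is a hypothetical stand-in for the controller's per-zone queue type.
type evictionQueue struct {
	nodes map[string]bool
}

// Remove reads a field of the receiver, so calling it on a nil *evictionQueue panics.
func (q *evictionQueue) Remove(node string) bool {
	existed := q.nodes[node]
	delete(q.nodes, node)
	return existed
}

// removeFromZone mimics the unguarded access: it never checks that the zone is registered.
// The deferred recover converts the panic into an error purely for demonstration.
func removeFromZone(queues map[string]*evictionQueue, zone, node string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	// A missing key yields the zero value (a nil pointer), and the method call dereferences it.
	queues[zone].Remove(node)
	return nil
}

func main() {
	queues := map[string]*evictionQueue{
		"zone-a": {nodes: map[string]bool{"node-1": true}},
	}
	fmt.Println(removeFromZone(queues, "zone-a", "node-1")) // registered zone: no error
	fmt.Println(removeFromZone(queues, "zone-b", "node-1")) // unregistered zone: recovered panic
}
```

In the real controller there is no such recover around the access, which is why the panic takes the whole controller down.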
Steps to Reproduce
Alternative reproduction:
• Use a mutating webhook that modifies node zone labels
• Create node churn to increase reconciliation retries
Fix Applied
markNodeForTainting
• Extract zone key once at function start
• Check if the zone is registered
• Dynamically register missing zones using the same logic as initial classification
• Safely access eviction queues after registration
markNodeAsReachable
• Check if zone exists before removing node from eviction queue
• If zone is not registered, return safely with no-op
This prevents nil map access while preserving existing taint and eviction behavior.
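The two guards described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the zoneQueue and controller types are hypothetical, the functions take a zone key and node name as plain strings (the real ones work on Node objects and derive the key from labels), and locking and taint updates are omitted.

```go
package main

import "fmt"

// zoneQueue is a hypothetical stand-in for the per-zone eviction queue.
type zoneQueue struct {
	pending map[string]bool
}

// Add queues a node and reports whether it was newly added.
func (q *zoneQueue) Add(node string) bool {
	if q.pending[node] {
		return false
	}
	q.pending[node] = true
	return true
}

// Remove drops a node and reports whether it was queued.
func (q *zoneQueue) Remove(node string) bool {
	existed := q.pending[node]
	delete(q.pending, node)
	return existed
}

// controller holds only the map relevant to this sketch.
type controller struct {
	zoneNoExecuteTainter map[string]*zoneQueue
}

// markNodeForTainting registers the zone on demand before touching its queue.
func (c *controller) markNodeForTainting(zone, node string) bool {
	q, ok := c.zoneNoExecuteTainter[zone]
	if !ok || q == nil {
		// Assumption: the real fix reuses the initial-classification registration logic here.
		q = &zoneQueue{pending: map[string]bool{}}
		c.zoneNoExecuteTainter[zone] = q
	}
	return q.Add(node)
}

// markNodeAsReachable is a no-op when the zone was never registered.
func (c *controller) markNodeAsReachable(zone, node string) bool {
	q, ok := c.zoneNoExecuteTainter[zone]
	if !ok || q == nil {
		return false
	}
	return q.Remove(node)
}

func main() {
	c := &controller{zoneNoExecuteTainter: map[string]*zoneQueue{}}
	fmt.Println(c.markNodeAsReachable("zone-b", "node-1")) // false: zone unknown, nothing to remove
	fmt.Println(c.markNodeForTainting("zone-b", "node-1")) // true: zone registered, node queued
	fmt.Println(c.markNodeAsReachable("zone-b", "node-1")) // true: node removed from the queue
}
```

Checking both the presence flag and the pointer covers the case where a key exists but was stored with a nil value.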
Impact
Testing