Fix: Prevent PodBinding Controller Panic for Infinite NoExecute Tolerations#2501
WHOIM1205 wants to merge 1 commit into openyurtio:master
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Hello @zyjhtangtang, this PR fixes a deterministic panic in the PodBinding controller when it handles valid Kubernetes pods with infinite (TolerationSeconds=nil) NoExecute tolerations on autonomous edge nodes. The issue is easy to trigger in real edge deployments and causes yurt-manager to crash repeatedly. The fix is minimal and defensive, follows the same pattern already used in taint_manager.go, and fully preserves toleration semantics. I’ve added clear reproduction steps and an impact analysis to the PR description. I would appreciate a review when you have time. Thanks!
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2501      +/-   ##
==========================================
- Coverage   44.08%   44.08%   -0.01%
==========================================
  Files         399      399
  Lines       26560    26560
==========================================
- Hits        11710    11709       -1
+ Misses      13788    13785       -3
- Partials     1062     1066       +4
@WHOIM1205 Please focus on unit test coverage.



Summary
This PR fixes a nil pointer dereference in the PodBinding controller that causes yurt-manager to panic and restart when reconciling pods with infinite (TolerationSeconds=nil) NoExecute tolerations on autonomous edge nodes.

Infinite NoExecute tolerations are valid and common in Kubernetes. The controller previously assumed TolerationSeconds was always set and dereferenced it without a nil check.

❗ Problem
In reconcilePod, the controller stored original toleration values by directly dereferencing TolerationSeconds.

When TolerationSeconds == nil (meaning tolerate forever), this resulted in a runtime panic caused by the nil pointer dereference.

This breaks node autonomy, a core OpenYurt feature, and can destabilize the entire control plane.
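The nil check described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: the Toleration struct below is a hypothetical stand-in that copies only a few fields of the Kubernetes Toleration type (where TolerationSeconds is a *int64), and the helper names originalSeconds and describe are made up for illustration.

```go
package main

import "fmt"

// Toleration is a hypothetical stand-in for the Kubernetes Toleration type.
// Only the pointer-typed TolerationSeconds field matters for this sketch.
type Toleration struct {
	Operator          string
	Effect            string
	TolerationSeconds *int64 // nil means "tolerate forever"
}

// originalSeconds (hypothetical helper) reads the duration without ever
// dereferencing a nil pointer. finite is false for infinite tolerations.
func originalSeconds(t Toleration) (seconds int64, finite bool) {
	if t.TolerationSeconds == nil {
		return 0, false
	}
	return *t.TolerationSeconds, true
}

// describe (hypothetical helper) renders the toleration duration for logging.
func describe(t Toleration) string {
	if s, ok := originalSeconds(t); ok {
		return fmt.Sprintf("%ds", s)
	}
	return "forever"
}

func main() {
	limit := int64(300)
	finite := Toleration{Operator: "Exists", Effect: "NoExecute", TolerationSeconds: &limit}
	infinite := Toleration{Operator: "Exists", Effect: "NoExecute"}

	fmt.Println(describe(finite))
	fmt.Println(describe(infinite))
}
```

Returning a separate finite flag, rather than a sentinel such as 0 or -1, keeps "no limit" distinct from any real number of seconds, so an infinite toleration can later be restored as nil instead of being rewritten to a finite value.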
🔁 Steps to Reproduce
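1. Have yurt-manager running.
2. Create a pod with an infinite NoExecute toleration:

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      nodeName: edge-node-1
      tolerations:
      - operator: "Exists"
        effect: "NoExecute"
        # TolerationSeconds intentionally omitted (nil)
      containers:
      - name: nginx  # container name assumed; it did not survive in the original text
        image: nginx

3. Check the yurt-manager logs for the panic:

    kubectl logs -n kube-system deployment/yurt-manager | grep panic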