It might be a good idea to move to externalTrafficPolicy: Local (instead of the default value Cluster).
Moving to externalTrafficPolicy: Local has pros and cons:
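Pros: better health checks from the LoadBalancer, fewer forwarding hops between nodes
Cons: uneven traffic distribution across pods (the LoadBalancer balances across nodes, not pods, so pods on nodes with fewer replicas receive a larger share)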
The problem with LoadBalancer health checks under externalTrafficPolicy: Cluster is that they only verify that kube-proxy is up and running, without considering whether the Node status is Ready. For example, a node with a networking issue may still appear healthy on the kube-proxy /healthz endpoint (http://<node ip>:10256/healthz).
With externalTrafficPolicy: Local, the LoadBalancer health check targets the Service's healthCheckNodePort (served by kube-proxy at /healthz), which is more reliable: it reports healthy only if at least one Ready pod backing the Service is running on that node. This should also prevent the node from receiving traffic if there are networking issues (e.g. NetworkReady=false, error: cni plugin not initialized, etc.).
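As a rough illustration of the change described above, a Service using this policy might look like the sketch below. The name my-app, the app: my-app selector, and the port numbers are placeholders, not taken from any real manifest; only type, externalTrafficPolicy, and healthCheckNodePort are the fields under discussion.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                    # hypothetical Service name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local    # default is Cluster
  # healthCheckNodePort is allocated automatically when omitted.
  # It can be pinned explicitly, for example:
  # healthCheckNodePort: 32000    # assumed free port in the NodePort range
  selector:
    app: my-app                   # hypothetical pod label
  ports:
    - name: http
      port: 80                    # placeholder ports
      targetPort: 8080
```

Once applied, the allocated port should be visible in the Service's spec.healthCheckNodePort field, and the LoadBalancer is expected to probe that port on each node.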
To mitigate the uneven traffic distribution, we should use topologySpreadConstraints to spread the pods evenly across nodes.
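A minimal sketch of such a constraint, assuming a Deployment whose pods carry a hypothetical app: my-app label; the replica count, image, and the choice of ScheduleAnyway are illustrative assumptions rather than a recommendation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                              # hypothetical Deployment name
spec:
  replicas: 3                               # placeholder replica count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                        # allow at most 1 pod difference between nodes
          topologyKey: kubernetes.io/hostname   # spread per node
          whenUnsatisfiable: ScheduleAnyway # soft constraint; DoNotSchedule makes it hard
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: example.com/my-app:1.0     # placeholder image
          ports:
            - containerPort: 8080
```

With kubernetes.io/hostname as the topology key, the scheduler tries to keep the number of matching pods per node within maxSkew of each other, which evens out the per-pod share of traffic under externalTrafficPolicy: Local.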