AKS: omsagent-win pods restart again and again #436

@chefcook

omsagent-win is the pod in the kube-system namespace that ships with AKS when Azure insights is enabled. I run a hybrid cluster here, with both Windows and Linux nodes.

Output of kubectl get nodes:

NAME                            STATUS   ROLES   AGE     VERSION
aks-nplin-21116150-vmss000002   Ready    agent   2d21h   v1.21.2
aks-nplin-21116150-vmss000003   Ready    agent   2d21h   v1.21.2
aks-nplin-21116150-vmss000006   Ready    agent   2d21h   v1.21.2
aksnpwin000003                  Ready    agent   2d20h   v1.21.2
aksnpwin000004                  Ready    agent   2d20h   v1.21.2
aksnpwin000005                  Ready    agent   2d20h   v1.21.2

On the Linux nodes everything works fine:

NAME                                    READY   STATUS    RESTARTS   AGE     IP             NODE                            NOMINATED NODE   READINESS GATES
omsagent-xscbv                          2/2     Running   0          2d21h   10.240.1.108   aks-nplin-21116150-vmss000006   <none>           <none>
omsagent-k2zlx                          2/2     Running   0          2d21h   10.240.0.137   aks-nplin-21116150-vmss000002   <none>           <none>
omsagent-pzd4s                          2/2     Running   0          2d21h   10.240.0.79    aks-nplin-21116150-vmss000003   <none>           <none>

But as soon as the pod runs on a Windows node, it restarts constantly. I have also checked the nodeSelector.

NAME                                    READY   STATUS    RESTARTS   AGE     IP             NODE                            NOMINATED NODE   READINESS GATES
omsagent-win-2vwqd                      1/1     Running   283        2d20h   10.240.2.64    aksnpwin000005                  <none>           <none>
omsagent-win-5kz2h                      1/1     Running   73         2d20h   10.240.1.178   aksnpwin000003                  <none>           <none>
omsagent-win-gmwk6                      1/1     Running   25         2d20h   10.240.1.46    aksnpwin000004                  <none>           <none>
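
To pick out the affected pods without reading the table by eye, the listing can be filtered on the RESTARTS column. This is only a minimal sketch: it assumes the column layout shown above (RESTARTS is the fourth field), and the function name `high_restarts` and the threshold are illustrative, not part of any tool.

```shell
# Print "<pod> <restarts>" for every pod whose restart count exceeds a threshold.
# Reads the output of `kubectl get pods` (with or without -o wide) on stdin.
# Assumption: RESTARTS is the 4th whitespace-separated field, as in the listing above.
high_restarts() {
  threshold="${1:-10}"   # default threshold is an arbitrary choice
  # Skip the header row, force a numeric comparison on field 4.
  awk -v t="$threshold" '$1 != "NAME" && ($4 + 0) > t { print $1, $4 }'
}

# Usage (needs cluster access):
#   kubectl -n kube-system get pods -o wide | high_restarts 10
```

With the three Windows pods above and a threshold of 50, this would list omsagent-win-2vwqd and omsagent-win-5kz2h.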

Output of kubectl -n kube-system describe pod omsagent-win-2vwqd:

Events:
  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Warning  Unhealthy  10m (x950 over 2d20h)  kubelet  Liveness probe failed:
  Normal   Killing    19s (x708 over 2d20h)  kubelet  Container omsagent-win failed liveness probe, will be restarted
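
The events show that the liveness probe fails, but the message after "Liveness probe failed:" is empty. One way to dig further is to look at what the probe actually runs and at what the container logged before the kubelet killed it. The sketch below wraps two standard kubectl calls; the function name `inspect_probe` is hypothetical, and the pod name is the one from the output above.

```shell
# Show the liveness probe definition and the logs of the previous
# (killed) container instance for one pod.
# Assumption: the pod lives in kube-system unless a namespace is given.
inspect_probe() {
  pod="$1"
  ns="${2:-kube-system}"

  echo "== liveness probe of $pod =="
  # Prints the livenessProbe stanza of every container in the pod spec.
  kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.containers[*].livenessProbe}'
  echo

  echo "== last 50 log lines of the previous container instance =="
  # --previous selects the container instance that ran before the last restart.
  kubectl -n "$ns" logs "$pod" --previous --tail=50
}

# Usage (needs cluster access):
#   inspect_probe omsagent-win-2vwqd
```

Whether the previous logs contain anything useful depends on what the agent writes to stdout before it is restarted.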

I have already tried giving the pods more CPU and RAM. That worked at first, but after a while (about 30 minutes) the settings revert to their original values.

Any ideas on how else I could investigate this?

Thanks in advance!
