Alert rules provided by the cos-agent relation generate false positives #96

@przemeklal

Description

On my 3-unit microk8s cluster, after relating the microk8s charm to grafana-agent over both the juju-info and cos-agent relations, I ended up with 4 active alerts: KubeAPIDown, KubeControllerManagerDown, KubeletDown, KubeSchedulerDown.

The root cause seems to be that these alerts rely on the juju_charm label, which is missing from the metrics in my environment.

One of the alert rules:

absent(up{job="apiserver",juju_application="microk8s",juju_charm="grafana-agent",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"} == 1)

Counting the series with the same selector returns nothing:

count(up{job="apiserver",juju_application="microk8s",juju_charm="grafana-agent",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"})

Empty query result

Without the juju_charm label in the query, the result is 3, as expected:

count(up{job="apiserver",juju_application="microk8s",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"})

{} 3
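
The mismatch can be reproduced outside Prometheus with a small simulation of equality label matching. This is only a sketch: `matches`, `count`, and `absent` are hypothetical helpers that mimic the selector behaviour, not a Prometheus API, and the three series (including the `juju_unit` label) are made up to mirror the 3-unit cluster described above.

```python
# Hypothetical stand-ins for PromQL equality matchers; not a Prometheus API.

def matches(series_labels, selector):
    """A series matches only if every selector label is present with an equal value."""
    return all(series_labels.get(name) == value for name, value in selector.items())


def count(series, selector):
    """Number of matching series (PromQL returns an empty result instead of 0)."""
    return sum(1 for labels in series if matches(labels, selector))


def absent(series, selector):
    """Mimics absent(): true when no series matches the selector."""
    return count(series, selector) == 0


# Assumed shape of the scraped series: one per unit, with no juju_charm label.
series = [
    {
        "job": "apiserver",
        "juju_application": "microk8s",
        "juju_model": "microk8s",
        "juju_unit": f"microk8s/{i}",  # illustrative label, not taken from the report
    }
    for i in range(3)
]

without_charm = {
    "job": "apiserver",
    "juju_application": "microk8s",
    "juju_model": "microk8s",
}
with_charm = {**without_charm, "juju_charm": "grafana-agent"}

print(count(series, without_charm))  # 3
print(absent(series, with_charm))    # True
```

Because a series without the label can never satisfy `juju_charm="grafana-agent"`, the `absent(...)` condition holds even though all three targets are up, which would explain the four firing alerts.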

The microk8s cluster itself is healthy, and all services are running.

Another alert-related problem I discovered is that the client certificate expiration alerts fire too close to the actual expiration date. Entering the critical state only 24h before expiry leaves the cluster administrator very little time to react in real-life scenarios.
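
To illustrate the timing concern, here is a minimal sketch comparing the 24h critical threshold with an earlier one. The 7-day alternative and the `is_critical` helper are assumptions made for illustration only; they are not values or functions taken from the charm.

```python
from datetime import timedelta

# Only the 24h value comes from the report above; the alternative is hypothetical.
CRITICAL_CURRENT = timedelta(hours=24)
CRITICAL_EARLIER = timedelta(days=7)  # assumed alternative, not a value from the charm


def is_critical(time_to_expiry, threshold):
    """True when the certificate expires sooner than the given threshold."""
    return time_to_expiry < threshold


remaining = timedelta(days=3)
print(is_critical(remaining, CRITICAL_CURRENT))  # False
print(is_critical(remaining, CRITICAL_EARLIER))  # True
```

With three days left, the 24h threshold has not yet fired at critical severity, whereas an earlier threshold would already have escalated.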

Versions:

  • juju 2.9.32
  • microk8s charm: latest/edge, rev 115
  • microk8s snap: v1.28.0 5788 1.28/stable
  • grafana-agent charm: rev 12, latest/candidate
