On my MicroK8s cluster consisting of 3 units, after relating the microk8s charm to grafana-agent over both the juju-info and cos-agent relations, I ended up with 4 active alerts: KubeAPIDown, KubeControllerManagerDown, KubeletDown, KubeSchedulerDown.
The root cause seems to be that these alerts rely on the presence of the juju_charm label, which is missing in my environment:
One of the alert rules:
absent(up{job="apiserver",juju_application="microk8s",juju_charm="grafana-agent",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"} == 1)
count(up{job="apiserver",juju_application="microk8s",juju_charm="grafana-agent",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"})
Empty query result
versus (without the juju_charm label in the query, the result is 3, as expected):
count(up{job="apiserver",juju_application="microk8s",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"})
{} 3
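For illustration, a minimal sketch of what the alert expression could look like with the juju_charm matcher dropped, so it matches the series that actually exist in my environment (hypothetical; the upstream rule template may differ):

```
absent(up{job="apiserver",juju_application="microk8s",juju_model="microk8s",juju_model_uuid="57280f89-7c62-4703-8622-02de020641d2"} == 1)
```

With the matcher removed, absent() no longer fires spuriously, since the up series are present and equal to 1.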
The microk8s cluster itself is healthy, and all services are running.
Another alert-related problem I discovered is that the client certificate expiration alerts fire very close to the actual expiration date. Entering the critical state only 24 hours before expiry leaves little time for the cluster administrator to react in real-life scenarios.
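As a sketch of what an earlier heads-up could look like, assuming the rule is based on the standard apiserver_client_certificate_expiration_seconds histogram exposed by the API server (the metric name is an assumption; the charm's actual rule may use a different source), a warning-severity rule at 7 days might be:

```yaml
# Hypothetical warning-level rule: fire when the 1st percentile of
# observed client certificate lifetimes drops below 7 days.
- alert: KubeClientCertificateExpirationWarning
  expr: |
    apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
    and on(job)
    histogram_quantile(0.01,
      sum by (job, le) (
        rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])
      )
    ) < 7 * 24 * 3600
  labels:
    severity: warning
```

Keeping the existing 24-hour rule as critical and adding an earlier warning tier would give administrators time to rotate certificates before the situation becomes urgent.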
Versions:
- juju 2.9.32
- microk8s charm: latest/edge, rev 115
- microk8s snap: v1.28.0 5788 1.28/stable
- grafana-agent charm: rev 12, latest/candidate