Description
The following CPU and memory profiles show that collecting eBPF metrics takes a non-negligible amount of time in one of our clusters.
This needs to be further investigated. There are two obvious starting points:
- the Prometheus reporter
- newProgramInfoFromFd (https://github.com/cilium/ebpf/blob/7c861fdc272cd85f23e01805c20512ce2f68219b/info.go#L350)
I've noticed elsewhere that creating labels is not cheap - perhaps there's a way to cache that.
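As a rough illustration of what caching could look like (a hypothetical sketch, not the reporter's actual code; the type and field names here are made up), the label values for each program could be built once and reused across scrapes instead of being re-allocated on every collection:

```go
// Hypothetical sketch: memoise the label-value slice per program ID so
// labels are built once, not on every scrape. Names are illustrative.
package main

import (
	"strconv"
	"sync"
)

type labelCache struct {
	mu   sync.Mutex
	byID map[uint32][]string // program ID -> pre-built label values
}

func (c *labelCache) labelsFor(id uint32, name string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if vs, ok := c.byID[id]; ok {
		return vs // cache hit: no new allocations on this scrape
	}
	vs := []string{strconv.FormatUint(uint64(id), 10), name}
	c.byID[id] = vs
	return vs
}

func main() {
	c := &labelCache{byID: make(map[uint32][]string)}
	_ = c.labelsFor(42, "xdp_prog") // first scrape: builds and caches
	_ = c.labelsFor(42, "xdp_prog") // subsequent scrapes: cache hit
}
```

The cached values could then be passed to something like prometheus.MustNewConstMetric, so only the metric value changes per scrape. One caveat: program IDs can be recycled, so stale entries would need evicting when programs unload.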
Regarding newProgramInfoFromFd: collecting BPF metrics involves creating a program ID iterator, which in itself involves numerous syscalls per program, plus, on the cilium/ebpf side, some overhead in building the higher-level objects (including I/O).
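To make that cost concrete, here is a minimal sketch of the per-program work implied by the ID walk, using cilium/ebpf's public API (an illustration of the mechanism, not the exporter's actual code):

```go
// Minimal sketch of one scrape's per-program work via the BPF ID walk.
package main

import (
	"errors"
	"fmt"
	"os"

	"github.com/cilium/ebpf"
)

func main() {
	var id ebpf.ProgramID
	for {
		next, err := ebpf.ProgramGetNextID(id) // one BPF_PROG_GET_NEXT_ID syscall
		if errors.Is(err, os.ErrNotExist) {
			break // reached the end of the ID space
		}
		if err != nil {
			panic(err)
		}
		id = next

		prog, err := ebpf.NewProgramFromID(id) // one BPF_PROG_GET_FD_BY_ID syscall
		if err != nil {
			continue // the program may have been unloaded in between
		}
		// Info() issues BPF_OBJ_GET_INFO_BY_FD and builds the higher-level
		// ProgramInfo object; this is where newProgramInfoFromFd is hit.
		if info, err := prog.Info(); err == nil {
			fmt.Println(info.Name)
		}
		prog.Close()
	}
}
```

So the syscall count scales linearly with the number of loaded programs on the node, at roughly three or more syscalls per program, on every poll.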
The obvious workaround is to increase the polling interval - but that is only a workaround. We need to understand what exactly makes collection take so long, and then what can be done to optimise it.