Description
The following CPU and memory profiles show that collecting eBPF metrics takes a non-negligible amount of time in one of our clusters.
This needs to be further investigated. There are two obvious starting points:
- the Prometheus reporter
- newProgramInfoFromFd (https://github.com/cilium/ebpf/blob/7c861fdc272cd85f23e01805c20512ce2f68219b/info.go#L350)
I've noticed elsewhere that creating labels is not cheap - perhaps there's a way to cache that.
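As a rough illustration of what caching could look like (a hypothetical sketch, not the reporter's actual code; the type and field names here are made up), the label values for each program could be built once and reused across scrapes instead of being re-allocated on every collection:

```go
// Hypothetical sketch: memoise the label-value slice per program ID so
// labels are built once, not on every scrape. Names are illustrative.
package main

import (
	"strconv"
	"sync"
)

type labelCache struct {
	mu   sync.Mutex
	byID map[uint32][]string // program ID -> pre-built label values
}

func (c *labelCache) labelsFor(id uint32, name string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if vs, ok := c.byID[id]; ok {
		return vs // cache hit: no new allocations on this scrape
	}
	vs := []string{strconv.FormatUint(uint64(id), 10), name}
	c.byID[id] = vs
	return vs
}

func main() {
	c := &labelCache{byID: make(map[uint32][]string)}
	_ = c.labelsFor(42, "xdp_prog") // first scrape: builds and caches
	_ = c.labelsFor(42, "xdp_prog") // subsequent scrapes: cache hit
}
```

The cached values could then be passed to something like prometheus.MustNewConstMetric, so only the metric value changes per scrape. One caveat: program IDs can be recycled, so stale entries would need evicting when programs unload.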
Regarding newProgramInfoFromFd: collecting BPF metrics involves creating a program ID iterator, which in itself involves numerous syscalls per program, plus, on the cilium/ebpf side, some overhead in building the higher-level objects (including I/O).
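To make that cost concrete, here is a minimal sketch of the per-program work implied by the ID walk, using cilium/ebpf's public API (an illustration of the mechanism, not the exporter's actual code):

```go
// Minimal sketch of one scrape's per-program work via the BPF ID walk.
package main

import (
	"errors"
	"fmt"
	"os"

	"github.com/cilium/ebpf"
)

func main() {
	var id ebpf.ProgramID
	for {
		next, err := ebpf.ProgramGetNextID(id) // one BPF_PROG_GET_NEXT_ID syscall
		if errors.Is(err, os.ErrNotExist) {
			break // reached the end of the ID space
		}
		if err != nil {
			panic(err)
		}
		id = next

		prog, err := ebpf.NewProgramFromID(id) // one BPF_PROG_GET_FD_BY_ID syscall
		if err != nil {
			continue // the program may have been unloaded in between
		}
		// Info() issues BPF_OBJ_GET_INFO_BY_FD and builds the higher-level
		// ProgramInfo object; this is where newProgramInfoFromFd is hit.
		if info, err := prog.Info(); err == nil {
			fmt.Println(info.Name)
		}
		prog.Close()
	}
}
```

So the syscall count scales linearly with the number of loaded programs on the node, at roughly three or more syscalls per program, on every poll.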
The obvious workaround is to increase the polling interval - but that is only a workaround. We need to understand what exactly makes collection take so long, and then what can be done to optimise it.