K8s namespace and deployment type information lost when previous pod exits

I'm observing a situation similar to https://github.com/grafana/beyla/issues/1228 in a test opentelemetry-demo deployment instrumented with Beyla.

Rough timeline:

- Beyla starts (privileged container)
- otel demo starts, including a `valkey-cart` container
- initially, spans from this container are sent with `service.name == "otel-demo/Deployment/valkey-cart"`
- after quite a long time (I believe around 1-2 days) with both processes running uninterrupted, Beyla appeared to "lose" the tagging information, and spans were sent with `service.name == "valkey-cart"`
- this persisted until the otel-demo deployment was fully restarted.

For extra context, this test deployment was a little "troubled" initially: valkey-cart was crash-looping and was redeployed multiple times. The Beyla logs show the following sequence (limited to valkey-cart PIDs for clarity):

- PID 786043 instrumented: Sep 16 at 11:54:34 <- stable valkey pid, tracked for 2 days
- PID 825135 ended:        Sep 16 at 15:49:28 <- first "lingering" stuck pod dies
- PID 912010 ended:        Sep 17 at 02:05:51
- PID 913843 ended:        Sep 17 at 02:17:19
- PID 1000051 ended:       Sep 17 at 11:02:35
- PID 786043 ended:        Sep 18 at 12:54:28 <- full otel demo restart

Tracking this change, the "cutoff" appears to happen at exactly 15:49 on Sep 16, coinciding with the first "lingering" PID being reported as ended:

<img width="1290" height="343" alt="Image" src="https://github.com/user-attachments/assets/dd045386-415d-471a-a16d-0a7a55daaa64" />

From a (rather uneducated) reading of the code, this looks like a likely suspect:

https://github.com/grafana/opentelemetry-ebpf-instrumentation/blob/df9d1603b2bf061d059e2c213025cfdfde92fd23/pkg/components/kube/store.go#L229-L239

Reading previous pull requests, it appears that a similar bit of code was previously removed to fix a bug:

https://github.com/grafana/beyla/pull/1156/files

Perhaps reintroducing this behavior caused a regression?
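
To make the hypothesis concrete, here is a minimal Go sketch of the failure mode I suspect. It is not Beyla's actual code: the `store` type, its methods, and the idea that metadata is stored under a key shared by all PIDs of the same workload are assumptions on my part, used purely for illustration. With `refCounted` set to `false`, the end of any one PID drops the shared metadata. With `refCounted` set to `true`, the metadata is kept until the last PID is gone.

```go
package main

import "fmt"

// ownerMeta is a hypothetical stand-in for the Kubernetes metadata
// used to build the decorated service name.
type ownerMeta struct {
	Namespace, Kind, Name string
}

// store is a hypothetical metadata cache, NOT Beyla's real store.
// Several PIDs may point at the same owner key.
type store struct {
	refCounted bool
	ownerOfPID map[int]string
	owners     map[string]ownerMeta
	refs       map[string]int
}

func newStore(refCounted bool) *store {
	return &store{
		refCounted: refCounted,
		ownerOfPID: map[int]string{},
		owners:     map[string]ownerMeta{},
		refs:       map[string]int{},
	}
}

// AddPID registers a PID under its owner's shared key.
func (s *store) AddPID(pid int, m ownerMeta) {
	key := m.Namespace + "/" + m.Kind + "/" + m.Name
	s.ownerOfPID[pid] = key
	s.owners[key] = m
	s.refs[key]++
}

// RemovePID forgets a PID. Without reference counting, the shared
// owner metadata is deleted even if other PIDs still use it.
func (s *store) RemovePID(pid int) {
	key, ok := s.ownerOfPID[pid]
	if !ok {
		return
	}
	delete(s.ownerOfPID, pid)
	s.refs[key]--
	if !s.refCounted || s.refs[key] <= 0 {
		delete(s.owners, key)
		delete(s.refs, key)
	}
}

// ServiceName returns the decorated name when metadata is available,
// and falls back to the bare process name otherwise.
func (s *store) ServiceName(pid int, fallback string) string {
	if key, ok := s.ownerOfPID[pid]; ok {
		if m, ok := s.owners[key]; ok {
			return m.Namespace + "/" + m.Kind + "/" + m.Name
		}
	}
	return fallback
}

func main() {
	meta := ownerMeta{Namespace: "otel-demo", Kind: "Deployment", Name: "valkey-cart"}
	for _, refCounted := range []bool{false, true} {
		s := newStore(refCounted)
		s.AddPID(786043, meta) // stable valkey PID
		s.AddPID(825135, meta) // "lingering" PID from an earlier pod
		s.RemovePID(825135)    // the lingering PID ends
		fmt.Printf("refCounted=%v: %s\n", refCounted, s.ServiceName(786043, "valkey-cart"))
	}
	// Prints:
	// refCounted=false: valkey-cart
	// refCounted=true: otel-demo/Deployment/valkey-cart
}
```

The first output line matches what I see in production: the stable PID is still tracked, but its spans fall back to the undecorated name once an unrelated PID of the same workload ends.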

Please let me know if I can provide any more information that would be helpful. Sudden service name changes are quite detrimental to building e.g. meaningful alerting on top of them without workarounds. 🙏 
