Service graph metrics must have unique identifiers #799

@fstab

Background

Service graph metrics are identified by the (service.name, service.namespace) tuples of the client and of the server.

If there are multiple instances of a client or a server, these instances are aggregated into a single service graph time series.

In a traditional setup, service graph metrics are generated from Spans. Metrics generation is performed by a central instance, like Tempo or a central OpenTelemetry collector. In that case, aggregating over instances is no problem, because the metrics generator has full access to Spans from all instances.

Problem

OBI's application_service_graph feature allows service graph metrics to be exposed directly, without the need to generate them from Spans.

However, that also means there is no central instance providing service graph metrics: each Beyla instance exposes its metrics independently.

This is an issue if there is more than one instance of a client or a server on different hosts. In that case, the Beyla instances on the different hosts expose the same time series, identified by the same (service.name, service.namespace) tuples, and these time series will overwrite each other in the metrics backend.
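The collision can be illustrated with a toy model. This is only a sketch: it assumes a backend that keeps one sample per unique label set (last write wins), and the service names and the `instance_id` label are hypothetical, not names used by Beyla or Tempo.

```python
# Toy model of a metrics backend: one stored sample per unique label set.
# Assumption: last write wins when two writers send the same label set.
def series_key(labels):
    return frozenset(labels.items())

def ingest(store, labels, value):
    store[series_key(labels)] = value

# Hypothetical client/server edge reported by two instances on different hosts.
edge = {"client": "frontend", "server": "checkout"}

store = {}
ingest(store, edge, 120.0)  # reported from host A
ingest(store, edge, 80.0)   # reported from host B, same label set
collided = dict(store)      # only one series survives

# With a hypothetical per-instance label, the two series stay distinct.
store = {}
ingest(store, {**edge, "instance_id": "host-a"}, 120.0)
ingest(store, {**edge, "instance_id": "host-b"}, 80.0)
distinct = dict(store)

print(len(collided), len(distinct))
```

In this model the first run retains a single series holding only host B's value, while the second run retains both.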

Proposal

Add a unique Beyla instance identifier to service graph metrics so that service graph metrics provided by different Beyla instances cannot overwrite each other.

The current service graph visualization in Grafana Tempo Explore uses this query:

sum by (client, server) (rate(traces_service_graph_request_server_seconds_sum[$__range]))

So adding a unique Beyla identifier should not break this visualization, because the query's sum by (client, server) aggregates the extra label away.
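To see why the extra label disappears, here is a rough Python sketch of what a `sum by (client, server)` aggregation does. It is a simplified stand-in for the PromQL operator, not Prometheus code; the `instance_id` label, the service names, and the values are all hypothetical.

```python
from collections import defaultdict

def sum_by(series, keys):
    """Group series by the given label names and sum their values."""
    totals = defaultdict(float)
    for labels, value in series:
        group = tuple(labels[k] for k in keys)  # labels not in keys are dropped
        totals[group] += value
    return dict(totals)

# Hypothetical per-instance series carrying an extra instance_id label.
series = [
    ({"client": "frontend", "server": "checkout", "instance_id": "host-a"}, 1.5),
    ({"client": "frontend", "server": "checkout", "instance_id": "host-b"}, 2.5),
    ({"client": "frontend", "server": "cart", "instance_id": "host-a"}, 1.0),
]

result = sum_by(series, ("client", "server"))
print(result)
```

Each (client, server) pair ends up with one combined value, regardless of how many instances contributed to it.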
