Description
Background
Service graph metrics are used by observability tools to visualize the relationships between services. The screenshot shows an example of a service graph in Open Source Grafana:
The data model behind service graphs is "Service graph metrics". Traditionally, these service graph metrics are generated from Spans, for example using the OpenTelemetry collector's Service Graph Connector.
Generating service graph metrics from Spans works well, but requires a sample rate of 100%. This may result in significant overhead for applications and for the OpenTelemetry collector.
To avoid this overhead, OBI has a feature named application_service_graph which allows OBI to export service graph metrics directly.
Problem
The two crucial attribute pairs in service graph metrics are client_service_namespace and client, which contain the namespace and service name of the client, and server_service_namespace and server, which contain the namespace and service name of the server.
When OBI instruments a server, it can easily provide the server's name and namespace. However, there is no straightforward way to add the client's name and namespace.
The current implementation uses the Kubernetes API to look up the client's IP address and translate it to the deployment name. This works reasonably well (there is also a K8S cache for making this more efficient in large clusters), but this solution is limited to K8S only.
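To make the four attributes concrete, here is a minimal Go sketch of one service graph edge rendered as a Prometheus-style series. The Edge type and the sample values are hypothetical, and the metric name traces_service_graph_request_total is an assumption used for illustration only; check the actual metric names emitted by your setup.

```go
package main

import "fmt"

// Edge is a hypothetical type describing one client -> server relation.
// The four fields map to the four attributes named in the text above.
type Edge struct {
	ClientNamespace string
	Client          string
	ServerNamespace string
	Server          string
}

// Labels renders the edge as a Prometheus-style label set.
func (e Edge) Labels() string {
	return fmt.Sprintf(
		"client_service_namespace=%q,client=%q,server_service_namespace=%q,server=%q",
		e.ClientNamespace, e.Client, e.ServerNamespace, e.Server,
	)
}

func main() {
	e := Edge{
		ClientNamespace: "prod",
		Client:          "frontend",
		ServerNamespace: "prod",
		Server:          "backend",
	}
	// The metric name below is an assumption for illustration.
	fmt.Printf("traces_service_graph_request_total{%s} 1\n", e.Labels())
}
```

The server half of the label set is always known to the instrumented server; the client half is the part this issue is about.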
Proposal
The proposal is to inject the client's namespace, name, and instance labels into each outgoing HTTP request. This can be achieved using the W3C Tracestate Header.
The main purpose of the tracestate HTTP header is to provide additional vendor-specific trace identification information across different distributed tracing systems and is a companion header for the traceparent field. It also conveys information about the request’s position in multiple distributed tracing graphs.
The OBI instrumentation on the server side can then parse this header and use the client identifiers for service graph metrics.
For example, the header could look as follows:
tracestate: client.namespace=prod,client.name=frontend,client.instance.id=00f067aa0ba902b7
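A rough Go sketch of both sides, assuming the key names from the example header: the client formats its identity into a tracestate value, and the server extracts it while ignoring entries from other vendors. ClientIdentity, FormatTracestate and ParseTracestate are hypothetical names, not existing OBI APIs. One caveat: as far as I read the W3C grammar, tracestate keys may not contain ".", so the final key names would probably have to differ from the example; the sketch keeps them as written.

```go
package main

import (
	"fmt"
	"strings"
)

// ClientIdentity is a hypothetical type holding the client attributes
// carried in the example header.
type ClientIdentity struct {
	Namespace  string
	Name       string
	InstanceID string
}

// Key names copied from the example header; they are part of the
// proposal under discussion, not an agreed format.
const (
	keyNamespace  = "client.namespace"
	keyName       = "client.name"
	keyInstanceID = "client.instance.id"
)

// FormatTracestate renders the identity as a tracestate header value.
// Values are assumed to be escaped already.
func FormatTracestate(c ClientIdentity) string {
	return fmt.Sprintf("%s=%s,%s=%s,%s=%s",
		keyNamespace, c.Namespace,
		keyName, c.Name,
		keyInstanceID, c.InstanceID)
}

// ParseTracestate extracts the client identity from a tracestate value.
// Entries with other keys are skipped. The boolean reports whether at
// least one client key was present.
func ParseTracestate(header string) (ClientIdentity, bool) {
	var c ClientIdentity
	found := false
	for _, member := range strings.Split(header, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(member), "=")
		if !ok {
			continue // not a key=value entry
		}
		switch key {
		case keyNamespace:
			c.Namespace, found = value, true
		case keyName:
			c.Name, found = value, true
		case keyInstanceID:
			c.InstanceID, found = value, true
		}
	}
	return c, found
}

func main() {
	header := FormatTracestate(ClientIdentity{
		Namespace:  "prod",
		Name:       "frontend",
		InstanceID: "00f067aa0ba902b7",
	})
	fmt.Println("tracestate:", header)

	// Simulate a header that also carries an entry from another vendor.
	c, ok := ParseTracestate("vendor=abc, " + header)
	fmt.Println(c.Namespace, c.Name, c.InstanceID, ok)
}
```

A real implementation would also have to keep any tracestate entries that are already present on the outgoing request instead of overwriting the header.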
There would need to be an escaping mechanism for non-ASCII characters, and for the "," and "=" characters that delimit tracestate entries.
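One possible escaping mechanism is plain percent-encoding. The Go sketch below uses url.QueryEscape from the standard library, which leaves only ASCII letters, digits, "-", "_", ".", "~", "%" and "+" in the output, so the result cannot collide with the "," and "=" delimiters. EscapeValue and UnescapeValue are hypothetical helper names, and choosing query-style encoding (where a space becomes "+") is an assumption, not something the proposal specifies.

```go
package main

import (
	"fmt"
	"net/url"
)

// EscapeValue percent-encodes a label so it contains no ',' or '='
// and no non-ASCII bytes.
func EscapeValue(s string) string {
	return url.QueryEscape(s)
}

// UnescapeValue reverses EscapeValue. It returns an error if the input
// contains a malformed percent sequence.
func UnescapeValue(s string) (string, error) {
	return url.QueryUnescape(s)
}

func main() {
	escaped := EscapeValue("team,a=b/ü")
	fmt.Println(escaped)

	original, err := UnescapeValue(escaped)
	fmt.Println(original, err)
}
```

Escaping grows the value, so the length limits that tracestate imposes on entries would need to be checked after encoding, not before.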
The benefits of this approach:
- This works independent of the underlying platform (K8S, EC2, bare metal, ...)
- This works even if the client uses a non-default service name (for example if the OTEL_SERVICE_NAME environment variable is set on the client side)
- OBI can already inject the traceparent header, so injecting an additional tracestate should be straightforward.
- If this gets adopted by runtime-specific SDKs it will also work if one side of a connection is instrumented with OBI and the other side is instrumented with a runtime-specific SDK.
This is currently just a proposal to be discussed. So, what do you think?