feat(otel): add support for otel metrics api via protobuf and json #6783

mabdinur · 2025-10-29T17:11:46Z

What does this PR do?

Adds full OpenTelemetry Metrics support to dd-trace-js with a custom Meter Provider implementation. Set DD_METRICS_OTEL_ENABLED=true to export metrics over OTLP.

Key Features:

- Full OpenTelemetry Metrics API compliance with all standard instrument types (Counter, UpDownCounter, Histogram, Gauge, and Observable variants)
- OTLP export via http/protobuf (default) or http/json protocols
- Configurable endpoint, headers, timeout, and export intervals via standard OTEL_EXPORTER_OTLP_METRICS_* environment variables
- Periodic metric collection and aggregation with support for DELTA, CUMULATIVE, and LOWMEMORY temporality modes
- Comprehensive test coverage (91.69%) with 39 integration tests

Configuration:

- DD_METRICS_OTEL_ENABLED - Enable OpenTelemetry metrics (default: false)
- OTEL_EXPORTER_OTLP_METRICS_ENDPOINT - Endpoint URL (default: http://localhost:4318/v1/metrics)
- OTEL_EXPORTER_OTLP_METRICS_PROTOCOL - Protocol: http/protobuf or http/json
- OTEL_METRIC_EXPORT_INTERVAL - Export interval in ms (default: 60000)
- Additional timeout and header configuration options
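
To make the defaults above concrete, here is a minimal sketch of how these variables could be resolved. `resolveOtlpMetricsConfig` is a hypothetical helper written for illustration, not the code in packages/dd-trace/src/config.js, and treating only the literal string `true` as enabled is an assumption.

```javascript
// Hypothetical helper for illustration only; not dd-trace-js's actual config code.
function resolveOtlpMetricsConfig (env = process.env) {
  const interval = Number.parseInt(env.OTEL_METRIC_EXPORT_INTERVAL, 10)
  return {
    // Assumption: only the literal string "true" (any case) enables the feature.
    enabled: String(env.DD_METRICS_OTEL_ENABLED).toLowerCase() === 'true',
    endpoint: env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT || 'http://localhost:4318/v1/metrics',
    // http/protobuf is the default protocol according to the PR description.
    protocol: env.OTEL_EXPORTER_OTLP_METRICS_PROTOCOL || 'http/protobuf',
    // Fall back to 60000 ms when the variable is unset or not a positive integer.
    exportIntervalMs: Number.isFinite(interval) && interval > 0 ? interval : 60000
  }
}

// With no variables set, every field falls back to its documented default.
console.log(resolveOtlpMetricsConfig({}))
```

The sketch leaves out the timeout and header options because the description does not list their names or defaults.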

Motivation

Enables customers to use the OpenTelemetry Metrics API with dd-trace-js without adding the OpenTelemetry SDK as a dependency. A custom implementation integrates better with dd-trace-js configuration, avoids vendoring gRPC libraries, and maintains flexibility.

Additional Notes

- Add metrics.proto and metrics_service.proto (OTLP v1 spec)
- Update protobuf_loader to support metrics protos
- Rename protos/ -> otlp/ directory for better organization

- Create OtlpHttpExporterBase for shared HTTP export logic
- Create OtlpTransformerBase for shared transformation logic
- Refactor logs exporter/transformer to extend base classes
- Update test mocking paths
- Eliminates ~400 lines of duplication

BridgeAR · 2025-11-11T13:57:10Z

packages/dd-trace/src/opentelemetry/otlp/otlp_transformer_base.js

    for (const item of items) {
      const instrumentationScope = item.instrumentationScope || { name: '', version: '', schemaUrl: '', attributes: {} }
-      const attrsKey = stableStringify(instrumentationScope.attributes || {})
+      const attrsKey = JSON.stringify(instrumentationScope.attributes || {})


If the attributes are an object, we must use a stable stringify implementation. Otherwise the keys would be stringified in insertion order, and two attribute sets with the same entries in a different order would no longer produce the same key.

I reintroduced the stableStringify implementation:

    function stableStringify (attributes) {
      if (attributes == null || typeof attributes !== 'object') {
        return JSON.stringify(attributes)
      }
      // Attributes are sorted by key to ensure consistent serialization regardless of key order.
      // Keys are always strings and values are always strings, numbers, booleans,
      // or arrays of strings, numbers, or booleans.
      return Object.keys(attributes)
        .sort()
        .map(key => `${key}:${JSON.stringify(attributes[key])}`)
        .join(',')
    }

We typically expect each metric to have 1–5 attributes. Since Datadog bills based on the number of unique metric + attribute combinations, adding more tags can dramatically increase metric cardinality and therefore costs. To prevent this, users will likely only set a few attributes per metric. So sorting attributes by key will likely add minimal overhead.

Also added a test case to catch this
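
For readers following the thread, a small sketch of the key-order problem being discussed. It is not the test case added in the PR. The attribute names are made up, and the `stableStringify` shown here is a trimmed copy of the function quoted above without the type check.

```javascript
// Trimmed copy of the function from this thread (no null/typeof guard).
function stableStringify (attributes) {
  return Object.keys(attributes)
    .sort()
    .map(key => `${key}:${JSON.stringify(attributes[key])}`)
    .join(',')
}

// Same entries, different insertion order (illustrative attribute names).
const first = { env: 'prod', region: 'us' }
const second = { region: 'us', env: 'prod' }

// JSON.stringify follows insertion order, so the two keys differ.
console.log(JSON.stringify(first) === JSON.stringify(second)) // false

// Sorting the keys first makes the serialization order-independent.
console.log(stableStringify(first) === stableStringify(second)) // true
```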

I believe the attributes are already guaranteed to be an object, no?

Yup they should be. I can get rid of the additional check

packages/dd-trace/src/config.js

packages/dd-trace/src/opentelemetry/metrics/constants.js

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

BridgeAR

Almost LGTM :)

packages/dd-trace/src/opentelemetry/metrics/meter_provider.js

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

BridgeAR

LGTM with a few nits and potential follow-ups :)

BridgeAR · 2025-11-22T15:51:00Z

packages/dd-trace/test/opentelemetry/metrics.spec.js

+        const histogram = decoded.resourceMetrics[0].scopeMetrics[0].metrics[0]
+        assert.strictEqual(histogram.name, 'duration')
+        assert.strictEqual(histogram.histogram.dataPoints[0].count, 1)
+        assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100)


Nit: I think checking for all properties of the histogram would be good.

Nit: Ideally, we use our assertObjectContains helper instead of having many individual assertions.
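
A rough illustration of the partial-match style suggested here. `assertContains` is a hypothetical stand-in written for this sketch; the project's real `assertObjectContains` helper may have a different signature and semantics. The `unit`, `min` and `max` fields on the sample object are also made up.

```javascript
const { ok, strictEqual } = require('node:assert')

// Hypothetical stand-in: checks that every property in `expected` is present
// in `actual` with the same value, ignoring properties that only `actual` has.
function assertContains (actual, expected, path = 'value') {
  if (expected !== null && typeof expected === 'object') {
    ok(actual !== null && typeof actual === 'object', `${path} should be an object`)
    for (const key of Object.keys(expected)) {
      assertContains(actual[key], expected[key], `${path}.${key}`)
    }
  } else {
    strictEqual(actual, expected, `${path} mismatch`)
  }
}

// Illustrative decoded metric; the field values are invented for the example.
const histogram = {
  name: 'duration',
  unit: 'ms',
  histogram: { dataPoints: [{ count: 1, sum: 100, min: 100, max: 100 }] }
}

// One structural assertion replaces several individual strictEqual calls.
assertContains(histogram, {
  name: 'duration',
  histogram: { dataPoints: [{ count: 1, sum: 100 }] }
})
```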

BridgeAR · 2025-11-22T15:55:41Z

packages/dd-trace/test/opentelemetry/metrics.spec.js

+      setTimeout(() => {
+        assert(validated, 'Should have validated an export with all metrics')
+        validator()
+        done()
+      }, 150)


Instead of using a timeout, it would be possible to just call done() from inside of the validator method. That way it is simpler and faster.

The same applies for all other methods with that pattern.

It is also more reliable, due to not being dependent on the timing anymore.

Or even better: the mockOtlpExport method returns a promise. That way you can trigger the methods sync and just return that promise afterwards. That would be very clean.
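
The promise-based shape described above could look roughly like this. The `mockOtlpExport` and `fakeTransport` below are hypothetical stand-ins built on an EventEmitter, not the helper in metrics.spec.js; the point is only the pattern of triggering the export synchronously and returning the promise from the test.

```javascript
const { strictEqual } = require('node:assert')
const { EventEmitter } = require('node:events')

// Hypothetical stand-in for whatever the real tests stub out.
const fakeTransport = new EventEmitter()

// Hypothetical helper: resolves once an export arrives and the validator
// passes, and rejects if the validator throws.
function mockOtlpExport (validator) {
  return new Promise((resolve, reject) => {
    fakeTransport.once('export', (payload) => {
      try {
        validator(payload)
        resolve()
      } catch (err) {
        reject(err)
      }
    })
  })
}

// Mocha-style test body: no setTimeout and no done() callback. The test
// runner waits on the returned promise and fails if it rejects.
function testBody () {
  const exported = mockOtlpExport((payload) => {
    strictEqual(payload.metrics[0].name, 'requests')
  })
  fakeTransport.emit('export', { metrics: [{ name: 'requests' }] })
  return exported
}
```

In a Mocha suite this would be used as `it('exports metrics', testBody)`, so a failed validation surfaces as a rejected promise rather than a timing-dependent assertion.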

BridgeAR · 2025-11-22T16:02:12Z

packages/dd-trace/test/opentelemetry/metrics.spec.js

+        assert(counter, 'Should have counter')
+        assert(histogram, 'Should have histogram')
+        assert.strictEqual(counter.name, 'test')
+        assert.strictEqual(histogram.name, 'test')
+        assert.strictEqual(counter.sum.dataPoints.length, 1, 'Counter should have 1 data point')
+        assert.strictEqual(histogram.histogram.dataPoints.length, 1, 'Histogram should have 1 data point')
+        assert.strictEqual(counter.sum.dataPoints[0].asInt, 5)
+        assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100)


I know we have lots of code like that, but I would like to start avoiding so many asserts on a single object. We should just use the partial helper (assertObjectContains, if I am not mistaken).

BridgeAR · 2025-11-22T16:09:13Z

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

+    let bucketIndex = DEFAULT_HISTOGRAM_BUCKETS.length
+    for (let i = 0; i < DEFAULT_HISTOGRAM_BUCKETS.length; i++) {
+      if (value <= DEFAULT_HISTOGRAM_BUCKETS[i]) {
+        bucketIndex = i
+        break
+      }
+    }


Ideally, this would use a binary search.
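
A sketch of what the binary search could look like. It returns the same index as the linear scan in the diff: the first bucket whose upper bound is at least `value`, or the array length for the overflow bucket. The boundary values below are the OpenTelemetry default explicit bucket boundaries and are assumed here; the actual `DEFAULT_HISTOGRAM_BUCKETS` constant in the PR may differ.

```javascript
// Assumption: the OpenTelemetry default explicit bucket boundaries.
const DEFAULT_HISTOGRAM_BUCKETS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]

// Lower-bound binary search over sorted boundaries: finds the first index i
// with value <= boundaries[i], or boundaries.length if there is none.
function findBucketIndex (boundaries, value) {
  let low = 0
  let high = boundaries.length
  while (low < high) {
    const mid = (low + high) >>> 1
    if (value <= boundaries[mid]) {
      high = mid
    } else {
      low = mid + 1
    }
  }
  return low
}

console.log(findBucketIndex(DEFAULT_HISTOGRAM_BUCKETS, 100)) // 6
console.log(findBucketIndex(DEFAULT_HISTOGRAM_BUCKETS, 10001)) // 15
```

With only 15 boundaries the difference from a linear scan is likely small, so whether it is worth the change depends on how hot this path is.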

BridgeAR · 2025-11-22T16:10:47Z

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

+    if (!cumulativeState.has(stateKey)) {
+      cumulativeState.set(stateKey, {
+        count: 0,
+        sum: 0,
+        min: Infinity,
+        max: -Infinity,
+        bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0),
+        startTime: metric.temporality === TEMPORALITY.CUMULATIVE ? this.#startTime : timestamp
+      })
+    }
+
+    const state = cumulativeState.get(stateKey)


Nit

Ideally, it would just get the state if it was defined. If it was not defined, we could combine the handling to just set the first value.

That would be a tad faster.
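
One possible reading of this suggestion, sketched under assumptions: do a single `get`, and when the key is missing, create the state directly from the first measurement instead of creating a zeroed state and updating it afterwards. `recordHistogramValue` and its parameters are hypothetical names, not code from the PR.

```javascript
// Hypothetical sketch: one Map lookup per measurement instead of has() + get().
function recordHistogramValue (cumulativeState, stateKey, value, bucketIndex, bucketCount, startTime) {
  let state = cumulativeState.get(stateKey)
  if (state === undefined) {
    // First measurement for this key: initialize directly from the value.
    const bucketCounts = new Array(bucketCount).fill(0)
    bucketCounts[bucketIndex] = 1
    state = { count: 1, sum: value, min: value, max: value, bucketCounts, startTime }
    cumulativeState.set(stateKey, state)
    return state
  }
  // Later measurements update the existing state in place.
  state.count++
  state.sum += value
  if (value < state.min) state.min = value
  if (value > state.max) state.max = value
  state.bucketCounts[bucketIndex]++
  return state
}
```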

BridgeAR · 2025-11-22T16:15:50Z

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

+    const state = cumulativeState.get(stateKey)
+    state.value += value
+
+    const dataPoint = this.#findOrCreateDataPoint(metric, attributes, attrKey, () => ({


I believe the idea behind the method is a performance optimization to prevent the object from being created?

Instead of an object, we are now creating a function that references these parts. I have not benchmarked something like that before, but I would expect the function creation to still take more time. The pattern is only useful when there are many compute-heavy operations.

BridgeAR · 2025-11-22T16:20:31Z

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js

+      sum: 0,
+      min: Infinity,
+      max: -Infinity,
+      bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0),


This is actually a compute-heavy operation (referring to the comment above about findOrCreateDataPoint and method vs. object creation).

Making this lazy is a good idea. To prevent the overhead, I would still create a simple object and just use optional assignment to assign the bucketCounts after the call. That way dataPoint will be updated due to the reference and it is cheap.
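
A sketch of how that could look, reading "optional assignment" as the nullish assignment operator `??=` (an interpretation, not confirmed in the thread). `findOrCreateDataPoint` here is a simplified, hypothetical version that takes a plain object rather than a factory function, and `BUCKET_COUNT` is an assumed size.

```javascript
// Assumption: 15 bucket boundaries plus one overflow bucket.
const BUCKET_COUNT = 16

// Simplified, hypothetical version that takes a plain object, not a factory.
function findOrCreateDataPoint (dataPoints, attrKey, initial) {
  let dataPoint = dataPoints.get(attrKey)
  if (dataPoint === undefined) {
    dataPoint = initial
    dataPoints.set(attrKey, dataPoint)
  }
  return dataPoint
}

function recordIntoDataPoint (dataPoints, attrKey, value, bucketIndex) {
  // The initial object is cheap to build: no array is allocated here.
  const dataPoint = findOrCreateDataPoint(dataPoints, attrKey, {
    count: 0,
    sum: 0,
    min: Infinity,
    max: -Infinity,
    bucketCounts: undefined
  })
  // Allocate the bucket array only when the data point does not have one yet.
  // The stored object is updated because dataPoint is a reference to it.
  dataPoint.bucketCounts ??= new Array(BUCKET_COUNT).fill(0)

  dataPoint.count++
  dataPoint.sum += value
  if (value < dataPoint.min) dataPoint.min = value
  if (value > dataPoint.max) dataPoint.max = value
  dataPoint.bucketCounts[bucketIndex]++
  return dataPoint
}
```

The small object literal is still created on every call, but the array allocation and fill happen once per data point.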

…6783)

* add otlp metrics protos
* feat: add OTLP metrics proto definitions and reorganize directory
  - Add metrics.proto and metrics_service.proto (OTLP v1 spec)
  - Update protobuf_loader to support metrics protos
  - Rename protos/ -> otlp/ directory for better organization
* refactor: extract common OTLP logic into base classes
  - Create OtlpHttpExporterBase for shared HTTP export logic
  - Create OtlpTransformerBase for shared transformation logic
  - Refactor logs exporter/transformer to extend base classes
  - Update test mocking paths
  - Eliminates ~400 lines of duplication
* fix logs
* increase test coverage
* feat(metrics): add support for otel metrics provider
* feat(metrics): add support for otlp configurations
* updates to pass system tests
* add temperolality support and clean up implementation
* add support for encoding attributes
* simplify tests
* use real values in tests
* do not encode numbers as strings
* use enum in transformation
* remove unneed fields
* add better temporality support and include encoding for async metric types
* improve test coverage for scope attributes
* simplify stubs in tests
* validate scope attributes
* ruben comments
* fix broken number test
* clean up tests
* clean up instruments and update tests
* simplify meter
* use constants and implement missing callbacks
* move otlp_transformer changes over
* clean up private/public fields
* avoid redefining constant
* clean up tests
* update typing
* linting clean ups
* first round of changes from cr
* limit number of metrics in each batch
* round 3 changes
* address more comments part 4
* update doc strings and typing
* update max batch size to operate on the aggregate metrics
* avoid converting aggregated metrics to arrays, perf
* fix regression in logs implementation
* revert closure
* cleanup max measurement queue, jdocs and typing
* log warning if value is invalid
* final set of changes
* update tests to be compatible with master branch
* remove register from meter provider and simplify export
* clean up stable stringify and update limit of max queue size
* remove @Private from metrics docs
* only implement the meterprovider api, renove shutdown and forceflush operations
* remove observableInstruments
* lint
* fix linting failures
* fix linting failures
* fix linting 3

mabdinur and others added 30 commits October 14, 2025 12:27

add otlp metrics protos

a851ce1

feat: add OTLP metrics proto definitions and reorganize directory

181762c

- Add metrics.proto and metrics_service.proto (OTLP v1 spec) - Update protobuf_loader to support metrics protos - Rename protos/ -> otlp/ directory for better organization

Merge branch 'munir/otlp-refactor-base-classes' into munir/otlp-metri…

9787d56

…cs-support

fix logs

52a81ee

Merge branch 'munir/otlp-infrastructure' into munir/otlp-refactor-bas…

00a859b

…e-classes

increase test coverage

2144f7c

feat(metrics): add support for otel metrics provider

73b2127

feat(metrics): add support for otlp configurations

9a1b0b2

updates to pass system tests

5c9bd42

add temperolality support and clean up implementation

2267339

add support for encoding attributes

8ed35ff

Merge branch 'munir/otlp-refactor-base-classes' into munir/otlp-metri…

2e7062f

…cs-support

Merge branch 'munir/otlp-metrics-support' into munir/otlp-add-meter-p…

029155e

…rovider

Merge branch 'munir/otlp-add-meter-provider' into munir/add-otel-metr…

272ea27

…ics-configs

Merge branch 'master' into munir/otlp-metrics-support

f859b4b

simplify tests

3923982

use real values in tests

463baf1

Merge branch 'munir/otlp-metrics-support' into munir/otlp-add-meter-p…

11e76ac

…rovider

Merge branch 'munir/otlp-add-meter-provider' into munir/add-otel-metr…

e60a8a3

…ics-configs

do not encode numbers as strings

9841b8a

use enum in transformation

1dda3b4

remove unneed fields

8d7c684

Merge branch 'munir/otlp-metrics-support' into munir/otlp-add-meter-p…

fec1a7d

…rovider

add better temporality support and include encoding for async metric …

a0e2878

…types

improve test coverage for scope attributes

af5d2e1

Merge branch 'munir/otlp-metrics-support' into munir/otlp-add-meter-p…

c94d94e

…rovider

simplify stubs in tests

24241c6

validate scope attributes

d743d5c

ruben comments

c431bd7

BridgeAR reviewed Nov 11, 2025

packages/dd-trace/src/config.js (resolved)

BridgeAR reviewed Nov 11, 2025

packages/dd-trace/src/opentelemetry/metrics/constants.js (outdated, resolved)

BridgeAR reviewed Nov 11, 2025

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js (resolved)

BridgeAR reviewed Nov 11, 2025

final set of changes

ba70476

mabdinur requested review from BridgeAR, IlyasShabi and simon-id November 14, 2025 16:16

mabdinur added 2 commits November 14, 2025 13:29

Merge branch 'master' into munir/add-otel-metrics-configs

47fcdbe

update tests to be compatible with master branch

7e0d0d6

BridgeAR reviewed Nov 17, 2025

packages/dd-trace/src/opentelemetry/metrics/meter_provider.js (outdated, resolved)

BridgeAR reviewed Nov 17, 2025

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js (outdated, resolved)

packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js (resolved)

mabdinur added 3 commits November 17, 2025 09:35

remove register from meter provider and simplify export

067904c

clean up stable stringify and update limit of max queue size

ab2c771

remove @Private from metrics docs

c16f021

mabdinur requested a review from BridgeAR November 17, 2025 19:22

mabdinur added 2 commits November 18, 2025 07:12

only implement the meterprovider api, renove shutdown and forceflush …

dc2f61f

…operations

remove observableInstruments

ba3f4b1

mabdinur force-pushed the munir/add-otel-metrics-configs branch from 79e1be4 to ba3f4b1 Compare November 20, 2025 06:26

mabdinur added 4 commits November 20, 2025 01:40

lint

be4f8d9

fix linting failures

9219006

fix linting failures

ba8383b

fix linting 3

1b1af0e

mabdinur enabled auto-merge (squash) November 21, 2025 00:51

BridgeAR approved these changes Nov 22, 2025

mabdinur merged commit ef50f2e into master Nov 22, 2025
870 of 873 checks passed

mabdinur deleted the munir/add-otel-metrics-configs branch November 22, 2025 16:20

dd-octo-sts bot mentioned this pull request Nov 23, 2025

v5.81.0 proposal #6964

Draft
