feat(otel): add support for otel metrics api via protobuf and json #6783
Conversation
- Add metrics.proto and metrics_service.proto (OTLP v1 spec)
- Update protobuf_loader to support metrics protos
- Rename protos/ -> otlp/ directory for better organization
- Create OtlpHttpExporterBase for shared HTTP export logic
- Create OtlpTransformerBase for shared transformation logic
- Refactor logs exporter/transformer to extend base classes
- Update test mocking paths
- Eliminates ~400 lines of duplication
```diff
  for (const item of items) {
    const instrumentationScope = item.instrumentationScope || { name: '', version: '', schemaUrl: '', attributes: {} }
-   const attrsKey = stableStringify(instrumentationScope.attributes || {})
+   const attrsKey = JSON.stringify(instrumentationScope.attributes || {})
```
If attributes are an object we must use a stable stringify implementation. Otherwise the keys would be stringified in insertion order and it would not match anymore.
I reintroduced the stableStringify implementation:

```js
function stableStringify (attributes) {
  if (attributes == null || typeof attributes !== 'object') {
    return JSON.stringify(attributes)
  }
  // Attributes are sorted by key to ensure consistent serialization regardless of key order.
  // Keys are always strings and values are always strings, numbers, booleans,
  // or arrays of strings, numbers, or booleans.
  return Object.keys(attributes)
    .sort()
    .map(key => `${key}:${JSON.stringify(attributes[key])}`)
    .join(',')
}
```
We typically expect each metric to have 1–5 attributes. Since Datadog bills based on the number of unique metric + attribute combinations, adding more tags can dramatically increase metric cardinality and therefore costs. To prevent this, users will likely only set a few attributes per metric. So sorting attributes by key will likely add minimal overhead.
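As a quick illustration of why the sorted serialization matters (the helper below is the same stableStringify shown above; the attribute values are made up):

```javascript
// Same stableStringify as above: sorts keys so serialization is
// independent of insertion order.
function stableStringify (attributes) {
  if (attributes == null || typeof attributes !== 'object') {
    return JSON.stringify(attributes)
  }
  return Object.keys(attributes)
    .sort()
    .map(key => `${key}:${JSON.stringify(attributes[key])}`)
    .join(',')
}

// Two logically identical attribute sets with different insertion order:
const a = stableStringify({ env: 'prod', region: 'us-east-1' })
const b = stableStringify({ region: 'us-east-1', env: 'prod' })
console.log(a === b) // true: both yield 'env:"prod",region:"us-east-1"'

// Plain JSON.stringify keeps insertion order, so the keys would differ:
console.log(
  JSON.stringify({ env: 'prod', region: 'us-east-1' }) ===
  JSON.stringify({ region: 'us-east-1', env: 'prod' })
) // false
```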
Also added a test case to catch this.
I believe the attributes are already guaranteed to be an object, no?
Yup, they should be. I can get rid of the additional check.
BridgeAR left a comment:
Almost LGTM :)
packages/dd-trace/src/opentelemetry/metrics/periodic_metric_reader.js
Force-pushed from 79e1be4 to ba3f4b1
BridgeAR left a comment:
LGTM with a few nits and potential follow-ups :)
```js
const histogram = decoded.resourceMetrics[0].scopeMetrics[0].metrics[0]
assert.strictEqual(histogram.name, 'duration')
assert.strictEqual(histogram.histogram.dataPoints[0].count, 1)
assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100)
```
Nit: I think checking for all properties of the histogram would be good.
Nit: Ideally, we use our assertObjectContains helper instead of having many individual assertions.
```js
setTimeout(() => {
  assert(validated, 'Should have validated an export with all metrics')
  validator()
  done()
}, 150)
```
Instead of using a timeout, it would be possible to just call done() from inside the validator method. That way it is simpler and faster.
The same applies to all other methods with this pattern.
It is also more reliable, since it no longer depends on timing.
Or even better: have the mockOtlpExport method return a promise. That way you can trigger the methods synchronously and just return that promise afterwards. That would be very clean.
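A rough sketch of the promise-based variant (mockOtlpExport and validator here are illustrative stand-ins for the actual test helpers, not the real API):

```javascript
// Sketch: a mock export hook that resolves once the validator has run.
// mockOtlpExport/validator mirror the review discussion and are assumptions,
// not the real helpers in the test suite.
function mockOtlpExport (validate) {
  return new Promise((resolve, reject) => {
    // In the real tests this callback would fire when the exporter posts
    // its payload; nextTick stands in for that trigger here.
    process.nextTick(() => {
      try {
        validate()
        resolve()
      } catch (err) {
        reject(err)
      }
    })
  })
}

// The test can then trigger the export synchronously and return the promise,
// with no setTimeout and no done() callback:
// it('exports all metrics', () => {
//   const pending = mockOtlpExport(validator)
//   triggerExport()
//   return pending
// })
```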
```js
assert(counter, 'Should have counter')
assert(histogram, 'Should have histogram')
assert.strictEqual(counter.name, 'test')
assert.strictEqual(histogram.name, 'test')
assert.strictEqual(counter.sum.dataPoints.length, 1, 'Counter should have 1 data point')
assert.strictEqual(histogram.histogram.dataPoints.length, 1, 'Histogram should have 1 data point')
assert.strictEqual(counter.sum.dataPoints[0].asInt, 5)
assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100)
```
I know we have lots of code like that, but I would like to start preventing so many asserts on a single object. We should just use the partial-match helper (assertObjectContains, if I am not mistaken).
```js
let bucketIndex = DEFAULT_HISTOGRAM_BUCKETS.length
for (let i = 0; i < DEFAULT_HISTOGRAM_BUCKETS.length; i++) {
  if (value <= DEFAULT_HISTOGRAM_BUCKETS[i]) {
    bucketIndex = i
    break
  }
}
```
Ideally, this would use a binary search.
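Something along these lines (a sketch; findBucketIndex is an illustrative name and the boundary values below are placeholders, not necessarily the real DEFAULT_HISTOGRAM_BUCKETS):

```javascript
// Illustrative bucket boundaries; the real DEFAULT_HISTOGRAM_BUCKETS may differ.
const DEFAULT_HISTOGRAM_BUCKETS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]

// Binary search for the first boundary >= value; returns buckets.length
// (the overflow bucket) when value exceeds every boundary. Equivalent to
// the linear scan above, but O(log n) per recorded value.
function findBucketIndex (buckets, value) {
  let lo = 0
  let hi = buckets.length
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (value <= buckets[mid]) {
      hi = mid
    } else {
      lo = mid + 1
    }
  }
  return lo
}

console.log(findBucketIndex(DEFAULT_HISTOGRAM_BUCKETS, 60)) // 5 (first boundary >= 60 is 75)
```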
```js
if (!cumulativeState.has(stateKey)) {
  cumulativeState.set(stateKey, {
    count: 0,
    sum: 0,
    min: Infinity,
    max: -Infinity,
    bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0),
    startTime: metric.temporality === TEMPORALITY.CUMULATIVE ? this.#startTime : timestamp
  })
}

const state = cumulativeState.get(stateKey)
```
Nit: Ideally, it would get the state directly and, if it is undefined, handle setting the initial value in the same step.
That would be a tad faster.
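A sketch of the combined lookup (names mirror the snippet above; the initial-state fields are abbreviated):

```javascript
// One Map lookup instead of has() + set() + get(). On a miss we build the
// initial state, store it, and keep the same reference for the update below.
const cumulativeState = new Map()
const stateKey = 'example-key' // illustrative key

let state = cumulativeState.get(stateKey)
if (state === undefined) {
  state = { count: 0, sum: 0, min: Infinity, max: -Infinity } // abbreviated initial state
  cumulativeState.set(stateKey, state)
}

// Subsequent updates go through the shared reference:
state.count += 1
state.sum += 42
console.log(cumulativeState.get(stateKey).sum) // 42
```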
```js
const state = cumulativeState.get(stateKey)
state.value += value

const dataPoint = this.#findOrCreateDataPoint(metric, attributes, attrKey, () => ({
```
I believe the idea behind the method is a performance optimization to prevent the object from being created?
Instead of an object, we are now creating a function that references these parts. I have not benchmarked something like this before, but I would expect the function creation to still take more time. The pattern is only useful when there are many compute-heavy operations.
```js
sum: 0,
min: Infinity,
max: -Infinity,
bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0),
```
This is actually a compute-heavy operation (referencing the comment above about findOrCreateDataPoint and function vs. object creation).
Making this lazy is a good idea. To prevent the overhead, I would still create a simple object and just use optional assignment to set bucketCounts after the call. That way dataPoint will be updated through the reference, and it is cheap.
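A sketch of that suggestion (findOrCreateDataPoint and the bucket constant below are illustrative stand-ins for the code under review):

```javascript
// Illustrative stand-ins for the real code:
const DEFAULT_HISTOGRAM_BUCKETS = [0, 5, 10, 25, 50]
const dataPoints = new Map()

function findOrCreateDataPoint (key, initial) {
  let point = dataPoints.get(key)
  if (point === undefined) {
    point = initial
    dataPoints.set(key, point)
  }
  return point
}

// Pass a cheap plain object; defer the expensive array allocation.
const dataPoint = findOrCreateDataPoint('route:/home', {
  sum: 0,
  min: Infinity,
  max: -Infinity,
  bucketCounts: null
})

// Optional (nullish) assignment: allocate bucketCounts only for a fresh data
// point. Existing points keep their counts, and the stored object is updated
// through the shared reference.
dataPoint.bucketCounts ??= new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0)
```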
…6783)

* add otlp metrics protos
* feat: add OTLP metrics proto definitions and reorganize directory
  - Add metrics.proto and metrics_service.proto (OTLP v1 spec)
  - Update protobuf_loader to support metrics protos
  - Rename protos/ -> otlp/ directory for better organization
* refactor: extract common OTLP logic into base classes
  - Create OtlpHttpExporterBase for shared HTTP export logic
  - Create OtlpTransformerBase for shared transformation logic
  - Refactor logs exporter/transformer to extend base classes
  - Update test mocking paths
  - Eliminates ~400 lines of duplication
* fix logs
* increase test coverage
* feat(metrics): add support for otel metrics provider
* feat(metrics): add support for otlp configurations
* updates to pass system tests
* add temporality support and clean up implementation
* add support for encoding attributes
* simplify tests
* use real values in tests
* do not encode numbers as strings
* use enum in transformation
* remove unneeded fields
* add better temporality support and include encoding for async metric types
* improve test coverage for scope attributes
* simplify stubs in tests
* validate scope attributes
* ruben comments
* fix broken number test
* clean up tests
* clean up instruments and update tests
* simplify meter
* use constants and implement missing callbacks
* move otlp_transformer changes over
* clean up private/public fields
* avoid redefining constant
* clean up tests
* update typing
* linting clean ups
* first round of changes from cr
* limit number of metrics in each batch
* round 3 changes
* address more comments part 4
* update doc strings and typing
* update max batch size to operate on the aggregate metrics
* avoid converting aggregated metrics to arrays, perf
* fix regression in logs implementation
* revert closure
* cleanup max measurement queue, jsdocs and typing
* log warning if value is invalid
* final set of changes
* update tests to be compatible with master branch
* remove register from meter provider and simplify export
* clean up stable stringify and update limit of max queue size
* remove @private from metrics docs
* only implement the meterprovider api, remove shutdown and forceflush operations
* remove observableInstruments
* lint
* fix linting failures
* fix linting failures
* fix linting 3
What does this PR do?
Adds full OpenTelemetry Metrics support to dd-trace-js with a custom Meter Provider implementation. Enable with `DD_METRICS_OTEL_ENABLED=true` to export metrics via the OTLP protocol.

Key Features:
- `http/protobuf` (default) or `http/json` protocols
- `OTEL_EXPORTER_OTLP_METRICS_*` environment variables

Configuration:
- `DD_METRICS_OTEL_ENABLED` - Enable OpenTelemetry metrics (default: `false`)
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` - Endpoint URL (default: `http://localhost:4318/v1/metrics`)
- `OTEL_EXPORTER_OTLP_METRICS_PROTOCOL` - Protocol: `http/protobuf` or `http/json`
- `OTEL_METRIC_EXPORT_INTERVAL` - Export interval in ms (default: `60000`)

Motivation
Enables customers to use the OpenTelemetry Metrics API with dd-trace-js without adding the OpenTelemetry SDK as a dependency. The custom implementation provides better integration with dd-trace-js configuration, avoids vendoring gRPC libraries, and maintains flexibility.
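Putting the configuration together, an opt-in might look like this (a sketch using the environment variables listed above; `app.js` is a placeholder for the user's entry point):

```shell
# Sketch: enable OTLP metric export with the env vars described in this PR.
export DD_METRICS_OTEL_ENABLED=true
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318/v1/metrics
export OTEL_EXPORTER_OTLP_METRICS_PROTOCOL=http/protobuf
export OTEL_METRIC_EXPORT_INTERVAL=60000   # ms
node app.js
```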
Additional Notes