feat(opentelemetry sink): add native metric to OTLP conversion #24897
szibis wants to merge 31 commits into vectordotdev:master from …
Add encoding support for converting native Vector metrics to OTLP protobuf format (ExportMetricsServiceRequest). This enables Vector metrics from any source to be sent through OTLP sinks without requiring pre-formatted OTLP structure.
Metric type mapping:
- Counter → Sum (monotonic, Delta/Cumulative per MetricKind)
- Gauge → Gauge
- AggregatedHistogram → Histogram
- AggregatedSummary → Summary
- Distribution → Histogram (samples → buckets)
- Set → Gauge (cardinality count)
- Sketch → Gauge with warning (unsupported)
Tag decomposition reverses the decode-path flattening:
- resource.* tags → resource attributes
- scope.name/version → InstrumentationScope
- scope.* tags → scope attributes
- All other tags → data point attributes
Includes 36 tests covering type conversions, tag decomposition, roundtrip fidelity, and edge cases.
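The metric type mapping can be sketched as a single `match`. This is a minimal illustration, not the PR's encoder: `NativeKind`, `OtlpKind`, `Temporality`, and `otlp_kind_for` are hypothetical, simplified stand-ins for Vector's `MetricValue`/`MetricKind` and the OTLP protobuf types. It follows the later behavior in this PR where Sketch metrics are dropped (returns `None`) rather than emitted as a gauge, and it leaves Distribution temporality unspecified because the mapping does not state it.

```rust
/// Hypothetical stand-in for Vector's `MetricKind`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MetricKind {
    Incremental,
    Absolute,
}

/// Hypothetical stand-in for the native metric value variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NativeKind {
    Counter,
    Gauge,
    AggregatedHistogram,
    AggregatedSummary,
    Distribution,
    Set,
    Sketch,
}

/// Simplified OTLP aggregation temporality.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Temporality {
    Delta,
    Cumulative,
}

/// Hypothetical stand-in for the OTLP metric data kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OtlpKind {
    Sum { monotonic: bool, temporality: Temporality },
    Gauge,
    /// `None` means this sketch does not decide the temporality.
    Histogram { temporality: Option<Temporality> },
    Summary,
}

/// Incremental metrics carry deltas; absolute metrics carry running totals.
fn temporality_for(kind: MetricKind) -> Temporality {
    match kind {
        MetricKind::Incremental => Temporality::Delta,
        MetricKind::Absolute => Temporality::Cumulative,
    }
}

/// Maps a native metric type to its OTLP data kind; `None` means "drop".
pub fn otlp_kind_for(native: NativeKind, kind: MetricKind) -> Option<OtlpKind> {
    match native {
        NativeKind::Counter => Some(OtlpKind::Sum {
            monotonic: true,
            temporality: temporality_for(kind),
        }),
        // A Set is reported as a gauge of its cardinality.
        NativeKind::Gauge | NativeKind::Set => Some(OtlpKind::Gauge),
        NativeKind::AggregatedHistogram => Some(OtlpKind::Histogram {
            temporality: Some(temporality_for(kind)),
        }),
        // Samples are bucketed into a histogram; temporality left open here.
        NativeKind::Distribution => Some(OtlpKind::Histogram { temporality: None }),
        NativeKind::AggregatedSummary => Some(OtlpKind::Summary),
        // No OTLP equivalent: the caller drops the metric (and logs a warning).
        NativeKind::Sketch => None,
    }
}
```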
Update opentelemetry sink CUE requirements to reflect native metric auto-conversion support. Add comprehensive otlp-native-conversion docs with metric type mapping, tag decomposition reference, and use case examples. Add kvlist to spelling expect list.
This feature targets the 0.55.0 release, not 0.54.0.
Created DOCS-13660 for documentation team review
… encode
Apply review feedback patterns from PR vectordotdev#24621:
- Replace `as u64` with `u64::try_from().ok()` for timestamp conversion
- Replace `as u64`/`as f64` with `u64::from()`/`f64::from()` for sample.rate
- Remove unwrap() in Distribution bucket overflow guard, use saturating index clamping instead
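These conversion patterns might look like this in isolation. A minimal sketch, assuming timestamps arrive as signed nanoseconds since the Unix epoch (`i64`) and sample rates as `u32`; the function names are hypothetical helpers, not functions from the PR.

```rust
/// OTLP timestamps are unsigned nanoseconds. `u64::try_from` turns a
/// pre-epoch (negative) value into `None` instead of silently wrapping,
/// which is what a bare `as u64` cast would do.
pub fn to_time_unix_nano(nanos: i64) -> Option<u64> {
    u64::try_from(nanos).ok()
}

/// `From` conversions are lossless widenings, so no cast lint is needed.
pub fn rate_to_count(rate: u32) -> u64 {
    u64::from(rate)
}

pub fn rate_to_weight(rate: u32) -> f64 {
    f64::from(rate)
}

/// Clamps a computed bucket index to the last valid bucket without `unwrap()`.
/// With zero buckets, `saturating_sub` keeps the result at 0 instead of underflowing.
pub fn clamp_bucket_index(index: usize, bucket_count: usize) -> usize {
    index.min(bucket_count.saturating_sub(1))
}
```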
…ogram conversion NaN and Infinity sample values were not excluded from the accumulation loop in Distribution-to-Histogram conversion, even though they were already filtered from the boundary list. This caused NaN to poison the total_sum and misroute samples via binary_search_by. Non-finite samples are now skipped in the accumulation loop, matching the boundary filter.
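A minimal sketch of the fixed accumulation loop, where non-finite samples are skipped both when building boundaries and when counting. Assumptions: `Sample`, `HistogramPoint`, and `distribution_to_histogram` are hypothetical simplified types; boundaries are taken to be the distinct finite sample values (the PR's actual boundary choice is not described here); and the bucket lookup uses `slice::partition_point` instead of `binary_search_by`, which gives the same upper-inclusive bucket index for sorted, deduplicated bounds.

```rust
/// Hypothetical simplified distribution sample: value plus sample rate.
pub struct Sample {
    pub value: f64,
    pub rate: u32,
}

/// Hypothetical simplified OTLP histogram data point.
#[derive(Debug, PartialEq)]
pub struct HistogramPoint {
    pub explicit_bounds: Vec<f64>,
    /// One more entry than `explicit_bounds`; the last bucket is (max bound, +inf).
    pub bucket_counts: Vec<u64>,
    pub count: u64,
    pub sum: f64,
}

pub fn distribution_to_histogram(samples: &[Sample]) -> HistogramPoint {
    // Boundaries: distinct finite sample values, sorted ascending.
    let mut bounds: Vec<f64> = samples
        .iter()
        .map(|s| s.value)
        .filter(|v| v.is_finite())
        .collect();
    bounds.sort_by(|a, b| a.total_cmp(b));
    bounds.dedup();

    let mut bucket_counts = vec![0u64; bounds.len() + 1];
    let (mut count, mut sum) = (0u64, 0.0f64);
    for s in samples {
        // Skip NaN/±Inf here too, so they cannot poison `sum` or be misrouted.
        if !s.value.is_finite() {
            continue;
        }
        // First bound >= value: OTLP buckets are upper-inclusive.
        let idx = bounds.partition_point(|b| *b < s.value);
        let weight = u64::from(s.rate);
        bucket_counts[idx] += weight;
        count += weight;
        sum += s.value * f64::from(s.rate);
    }
    HistogramPoint { explicit_bounds: bounds, bucket_counts, count, sum }
}
```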
Sketch metrics were converted to an empty gauge (value 0.0), causing silent data corruption downstream. Now Sketch metrics produce a Metric with no data field, which OTLP receivers will ignore. Also fixes the rate-limit key to use internal_log_rate_limit (Vector convention).
Add clippy::cast_precision_loss annotation on values.len() as f64 in Set conversion (usize→f64 is lossy for very large sets but acceptable). Fix test to use u64::try_from instead of bare 'as u64' cast, matching the production code pattern.
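The Set conversion reduces to reporting the number of distinct values as a gauge. A minimal sketch, assuming set members are held in a `BTreeSet<String>`; `set_cardinality` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Reports a Set metric's cardinality as a gauge value.
/// `usize -> f64` is exact up to 2^53 elements and rounds beyond that,
/// which is acceptable for a cardinality gauge, so the clippy lint is allowed locally.
#[allow(clippy::cast_precision_loss)]
pub fn set_cardinality(values: &BTreeSet<String>) -> f64 {
    values.len() as f64
}
```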
Add tests covering NaN/Infinity distribution samples, non-finite gauge and counter values, zero-rate samples, Sketch metric dropping, tag prefix collision routing, scope custom attributes, and non-finite histogram/summary sums.
Sketch metrics are now dropped with a warning instead of emitting an empty gauge, matching the code change.
Companion decode-side fix: #24905 adds decode for missing …
… tags
Update decompose_metric_tags to handle 4 special tags as proto-level structural fields rather than generic attributes:
- resource.dropped_attributes_count → Resource.dropped_attributes_count
- resource.schema_url → ResourceMetrics.schema_url
- scope.dropped_attributes_count → InstrumentationScope.dropped_attributes_count
- scope.schema_url → ScopeMetrics.schema_url
This ensures round-trip fidelity with fix/otlp-decode-missing-fields (vectordotdev#24905) once merged, while remaining backward-compatible (graceful defaults of 0 / empty string) before that PR merges.
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
…nversion
- Use DataType::all_bits() instead of manual OR (reviewer nit)
- Split OtlpSerializer doc into "Pre-formatted OTLP events" and "Native Vector events" sections for clarity
- Sketch metrics now truly dropped (return None from native_metric_to_otlp_request instead of including empty metric)
- Document resource.*/scope.* tag prefix reservation for OTLP mapping
- Add sample config to changelog fragment
- Remove orphaned docs/examples/otlp-native-conversion.md (content folded into encoder docstring)
Add explicit user-facing documentation about resource.*/scope.* tag prefix reservation in OTLP encoding:
- CUE sink docs: new "Metric tag prefix conventions" section
- Changelog: explicitly notes prefix reservation and behavior for non-OTLP sources using these prefixes coincidentally
Addresses review feedback requesting explicit documentation for the implicit prefix reservation behavior.
…aming
PR #24905 changed metadata tags from dot-separated (resource.schema_url) to underscore-separated (resource_schema_url) to avoid colliding with user-supplied resource/scope attributes. Update the encode path to match: decompose_metric_tags now recognizes the underscore tag names.
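A simplified sketch of the resulting decomposition rules, with the four metadata tags matched by their underscore names before the generic `resource.`/`scope.` prefix rules. Assumptions: tags are flattened to a single-valued `BTreeMap<String, String>` (Vector's tag model also supports multi-valued tags), all attribute values stay strings, unparseable counts fall back to `0`, and `Decomposed`/`decompose_tags` are hypothetical names, not the PR's `decompose_metric_tags`.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of where each tag lands in the OTLP request.
#[derive(Debug, Default, PartialEq)]
pub struct Decomposed {
    pub resource_attrs: Vec<(String, String)>,
    pub resource_dropped_attributes_count: u32,
    pub resource_schema_url: String,
    pub scope_name: String,
    pub scope_version: String,
    pub scope_attrs: Vec<(String, String)>,
    pub scope_dropped_attributes_count: u32,
    pub scope_schema_url: String,
    pub data_point_attrs: Vec<(String, String)>,
}

pub fn decompose_tags(tags: &BTreeMap<String, String>) -> Decomposed {
    let mut out = Decomposed::default();
    for (key, value) in tags {
        match key.as_str() {
            // Structural fields: exact underscore names, checked first.
            "resource_dropped_attributes_count" => {
                out.resource_dropped_attributes_count = value.parse().unwrap_or(0)
            }
            "resource_schema_url" => out.resource_schema_url = value.clone(),
            "scope_dropped_attributes_count" => {
                out.scope_dropped_attributes_count = value.parse().unwrap_or(0)
            }
            "scope_schema_url" => out.scope_schema_url = value.clone(),
            "scope.name" => out.scope_name = value.clone(),
            "scope.version" => out.scope_version = value.clone(),
            // Prefix rules: strip the prefix and keep the rest as the attribute key.
            _ => {
                if let Some(rest) = key.strip_prefix("resource.") {
                    out.resource_attrs.push((rest.to_string(), value.clone()));
                } else if let Some(rest) = key.strip_prefix("scope.") {
                    out.scope_attrs.push((rest.to_string(), value.clone()));
                } else {
                    out.data_point_attrs.push((key.clone(), value.clone()));
                }
            }
        }
    }
    out
}
```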
Add edge_case_tests module to metrics.rs with 10 tests covering:
- ExponentialHistogram decode path (scale=0 with positive/negative/zero buckets)
- Counter with negative value roundtrip
- Histogram with empty explicit_bounds (single +inf bucket)
- Summary with no quantiles
- Gauge special float values (NaN, Inf, -Inf, subnormal)
- Multiple data points per metric (flattened to separate events)
- Empty metric name
- All aggregation temporalities (UNSPECIFIED, DELTA, CUMULATIVE)
- start_time_unix_nano independence from metric timestamp
- u64::MAX timestamp overflow safety
Local Integration & Edge Case Testing
Set up a local test pipeline: a Vector OTLP source on ports 4317/4318, piped through an OTLP sink to a second Vector instance on 4319/4320 for full roundtrip testing. Used grpcurl with proto files from …
Added 10 tests to …
Performance
Pushed 10k metric events in batches of 100: ~5.2k events/s, zero errors. These numbers mostly reflect grpcurl client overhead; a compiled gRPC client would be significantly faster. The important takeaway is stability: no crashes, no data corruption, and no memory issues under sustained load.
Note on testing methodology: For the local integration and performance tests, all 3 PRs (#24905, #24621, #24897) were merged locally into a single test branch (…).
…ations
- Fix 4 metadata tag names in CUE docs to match implementation:
  - resource.dropped_attributes_count → resource_dropped_attributes_count
  - resource.schema_url → resource_schema_url
  - scope.dropped_attributes_count → scope_dropped_attributes_count
  - scope.schema_url → scope_schema_url
- Document known metric roundtrip limitations: attribute type stringification and start_time_unix_nano loss
…rdotdev#24897
- Update otlp-native-conversion.md to reflect metrics are supported via companion PR vectordotdev#24897 (was "planned for future release")
- Add native metrics auto-conversion example config
- Update CUE docs requirements to mention all three signals
- Accept DataType::all_bits() in OtlpSerializerConfig::input_type() so metrics are not rejected at config validation time
- Improve metric error message to reference vectordotdev#24897 and passthrough
Add resolve_start_time() with 3-tier priority for start_time_unix_nano:
1. Read from metadata sidecar (OTLP roundtrip preservation)
2. Synthesize from timestamp - interval_ms (native incremental metrics)
3. Fall back to 0 (OTLP default)
Replace all hardcoded start_time_unix_nano: 0 in build_otlp_metric with the resolved value across all 6 data point constructions (Counter, Gauge, Histogram, Summary, Distribution, Set).
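The three tiers can be sketched as a small pure function. This is an illustration under stated assumptions, not the PR's `resolve_start_time()`: the stashed metadata value, timestamp, and interval are passed in as plain `Option`s (the interval as a `u32` of milliseconds), synthesis is limited to incremental metrics (absolute metrics fall back to `0`, matching the "absolute=0" case in the test plan), and an interval longer than the time since the epoch also falls back to `0`.

```rust
/// Resolves `start_time_unix_nano` for an OTLP data point (hypothetical signature).
pub fn resolve_start_time(
    stashed_start_nanos: Option<u64>, // tier 1: value preserved by the OTLP decoder
    timestamp_nanos: Option<u64>,
    interval_ms: Option<u32>,
    incremental: bool,
) -> u64 {
    // Tier 1: an OTLP-sourced metric keeps its original start time.
    if let Some(start) = stashed_start_nanos {
        return start;
    }
    // Tier 2: incremental metrics with a known interval use timestamp - interval.
    if incremental {
        if let (Some(ts), Some(ms)) = (timestamp_nanos, interval_ms) {
            let interval_nanos = u64::from(ms) * 1_000_000;
            if let Some(start) = ts.checked_sub(interval_nanos) {
                return start;
            }
        }
    }
    // Tier 3: 0 means "not set" in OTLP.
    0
}
```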
Update opentelemetry.cue to reflect that start_time_unix_nano is now preserved for OTLP-sourced metrics and synthesized for native incremental metrics, replacing the stale claim that it is always lost. Update the roundtrip test to handle both pre-rebase (fallback to 0) and post-rebase (preserved via metadata) states so the test remains correct throughout the merge sequence.
- Pre-allocate Vec capacity in decompose_metric_tags (resource: 4, scope: 2, data_point: 8) to avoid repeated reallocation during tag iteration.
- Change build_otlp_metric to take owned Vec<KeyValue> instead of &[KeyValue], eliminating 6 to_vec() clone calls per metric encode. Since only one match arm executes, the owned Vec moves directly into the constructed data point.
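The ownership change can be illustrated with a reduced example: when the builder takes the attribute vector by value, whichever single `match` arm runs moves it into the data point, so no arm needs `.to_vec()`. The types and function names here are hypothetical simplifications; only the capacity of 8 for data point attributes comes from the commit message.

```rust
/// Hypothetical string-only attribute (the real OTLP `KeyValue` holds an `AnyValue`).
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

#[derive(Debug, PartialEq)]
pub enum DataPoint {
    Number { attributes: Vec<KeyValue>, value: f64 },
    Histogram { attributes: Vec<KeyValue>, count: u64 },
}

pub enum NativeValue {
    Gauge(f64),
    Histogram(u64),
}

/// Pre-sizes the vector so typical tag sets do not trigger reallocation.
pub fn collect_attributes(tags: &[(&str, &str)]) -> Vec<KeyValue> {
    let mut attributes = Vec::with_capacity(8);
    for (key, value) in tags {
        attributes.push(KeyValue { key: key.to_string(), value: value.to_string() });
    }
    attributes
}

/// Takes `attributes` by value: exactly one arm runs and moves it, so no clone.
pub fn build_point(native: NativeValue, attributes: Vec<KeyValue>) -> DataPoint {
    match native {
        NativeValue::Gauge(value) => DataPoint::Number { attributes, value },
        NativeValue::Histogram(count) => DataPoint::Histogram { attributes, count },
    }
}
```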
Read the typed attribute sidecar from %vector.otlp.metric_sidecar (stashed by PR vectordotdev#24905 during decode) and emit original OTLP types (IntValue, BoolValue, DoubleValue) instead of StringValue. Fingerprint-based staleness detection ensures correctness: if tags were mutated by transforms, the sidecar is ignored and the encoder falls back to string-based tag decomposition.
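A minimal sketch of fingerprint-based staleness detection, assuming the fingerprint is a hash of the tag set taken at decode time and recomputed at encode time. `TypedValue`, `Sidecar`, `tags_fingerprint`, and `attributes_for_encode` are hypothetical; the PR's sidecar stores typed Resource/Scope/DataPoint attributes separately, and its actual fingerprint scheme is not described here. `DefaultHasher` is used only for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical typed OTLP attribute value.
#[derive(Debug, Clone, PartialEq)]
pub enum TypedValue {
    Str(String),
    Int(i64),
    Bool(bool),
}

/// Typed attributes captured at decode time, plus a fingerprint of the tags.
pub struct Sidecar {
    pub fingerprint: u64,
    pub attributes: Vec<(String, TypedValue)>,
}

/// Hashes the tag set; BTreeMap's sorted iteration makes it insertion-order independent.
pub fn tags_fingerprint(tags: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    tags.hash(&mut hasher);
    hasher.finish()
}

/// Uses the sidecar only if the tags are unchanged since decode;
/// otherwise falls back to string-typed attributes built from the tags.
pub fn attributes_for_encode(
    tags: &BTreeMap<String, String>,
    sidecar: Option<&Sidecar>,
) -> Vec<(String, TypedValue)> {
    match sidecar {
        Some(sc) if sc.fingerprint == tags_fingerprint(tags) => sc.attributes.clone(),
        _ => tags
            .iter()
            .map(|(k, v)| (k.clone(), TypedValue::Str(v.clone())))
            .collect(),
    }
}
```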
value_to_pb_value now detects the single-key wrapper format produced by
the decode side (e.g. {"int_value": 42}, {"bytes_value": <bytes>}) and
reconstructs the exact PBValue variant, including recursive ArrayValue
and KvlistValue handling. Falls back to scalar inference for unwrapped
values.
Also fixes stale CUE docs: scalar types ARE now preserved via sidecar,
BytesValue narrows to StringValue, and sidecar invalidation on tag
mutation falls back to StringValue.
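A simplified sketch of the wrapper detection described above. `Value`, `PbValue`, and `to_pb_value` are hypothetical stand-ins for the VRL value type, the OTLP `AnyValue` variants, and the PR's `value_to_pb_value`. Only the `int_value` and `bytes_value` wrapper keys are named in the commit message; the other keys (`string_value`, `bool_value`, `double_value`, `array_value`, `kvlist_value`) and the shape of array/kvlist payloads are assumptions that follow the same pattern.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a VRL value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bytes(Vec<u8>),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical stand-in for the OTLP `AnyValue` variants.
#[derive(Debug, Clone, PartialEq)]
pub enum PbValue {
    String(String),
    Bytes(Vec<u8>),
    Int(i64),
    Bool(bool),
    Double(f64),
    Array(Vec<PbValue>),
    Kvlist(Vec<(String, PbValue)>),
}

fn to_kvlist(map: &BTreeMap<String, Value>) -> PbValue {
    PbValue::Kvlist(map.iter().map(|(k, v)| (k.clone(), to_pb_value(v))).collect())
}

pub fn to_pb_value(v: &Value) -> PbValue {
    // 1. A single-key object such as {"int_value": 42} is treated as a kind wrapper.
    if let Value::Object(map) = v {
        if map.len() == 1 {
            if let Some((kind, inner)) = map.iter().next() {
                match (kind.as_str(), inner) {
                    ("string_value", Value::Bytes(b)) => {
                        return PbValue::String(String::from_utf8_lossy(b).into_owned())
                    }
                    ("bytes_value", Value::Bytes(b)) => return PbValue::Bytes(b.clone()),
                    ("int_value", Value::Integer(i)) => return PbValue::Int(*i),
                    ("bool_value", Value::Boolean(b)) => return PbValue::Bool(*b),
                    ("double_value", Value::Float(f)) => return PbValue::Double(*f),
                    ("array_value", Value::Array(items)) => {
                        return PbValue::Array(items.iter().map(to_pb_value).collect())
                    }
                    ("kvlist_value", Value::Object(fields)) => return to_kvlist(fields),
                    _ => {} // not a recognized wrapper: fall through to inference
                }
            }
        }
    }
    // 2. Fallback: infer the kind from the unwrapped value.
    match v {
        Value::Bytes(b) => PbValue::String(String::from_utf8_lossy(b).into_owned()),
        Value::Integer(i) => PbValue::Int(*i),
        Value::Float(f) => PbValue::Double(*f),
        Value::Boolean(b) => PbValue::Bool(*b),
        Value::Array(items) => PbValue::Array(items.iter().map(to_pb_value).collect()),
        Value::Object(map) => to_kvlist(map),
    }
}
```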
…ests, fix docs Add three regression tests verifying the sidecar correctly roundtrips BytesValue (distinct from StringValue), ArrayValue (with mixed typed elements), and KvlistValue (nested key-value structure). Also fixes CUE docs: all OTLP value kinds are now preserved via the kind wrapper, not just scalar types.
The CUE docs reference KvlistValue (an OTLP protobuf type); the spell checker splits it into Kvlist + Value and flags Kvlist.
Replace lowercase kvlist with Kvlist in the expect list. The check-spelling tool was ignoring the added Kvlist entry because it considered the existing kvlist to be a more general variant, but kvlist wasn't matching the CamelCase KvlistValue in the CUE docs.
maycmlee left a comment:
Some suggestions and a couple of questions!
- `resource.*` — Stripped of prefix and placed into `Resource.attributes[]` (e.g. `resource.service.name` becomes attribute `service.name`)
- `resource_dropped_attributes_count` — Mapped to `Resource.dropped_attributes_count` (not an attribute)
- `resource_schema_url` — Mapped to `ResourceMetrics.schema_url` (not an attribute)
- `scope.name` — Mapped to `InstrumentationScope.name`
- `scope.version` — Mapped to `InstrumentationScope.version`
- `scope_dropped_attributes_count` — Mapped to `InstrumentationScope.dropped_attributes_count` (not an attribute)
- `scope_schema_url` — Mapped to `ScopeMetrics.schema_url` (not an attribute)
- `scope.*` (other) — Stripped of prefix and placed into `InstrumentationScope.attributes[]`
Let me know if this isn't correct. If it is correct, what does it mean to map the tag?
Suggested change:
- `resource.*` — Strips the prefix from the tag and adds the tag to `Resource.attributes[]` (for example, `resource.service.name` becomes attribute `service.name`)
- `resource_dropped_attributes_count` — Maps the tag to `Resource.dropped_attributes_count` (not an attribute)
- `resource_schema_url` — Maps the tag to `ResourceMetrics.schema_url` (not an attribute)
- `scope.name` — Maps the tag to `InstrumentationScope.name`
- `scope.version` — Maps the tag to `InstrumentationScope.version`
- `scope_dropped_attributes_count` — Maps the tag to `InstrumentationScope.dropped_attributes_count` (not an attribute)
- `scope_schema_url` — Maps the tag to `ScopeMetrics.schema_url` (not an attribute)
- `scope.*` (other) — Strips the prefix from the tag and adds the tag to `InstrumentationScope.attributes[]`
This is the expected behavior when round-tripping OTLP metrics through Vector, but may be surprising for metrics from non-OTLP sources that coincidentally use these prefixes.
What would surprise the user? That they see metrics from non-OTLP sources placed in the OTLP resource and scope structures (arrays?)
**Known limitations:**
- Metric attribute types are preserved during OTLP→Vector→OTLP roundtrip via a typed metadata sidecar.
Suggested change:
- Metric attribute types are preserved during an OTLP→Vector→OTLP round-trip with a typed metadata sidecar.
  All OTLP value kinds (`StringValue`, `BytesValue`, `IntValue`, `BoolValue`, `DoubleValue`, `ArrayValue`,
  `KvlistValue`) are stored with their kind wrapper and reconstructed on encode. If a VRL transform mutates
Suggested change:
  `KvlistValue`) are stored with the kind's wrapper and reconstructed on encode. If a VRL transform changes the
  metric tags, the sidecar is invalidated and all attributes fall back to `StringValue`.
- `start_time_unix_nano` is preserved for OTLP-sourced metrics via metadata stash. For native Vector
Suggested change:
- `start_time_unix_nano` is preserved for OTLP-sourced metrics using metadata stash. For native Vector
  incremental metrics, it is synthesized from `timestamp - interval_ms` when available, otherwise set to `0`.
Suggested change:
  incremental metrics, `start_time_unix_nano` is set to `timestamp - interval_ms` when available, otherwise set to `0`.
Co-authored-by: May Lee <may.lee@datadoghq.com>
Summary
Adds encoding support for converting native Vector metrics to OTLP protobuf format (`ExportMetricsServiceRequest`), enabling Vector metrics from any source to be sent through OTLP sinks without pre-formatted OTLP structure. This is the metrics counterpart to log/trace native conversion in #24621.
Related Issues
- `host_metrics` → `opentelemetry` sink now works directly
- `codec: otlp` now properly encodes native metrics

User Impact
Before this PR, sending native Vector metrics to an OTLP collector required either:
- `use_otlp_decoding: true` on the source (passthrough only, no transforms possible)

After this PR:
Conversion Architecture
Diagram: native Vector metric types mapped to OTLP protobuf types
- Counter → Sum (monotonic; Incremental → Delta, Absolute → Cumulative)
- Gauge → Gauge
- AggregatedHistogram → Histogram (Incremental → Delta, Absolute → Cumulative)
- AggregatedSummary → Summary
- Distribution → Histogram (samples → buckets)
- Set → Gauge (cardinality count)

Type Mapping
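| Vector type | OTLP type | Notes |
|---|---|---|
| `Counter` | `Sum` (monotonic) | `Incremental` → Delta, `Absolute` → Cumulative |
| `Gauge` | `Gauge` | |
| `AggregatedHistogram` | `Histogram` | `Incremental` → Delta, `Absolute` → Cumulative |
| `AggregatedSummary` | `Summary` | |
| `Distribution` | `Histogram` | samples → buckets |
| `Set` | `Gauge` | cardinality count |
| `Sketch` | Not converted | dropped with a warning |

Tag Decomposition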
Reverses the decode-path flattening from `build_metric_tags`:

| Tag | OTLP destination |
|---|---|
| `resource.*` | `Resource.attributes[]` (prefix stripped) |
| `resource_dropped_attributes_count` | `Resource.dropped_attributes_count` |
| `resource_schema_url` | `ResourceMetrics.schema_url` |
| `scope.name` | `InstrumentationScope.name` |
| `scope.version` | `InstrumentationScope.version` |
| `scope_dropped_attributes_count` | `InstrumentationScope.dropped_attributes_count` |
| `scope_schema_url` | `ScopeMetrics.schema_url` |
| `scope.*` (other) | `InstrumentationScope.attributes[]` (prefix stripped) |
| All other tags | `DataPoint.attributes[]` |

start_time_unix_nano Resolution
OTLP data points require `start_time_unix_nano`, but Vector's metric model only has one timestamp. The encode path resolves it with 3-tier priority:
1. `%vector.otlp.start_time_unix_nano` (stashed by PR fix(opentelemetry lib): decode missing scope, schema_url, and resource fields #24905 during OTLP decode)
2. `timestamp - interval_ms` for native incremental metrics with known interval
3. `0` (OTLP meaning "not set")

Typed Metric Attribute Sidecar
Consumes the typed attribute sidecar stashed by PR #24905 during OTLP decode. This preserves original OTLP attribute types across the Vector pipeline:
| Without sidecar | With sidecar |
|---|---|
| `IntValue(200)` → tags → `StringValue("200")` | `IntValue(200)` → tags + sidecar → `IntValue(200)` |
| `BoolValue(true)` → tags → `StringValue("true")` | `BoolValue(true)` → tags + sidecar → `BoolValue(true)` |
| `DoubleValue(0.75)` → tags → `StringValue("0.75")` | `DoubleValue(0.75)` → tags + sidecar → `DoubleValue(0.75)` |

Fingerprint-based staleness detection:
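- Tags unchanged since decode (fingerprint matches): typed attributes are read from the sidecar
- Tags changed by a transform (fingerprint mismatch): falls back to `decompose_metric_tags()` (string attributes)

Implementation: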
- `try_sidecar(metric)` → checks fingerprint, extracts typed Resource/Scope/DataPoint attributes
- `value_to_pb_value(v)` → converts VRL Value back to PBValue (local helper; replaced by `From<Value> for PBValue` after rebase on feat(opentelemetry sink): add automatic native log and trace to OTLP conversion #24621)
- `native_metric_to_otlp_request()` → `try_sidecar(metric).unwrap_or_else(|| decompose_metric_tags(...))`

Performance
- `Vec::with_capacity()` for resource (4), scope (2), and data point (8) attribute vectors
- `build_otlp_metric` takes `Vec<KeyValue>` (owned) instead of `&[KeyValue]`, eliminating 6× `.to_vec()` clones across all metric type branches

Safety
- Numeric conversions use `From`/`TryFrom` (no bare `as` casts for narrowing)
- Sketch metrics return `None` (no silent corruption)
- `internal_log_rate_limit` used for rate-limited warnings (Vector convention)

Test Plan
- `start_time_unix_nano` tests: metadata sidecar read, synthesis, absolute=0, no-interval=0, sidecar priority
- `cargo clippy` clean, `cargo fmt` clean