
feat(opentelemetry sink): add native metric to OTLP conversion #24897

Open
szibis wants to merge 31 commits into vectordotdev:master from szibis:feat/otlp-native-metrics-conversion


@szibis
Contributor

@szibis szibis commented Mar 11, 2026

Summary

Adds encoding support for converting native Vector metrics to OTLP protobuf format (ExportMetricsServiceRequest), enabling Vector metrics from any source to be sent through OTLP sinks without pre-formatted OTLP structure.

This is the metrics counterpart to log/trace native conversion in #24621.

Related Issues

User Impact

Before this PR, sending native Vector metrics to an OTLP collector required either:

  1. use_otlp_decoding: true on the source (passthrough only, no transforms possible)
  2. Complex 50+ line VRL transforms to manually build OTLP protobuf structure

After this PR:

sources:
  host_metrics:
    type: host_metrics
sinks:
  otel_out:
    type: opentelemetry
    inputs: ["host_metrics"]
    encoding:
      codec: otlp  # Native metrics auto-converted

Conversion Architecture

graph LR
    subgraph "Vector Native Metrics"
        C[Counter]
        G[Gauge]
        H[AggregatedHistogram]
        S[AggregatedSummary]
        D[Distribution]
        Set[Set]
    end
    subgraph "OTLP Protobuf"
        Sum[Sum<br/>monotonic]
        OG[Gauge]
        OH[Histogram]
        OS[Summary]
    end
    C -->|"Incremental→Delta<br/>Absolute→Cumulative"| Sum
    G --> OG
    H -->|"Incremental→Delta<br/>Absolute→Cumulative"| OH
    S --> OS
    D -->|"samples→buckets"| OH
    Set -->|"cardinality count"| OG

Type Mapping

| Vector MetricValue | OTLP Data Type | Temporality Logic |
| --- | --- | --- |
| Counter | Sum (monotonic) | Incremental → Delta, Absolute → Cumulative |
| Gauge | Gauge | |
| AggregatedHistogram | Histogram | Incremental → Delta, Absolute → Cumulative |
| AggregatedSummary | Summary | |
| Distribution | Histogram | samples converted to bucket boundaries |
| Set | Gauge | emits unique value count |
| Sketch | (dropped with warning) | not directly representable in OTLP |
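
The mapping above can be sketched as a single match. This is a minimal illustration, not the code in this PR: the enums `NativeValue`, `OtlpData`, and `Temporality` and the function `map_metric` are hypothetical stand-ins for Vector's and OTLP's real types, and applying the Incremental/Absolute temporality rule to Distribution is an assumption (the table does not state it).

```rust
/// Hypothetical stand-in for Vector's metric kind.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MetricKind {
    Incremental,
    Absolute,
}

/// Hypothetical stand-in for the variants of Vector's MetricValue (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NativeValue {
    Counter,
    Gauge,
    AggregatedHistogram,
    AggregatedSummary,
    Distribution,
    Set,
    Sketch,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Temporality {
    Delta,
    Cumulative,
}

/// Hypothetical stand-in for the OTLP metric data kinds used by the encoder.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OtlpData {
    Sum { monotonic: bool, temporality: Temporality },
    Gauge,
    Histogram { temporality: Temporality },
    Summary,
}

/// Incremental metrics become Delta, Absolute metrics become Cumulative.
pub fn temporality_for(kind: MetricKind) -> Temporality {
    match kind {
        MetricKind::Incremental => Temporality::Delta,
        MetricKind::Absolute => Temporality::Cumulative,
    }
}

/// Returns None for Sketch, which has no direct OTLP representation.
pub fn map_metric(value: NativeValue, kind: MetricKind) -> Option<OtlpData> {
    let temporality = temporality_for(kind);
    match value {
        NativeValue::Counter => Some(OtlpData::Sum { monotonic: true, temporality }),
        // A Set is emitted as a Gauge carrying its unique value count.
        NativeValue::Gauge | NativeValue::Set => Some(OtlpData::Gauge),
        // Assumption: Distribution follows the same temporality rule as AggregatedHistogram.
        NativeValue::AggregatedHistogram | NativeValue::Distribution => {
            Some(OtlpData::Histogram { temporality })
        }
        NativeValue::AggregatedSummary => Some(OtlpData::Summary),
        NativeValue::Sketch => None,
    }
}
```

Returning `Option` keeps the "dropped" case explicit at the call site, so a caller can log the warning and skip the event rather than emit a placeholder value.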

Tag Decomposition

Reverses the decode-path flattening from build_metric_tags:

| Tag Pattern | OTLP Destination |
| --- | --- |
| resource.* | Resource.attributes[] (prefix stripped) |
| resource_dropped_attributes_count | Resource.dropped_attributes_count |
| resource_schema_url | ResourceMetrics.schema_url |
| scope.name | InstrumentationScope.name |
| scope.version | InstrumentationScope.version |
| scope_dropped_attributes_count | InstrumentationScope.dropped_attributes_count |
| scope_schema_url | ScopeMetrics.schema_url |
| scope.* (other) | InstrumentationScope.attributes[] (prefix stripped) |
| everything else | DataPoint.attributes[] |
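
A minimal sketch of the routing rules in the table, assuming tags arrive as a string-to-string map. The struct `Decomposed` and the function `decompose_tags_sketch` are hypothetical simplifications (the PR's real function is `decompose_metric_tags`, whose signature is not shown here), and defaulting an unparsable count to 0 is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of the OTLP destinations listed in the table.
#[derive(Debug, Default, PartialEq)]
pub struct Decomposed {
    pub resource_attrs: Vec<(String, String)>,
    pub resource_dropped: u32,
    pub resource_schema_url: String,
    pub scope_name: String,
    pub scope_version: String,
    pub scope_attrs: Vec<(String, String)>,
    pub scope_dropped: u32,
    pub scope_schema_url: String,
    pub data_point_attrs: Vec<(String, String)>,
}

/// Routes each tag to a structural field, a resource/scope attribute, or a data point attribute.
pub fn decompose_tags_sketch(tags: &BTreeMap<String, String>) -> Decomposed {
    let mut out = Decomposed::default();
    for (key, value) in tags {
        match key.as_str() {
            // Structural fields are matched by exact name before any prefix check.
            "resource_dropped_attributes_count" => {
                // Assumption: an unparsable count falls back to 0.
                out.resource_dropped = value.parse().unwrap_or(0)
            }
            "resource_schema_url" => out.resource_schema_url = value.clone(),
            "scope.name" => out.scope_name = value.clone(),
            "scope.version" => out.scope_version = value.clone(),
            "scope_dropped_attributes_count" => out.scope_dropped = value.parse().unwrap_or(0),
            "scope_schema_url" => out.scope_schema_url = value.clone(),
            other => {
                if let Some(rest) = other.strip_prefix("resource.") {
                    out.resource_attrs.push((rest.to_string(), value.clone()));
                } else if let Some(rest) = other.strip_prefix("scope.") {
                    out.scope_attrs.push((rest.to_string(), value.clone()));
                } else {
                    // Includes look-alikes such as "resourceful" that lack the dot.
                    out.data_point_attrs.push((other.to_string(), value.clone()));
                }
            }
        }
    }
    out
}
```

Matching the exact names first means `scope.name` and `scope.version` never reach the generic `scope.` branch, which mirrors the "(other)" row of the table.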

start_time_unix_nano Resolution

OTLP data points require start_time_unix_nano, but Vector's metric model only has one timestamp. The encode path resolves it with a 3-tier priority:

  1. Metadata sidecar: reads %vector.otlp.start_time_unix_nano (stashed during OTLP decode by PR #24905, "fix(opentelemetry lib): decode missing scope, schema_url, and resource fields")
  2. Synthesis: computes timestamp - interval_ms for native incremental metrics with a known interval
  3. Default: falls back to 0 (the OTLP value meaning "not set")
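
The three tiers can be sketched as follows. This is an illustration under stated assumptions, not the PR's `resolve_start_time`: the function name `resolve_start_time_sketch`, its parameters, and the use of plain `Option<u64>`/`Option<u32>` in place of Vector's metadata and interval types are all hypothetical. The sketch also assumes the timestamp is already in nanoseconds and converts the interval from milliseconds before subtracting.

```rust
/// Hypothetical stand-in for Vector's metric kind.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MetricKind {
    Incremental,
    Absolute,
}

/// Resolves start_time_unix_nano using the 3-tier priority.
///
/// `sidecar_start_nanos` models the value read from the metadata sidecar,
/// `timestamp_nanos` the metric timestamp, and `interval_ms` the metric interval.
pub fn resolve_start_time_sketch(
    sidecar_start_nanos: Option<u64>,
    kind: MetricKind,
    timestamp_nanos: Option<u64>,
    interval_ms: Option<u32>,
) -> u64 {
    // Tier 1: a value preserved from the OTLP decode path always wins.
    if let Some(start) = sidecar_start_nanos {
        return start;
    }
    // Tier 2: synthesize for incremental metrics that carry both a timestamp and an interval.
    if kind == MetricKind::Incremental {
        if let (Some(ts), Some(ms)) = (timestamp_nanos, interval_ms) {
            // Lossless widening; u32::MAX * 1_000_000 still fits in u64.
            let interval_nanos = u64::from(ms) * 1_000_000;
            // saturating_sub avoids underflow if the interval exceeds the timestamp.
            return ts.saturating_sub(interval_nanos);
        }
    }
    // Tier 3: 0 is the OTLP convention for "not set".
    0
}
```

For example, an incremental metric with a timestamp of 10 s (10_000_000_000 ns) and a 1000 ms interval resolves to 9_000_000_000, while the same metric marked Absolute resolves to 0.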

Typed Metric Attribute Sidecar

Consumes the typed attribute sidecar stashed by PR #24905 during OTLP decode. This preserves original OTLP attribute types across the Vector pipeline:

| Without Sidecar | With Sidecar |
| --- | --- |
| IntValue(200) → tags → StringValue("200") | IntValue(200) → tags + sidecar → IntValue(200) |
| BoolValue(true) → tags → StringValue("true") | BoolValue(true) → tags + sidecar → BoolValue(true) |
| DoubleValue(0.75) → tags → StringValue("0.75") | DoubleValue(0.75) → tags + sidecar → DoubleValue(0.75) |

Fingerprint-based staleness detection:

  • On decode: fingerprint computed from stringified tags, stored in sidecar
  • On encode: fingerprint recomputed from current tags
  • Match → use sidecar (typed attributes)
  • Mismatch (tags mutated by transform) → fall back to decompose_metric_tags() (string attributes)
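
The staleness check can be sketched like this. Everything here is a hypothetical simplification: `TypedValue`, `Sidecar`, `tag_fingerprint`, and `attributes_for_encode` are illustrative names, and hashing with the standard library's `DefaultHasher` is an assumption, since the description above does not say which hash the PR uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical typed attribute value (a subset of the OTLP value kinds).
#[derive(Debug, Clone, PartialEq)]
pub enum TypedValue {
    Str(String),
    Int(i64),
    Bool(bool),
    Double(f64),
}

/// Hypothetical sidecar: typed attributes plus the fingerprint taken at decode time.
pub struct Sidecar {
    pub fingerprint: u64,
    pub typed_attrs: Vec<(String, TypedValue)>,
}

/// Hashes the stringified tags in key order, so equal tag sets give equal fingerprints.
/// Assumption: DefaultHasher is only suitable within one process; its algorithm
/// is not guaranteed to be stable across Rust releases.
pub fn tag_fingerprint(tags: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in tags {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

/// Uses the typed sidecar when the tags are unchanged, otherwise falls back to strings.
pub fn attributes_for_encode(
    tags: &BTreeMap<String, String>,
    sidecar: Option<&Sidecar>,
) -> Vec<(String, TypedValue)> {
    match sidecar {
        Some(s) if s.fingerprint == tag_fingerprint(tags) => s.typed_attrs.clone(),
        // Missing or stale sidecar: every attribute is emitted as a string.
        _ => tags
            .iter()
            .map(|(k, v)| (k.clone(), TypedValue::Str(v.clone())))
            .collect(),
    }
}
```

The fallback is all-or-nothing in this sketch: once any tag changes, every attribute is emitted as a string, which matches the mismatch rule in the list above.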

Implementation:

Performance

  • Pre-allocated Vec::with_capacity() for resource (4), scope (2), and data point (8) attribute vectors
  • build_otlp_metric takes Vec<KeyValue> (owned) instead of &[KeyValue], eliminating 6× .to_vec() clones across all metric type branches
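
To illustrate why taking an owned `Vec` removes the clones, here is a small sketch. The types `KeyValue`, `DataPoint`, and `NativeValue` and the functions `tags_to_attributes` and `build_point` are hypothetical simplifications of the OTLP and Vector types, not the PR's `build_otlp_metric`.

```rust
/// Hypothetical simplified attribute pair.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Hypothetical simplified OTLP data points, each owning its attributes.
#[derive(Debug, PartialEq)]
pub enum DataPoint {
    Number { attributes: Vec<KeyValue>, value: f64 },
    Histogram { attributes: Vec<KeyValue>, count: u64 },
}

/// Hypothetical simplified native metric value.
pub enum NativeValue {
    Gauge(f64),
    Histogram(u64),
}

/// Builds data point attributes with the capacity hint mentioned above (8).
pub fn tags_to_attributes(tags: &[(&str, &str)]) -> Vec<KeyValue> {
    let mut attributes = Vec::with_capacity(8);
    for (key, value) in tags {
        attributes.push(KeyValue { key: key.to_string(), value: value.to_string() });
    }
    attributes
}

/// Takes the attributes by value. Only one match arm runs, so the Vec is
/// moved into that arm's data point without copying its elements.
pub fn build_point(value: NativeValue, attributes: Vec<KeyValue>) -> DataPoint {
    match value {
        NativeValue::Gauge(value) => DataPoint::Number { attributes, value },
        NativeValue::Histogram(count) => DataPoint::Histogram { attributes, count },
    }
}
```

With a `&[KeyValue]` parameter, each arm would need its own `.to_vec()` to produce an owned list; with an owned parameter the borrow checker accepts the move because the arms are mutually exclusive.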

Safety

  • Zero panics, zero unsafe blocks in encode path
  • All numeric casts use From/TryFrom (no bare as casts for narrowing)
  • Non-finite (NaN/Infinity) distribution samples are filtered before accumulation
  • Sketch metrics produce None (no silent corruption)
  • internal_log_rate_limit used for rate-limited warnings (Vector convention)
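
The non-finite filtering and cast rules can be illustrated with a sketch of Distribution-to-Histogram conversion. The bucket rule used here (one explicit bound per distinct finite sample value) is an assumption made for the example; `Sample`, `HistogramPoint`, and `distribution_to_histogram` are hypothetical simplified names, not the PR's code.

```rust
/// Hypothetical simplified distribution sample: a value observed `rate` times.
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub value: f64,
    pub rate: u32,
}

/// Hypothetical simplified OTLP histogram data point.
#[derive(Debug, PartialEq)]
pub struct HistogramPoint {
    pub explicit_bounds: Vec<f64>,
    /// One entry per bound plus a final overflow bucket.
    pub bucket_counts: Vec<u64>,
    pub count: u64,
    pub sum: f64,
}

pub fn distribution_to_histogram(samples: &[Sample]) -> HistogramPoint {
    // Boundaries: distinct finite sample values in ascending order (assumed rule).
    let mut bounds: Vec<f64> = samples
        .iter()
        .map(|s| s.value)
        .filter(|v| v.is_finite())
        .collect();
    bounds.sort_by(|a, b| a.total_cmp(b));
    bounds.dedup();

    let mut bucket_counts = vec![0u64; bounds.len() + 1];
    let mut count = 0u64;
    let mut sum = 0.0f64;
    for sample in samples {
        // Skip NaN and infinities here as well, so they cannot poison `sum`
        // or land in the wrong bucket.
        if !sample.value.is_finite() {
            continue;
        }
        // Index of the first bound >= value (bounds are upper-inclusive).
        let idx = bounds.partition_point(|bound| *bound < sample.value);
        // Lossless widening via From, no bare `as` casts.
        let weight = u64::from(sample.rate);
        bucket_counts[idx] = bucket_counts[idx].saturating_add(weight);
        count = count.saturating_add(weight);
        sum += sample.value * f64::from(sample.rate);
    }
    HistogramPoint { explicit_bounds: bounds, bucket_counts, count, sum }
}
```

Because every finite sample value is itself a bound, `idx` is always a valid index and the final overflow bucket stays at zero in this sketch.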

Test Plan

  • 47 unit tests covering all metric type conversions
  • Roundtrip tests (encode → decode → verify fidelity) for Counter, Gauge, Histogram, Summary
  • Tag decomposition: resource, scope, data point attributes, mixed, prefix collision
  • Edge cases: zero values, negative values, empty buckets/quantiles, large values, special chars
  • NaN/Infinity handling: distribution samples skipped, gauge/counter passthrough, all-NaN distribution
  • Sketch metric dropped (no data), zero-rate samples, scope custom attributes
  • 5 start_time_unix_nano tests: metadata sidecar read, synthesis, absolute=0, no-interval=0, sidecar priority
  • 5 typed sidecar tests: typed resource attrs, typed dp attrs, staleness fallback, no-sidecar backward compat, scope metadata
  • cargo clippy clean, cargo fmt clean
  • CI passes

Add encoding support for converting native Vector metrics to OTLP
protobuf format (ExportMetricsServiceRequest). This enables Vector
metrics from any source to be sent through OTLP sinks without
requiring pre-formatted OTLP structure.

Metric type mapping:
- Counter → Sum (monotonic, Delta/Cumulative per MetricKind)
- Gauge → Gauge
- AggregatedHistogram → Histogram
- AggregatedSummary → Summary
- Distribution → Histogram (samples → buckets)
- Set → Gauge (cardinality count)
- Sketch → Gauge with warning (unsupported)

Tag decomposition reverses the decode-path flattening:
- resource.* tags → resource attributes
- scope.name/version → InstrumentationScope
- scope.* tags → scope attributes
- All other tags → data point attributes

Includes 36 tests covering type conversions, tag decomposition,
roundtrip fidelity, and edge cases.
@szibis szibis requested a review from a team as a code owner March 11, 2026 16:24
@szibis szibis changed the title feat(opentelemetry): native metric → OTLP conversion feat(codecs): native metric to OTLP conversion Mar 11, 2026
@szibis szibis changed the title feat(codecs): native metric to OTLP conversion feat(opentelemetry sink): add native metric to OTLP conversion Mar 11, 2026
Update opentelemetry sink CUE requirements to reflect native metric
auto-conversion support. Add comprehensive otlp-native-conversion
docs with metric type mapping, tag decomposition reference, and
use case examples. Add kvlist to spelling expect list.
@szibis szibis requested a review from a team as a code owner March 11, 2026 16:44
@github-actions github-actions bot added domain: ci Anything related to Vector's CI environment domain: external docs Anything related to Vector's external, public documentation labels Mar 11, 2026
This feature targets the 0.55.0 release, not 0.54.0.
@buraizu
Contributor

buraizu commented Mar 11, 2026

Created DOCS-13660 for documentation team review

@buraizu buraizu removed their assignment Mar 11, 2026
szibis added 8 commits March 11, 2026 23:24
… encode

Apply review feedback patterns from PR vectordotdev#24621:
- Replace `as u64` with `u64::try_from().ok()` for timestamp conversion
- Replace `as u64`/`as f64` with `u64::from()`/`f64::from()` for sample.rate
- Remove unwrap() in Distribution bucket overflow guard, use
  saturating index clamping instead
…ogram conversion

NaN and Infinity sample values were not excluded from the accumulation
loop in Distribution-to-Histogram conversion, even though they were
already filtered from the boundary list. This caused NaN to poison
the total_sum and misroute samples via binary_search_by. Non-finite
samples are now skipped in the accumulation loop, matching the
boundary filter.
Sketch metrics were converted to an empty gauge (value 0.0), causing
silent data corruption downstream. Now Sketch metrics produce a Metric
with no data field, which OTLP receivers will ignore. Also fixes the
rate-limit key to use internal_log_rate_limit (Vector convention).
Add clippy::cast_precision_loss annotation on values.len() as f64 in
Set conversion (usize→f64 is lossy for very large sets but acceptable).
Fix test to use u64::try_from instead of bare 'as u64' cast, matching
the production code pattern.
Add tests covering NaN/Infinity distribution samples, non-finite gauge
and counter values, zero-rate samples, Sketch metric dropping, tag
prefix collision routing, scope custom attributes, and non-finite
histogram/summary sums.
Sketch metrics are now dropped with a warning instead of emitting
an empty gauge, matching the code change.
@szibis
Contributor Author

szibis commented Mar 12, 2026

Companion decode-side fix: #24905 adds decode for missing scope, schema_url, and resource.dropped_attributes_count fields across logs, traces, and metrics. This ensures the fields that this PR's encode path handles are actually populated by the decode side.

… tags

Update decompose_metric_tags to handle 4 special tags as proto-level
structural fields rather than generic attributes:
- resource.dropped_attributes_count → Resource.dropped_attributes_count
- resource.schema_url → ResourceMetrics.schema_url
- scope.dropped_attributes_count → InstrumentationScope.dropped_attributes_count
- scope.schema_url → ScopeMetrics.schema_url

This ensures round-trip fidelity with fix/otlp-decode-missing-fields
(vectordotdev#24905) once merged, while remaining backward-compatible (graceful
defaults of 0 / empty string) before that PR merges.
szibis and others added 4 commits March 12, 2026 18:01
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
…nversion

- Use DataType::all_bits() instead of manual OR (reviewer nit)
- Split OtlpSerializer doc into "Pre-formatted OTLP events" and
  "Native Vector events" sections for clarity
- Sketch metrics now truly dropped (return None from
  native_metric_to_otlp_request instead of including empty metric)
- Document resource.*/scope.* tag prefix reservation for OTLP mapping
- Add sample config to changelog fragment
- Remove orphaned docs/examples/otlp-native-conversion.md (content
  folded into encoder docstring)
Add explicit user-facing documentation about resource.*/scope.* tag
prefix reservation in OTLP encoding:

- CUE sink docs: new "Metric tag prefix conventions" section
- Changelog: explicitly notes prefix reservation and behavior for
  non-OTLP sources using these prefixes coincidentally

Addresses review feedback requesting explicit documentation for the
implicit prefix reservation behavior.
@szibis szibis requested a review from pront March 12, 2026 17:20
szibis added 2 commits March 14, 2026 16:32
…aming

PR24905 changed metadata tags from dot-separated (resource.schema_url)
to underscore-separated (resource_schema_url) to avoid colliding with
user-supplied resource/scope attributes. Update the encode path to
match: decompose_metric_tags now recognizes the underscore tag names.
Add edge_case_tests module to metrics.rs with 10 tests covering:
- ExponentialHistogram decode path (scale=0 with positive/negative/zero buckets)
- Counter with negative value roundtrip
- Histogram with empty explicit_bounds (single +inf bucket)
- Summary with no quantiles
- Gauge special float values (NaN, Inf, -Inf, subnormal)
- Multiple data points per metric (flattened to separate events)
- Empty metric name
- All aggregation temporalities (UNSPECIFIED, DELTA, CUMULATIVE)
- start_time_unix_nano independence from metric timestamp
- u64::MAX timestamp overflow safety
@szibis
Contributor Author

szibis commented Mar 14, 2026

Local Integration & Edge Case Testing

Set up a local test pipeline — Vector OTLP source on ports 4317/4318, piped through an OTLP sink to a second Vector instance on 4319/4320 for full roundtrip testing. Used grpcurl with proto files from lib/opentelemetry-proto/src/proto/opentelemetry-proto/ to send payloads. Sent 3 metric payloads locally (counter, gauge, histogram) through the same roundtrip pipeline. No bugs, but the roundtrip revealed two known limitations documented in the tests: ExponentialHistogram re-encodes as regular Histogram (decode works, encode doesn't produce exponential output), and start_time_unix_nano is lost since Vector's MetricEvent doesn't carry a separate start time.

Added 10 tests to metrics.rs, ran with cargo test -p opentelemetry-proto --lib metrics::edge_case_tests. Covers ExponentialHistogram decode with positive/negative/zero buckets, negative counter, single +inf bucket histogram, empty summary quantiles, special floats (NaN, ±Inf, subnormal), multiple data points, empty name, all temporalities, and overflow.

Performance

Pushed 10k metric events in batches of 100 — ~5.2k events/s, zero errors. These numbers mostly reflect grpcurl client overhead — a compiled gRPC client would be significantly faster. The important takeaway is stability: no crashes, no data corruption, no memory issues under sustained load.

@szibis
Contributor Author

szibis commented Mar 14, 2026

Note on testing methodology: For the local integration and performance tests, all 3 PRs (#24905, #24621, #24897) were merged locally into a single test branch (test/otlp-all-prs-merged) and compiled into one binary. This let me test the full decode→encode→decode roundtrip across all signal types together, which is how they'll actually run in production. The unit tests in each PR are self-contained and run independently on their respective branches.

szibis added 2 commits March 14, 2026 17:49
…ations

- Fix 4 metadata tag names in CUE docs to match implementation:
  resource.dropped_attributes_count → resource_dropped_attributes_count
  resource.schema_url → resource_schema_url
  scope.dropped_attributes_count → scope_dropped_attributes_count
  scope.schema_url → scope_schema_url
- Document known metric roundtrip limitations: attribute type
  stringification and start_time_unix_nano loss
szibis added a commit to szibis/vector that referenced this pull request Mar 15, 2026
…rdotdev#24897

- Update otlp-native-conversion.md to reflect metrics are supported
  via companion PR vectordotdev#24897 (was "planned for future release")
- Add native metrics auto-conversion example config
- Update CUE docs requirements to mention all three signals
- Accept DataType::all_bits() in OtlpSerializerConfig::input_type()
  so metrics are not rejected at config validation time
- Improve metric error message to reference vectordotdev#24897 and passthrough
@szibis
Contributor Author

szibis commented Mar 15, 2026

#24905 (OTLP decode refactor)

#24621 (OTLP encode logs/traces)

#24897 (OTLP encode metrics)

Each PR rebases on top of the previous one after merge.

@pront after each merge I will do the cleanup and rebase to make this flow as easy as possible for the Vector team.

szibis added 8 commits March 15, 2026 16:35
Add resolve_start_time() with 3-tier priority for start_time_unix_nano:
1. Read from metadata sidecar (OTLP roundtrip preservation)
2. Synthesize from timestamp - interval_ms (native incremental metrics)
3. Fall back to 0 (OTLP default)

Replace all hardcoded start_time_unix_nano: 0 in build_otlp_metric with
the resolved value across all 6 data point constructions (Counter, Gauge,
Histogram, Summary, Distribution, Set).
Update opentelemetry.cue to reflect that start_time_unix_nano is now
preserved for OTLP-sourced metrics and synthesized for native incremental
metrics, replacing the stale claim that it is always lost.

Update the roundtrip test to handle both pre-rebase (fallback to 0) and
post-rebase (preserved via metadata) states so the test remains correct
throughout the merge sequence.
- Pre-allocate Vec capacity in decompose_metric_tags (resource: 4,
  scope: 2, data_point: 8) to avoid repeated reallocation during
  tag iteration.
- Change build_otlp_metric to take owned Vec<KeyValue> instead of
  &[KeyValue], eliminating 6 to_vec() clone calls per metric encode.
  Since only one match arm executes, the owned Vec moves directly
  into the constructed data point.
Read the typed attribute sidecar from %vector.otlp.metric_sidecar
(stashed by PR vectordotdev#24905 during decode) and emit original OTLP types
(IntValue, BoolValue, DoubleValue) instead of StringValue.

Fingerprint-based staleness detection ensures correctness: if tags
were mutated by transforms, the sidecar is ignored and the encoder
falls back to string-based tag decomposition.
value_to_pb_value now detects the single-key wrapper format produced by
the decode side (e.g. {"int_value": 42}, {"bytes_value": <bytes>}) and
reconstructs the exact PBValue variant, including recursive ArrayValue
and KvlistValue handling. Falls back to scalar inference for unwrapped
values.

Also fixes stale CUE docs: scalar types ARE now preserved via sidecar,
BytesValue narrows to StringValue, and sidecar invalidation on tag
mutation falls back to StringValue.
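
A sketch of the single-key wrapper detection described in this commit message. The enums `Value` and `PbValue` and the functions `unwrap_kind` and `value_to_pb` are hypothetical simplifications (Vector's real `Value` and the generated protobuf types differ). Only `int_value` and `bytes_value` are named in the commit message; the other wrapper keys are assumed to follow the OTLP `AnyValue` field names.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified event value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Bytes(Vec<u8>),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Hypothetical simplified OTLP AnyValue.
#[derive(Debug, Clone, PartialEq)]
pub enum PbValue {
    StringValue(String),
    BytesValue(Vec<u8>),
    IntValue(i64),
    BoolValue(bool),
    DoubleValue(f64),
    ArrayValue(Vec<PbValue>),
    KvlistValue(Vec<(String, PbValue)>),
}

fn kvlist(map: &BTreeMap<String, Value>) -> Vec<(String, PbValue)> {
    map.iter().map(|(k, v)| (k.clone(), value_to_pb(v))).collect()
}

/// Recognizes a one-entry map whose key names a kind and whose payload matches it.
fn unwrap_kind(map: &BTreeMap<String, Value>) -> Option<PbValue> {
    if map.len() != 1 {
        return None;
    }
    let (key, inner) = map.iter().next()?;
    match (key.as_str(), inner) {
        ("string_value", Value::Text(s)) => Some(PbValue::StringValue(s.clone())),
        ("bytes_value", Value::Bytes(b)) => Some(PbValue::BytesValue(b.clone())),
        ("int_value", Value::Integer(i)) => Some(PbValue::IntValue(*i)),
        ("bool_value", Value::Boolean(b)) => Some(PbValue::BoolValue(*b)),
        ("double_value", Value::Float(f)) => Some(PbValue::DoubleValue(*f)),
        // Containers recurse so nested wrappers are reconstructed as well.
        ("array_value", Value::Array(items)) => {
            Some(PbValue::ArrayValue(items.iter().map(value_to_pb).collect()))
        }
        ("kvlist_value", Value::Object(kv)) => Some(PbValue::KvlistValue(kvlist(kv))),
        _ => None,
    }
}

/// Wrapped values keep their exact kind; unwrapped values fall back to inference.
pub fn value_to_pb(value: &Value) -> PbValue {
    match value {
        Value::Object(map) => {
            unwrap_kind(map).unwrap_or_else(|| PbValue::KvlistValue(kvlist(map)))
        }
        Value::Text(s) => PbValue::StringValue(s.clone()),
        Value::Bytes(b) => PbValue::BytesValue(b.clone()),
        Value::Integer(i) => PbValue::IntValue(*i),
        Value::Float(f) => PbValue::DoubleValue(*f),
        Value::Boolean(b) => PbValue::BoolValue(*b),
        Value::Array(items) => PbValue::ArrayValue(items.iter().map(value_to_pb).collect()),
    }
}
```

One trade-off of this style of detection is that a genuine user map with a single key such as `int_value` and an integer payload is indistinguishable from a wrapper, so it would be unwrapped too.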
…ests, fix docs

Add three regression tests verifying the sidecar correctly roundtrips
BytesValue (distinct from StringValue), ArrayValue (with mixed typed
elements), and KvlistValue (nested key-value structure). Also fixes CUE
docs: all OTLP value kinds are now preserved via the kind wrapper, not
just scalar types.
The CUE docs reference KvlistValue (an OTLP protobuf type); the
spell checker splits it into Kvlist + Value and flags Kvlist.
Replace lowercase kvlist with Kvlist in the expect list. The
check-spelling tool was ignoring the added Kvlist entry because it
considered the existing kvlist to be a more general variant, but
kvlist wasn't matching the CamelCase KvlistValue in the CUE docs.
Contributor

@maycmlee maycmlee left a comment


Some suggestions and a couple of questions!

Comment on lines +47 to +54
- `resource.*` — Stripped of prefix and placed into `Resource.attributes[]` (e.g. `resource.service.name` becomes attribute `service.name`)
- `resource_dropped_attributes_count` — Mapped to `Resource.dropped_attributes_count` (not an attribute)
- `resource_schema_url` — Mapped to `ResourceMetrics.schema_url` (not an attribute)
- `scope.name` — Mapped to `InstrumentationScope.name`
- `scope.version` — Mapped to `InstrumentationScope.version`
- `scope_dropped_attributes_count` — Mapped to `InstrumentationScope.dropped_attributes_count` (not an attribute)
- `scope_schema_url` — Mapped to `ScopeMetrics.schema_url` (not an attribute)
- `scope.*` (other) — Stripped of prefix and placed into `InstrumentationScope.attributes[]`
Contributor

Let me know if this isn't correct. If it is correct, what does it mean to map the tag?

Suggested change
- `resource.*` — Stripped of prefix and placed into `Resource.attributes[]` (e.g. `resource.service.name` becomes attribute `service.name`)
- `resource_dropped_attributes_count` — Mapped to `Resource.dropped_attributes_count` (not an attribute)
- `resource_schema_url` — Mapped to `ResourceMetrics.schema_url` (not an attribute)
- `scope.name` — Mapped to `InstrumentationScope.name`
- `scope.version` — Mapped to `InstrumentationScope.version`
- `scope_dropped_attributes_count` — Mapped to `InstrumentationScope.dropped_attributes_count` (not an attribute)
- `scope_schema_url` — Mapped to `ScopeMetrics.schema_url` (not an attribute)
- `scope.*` (other) — Stripped of prefix and placed into `InstrumentationScope.attributes[]`
- `resource.*` — Strips the prefix from the tag and adds the tag to `Resource.attributes[]` (for example, `resource.service.name` becomes attribute `service.name`)
- `resource_dropped_attributes_count` — Maps the tag to `Resource.dropped_attributes_count` (not an attribute)
- `resource_schema_url` — Maps the tag to `ResourceMetrics.schema_url` (not an attribute)
- `scope.name` — Maps the tag to `InstrumentationScope.name`
- `scope.version` — Maps the tag to `InstrumentationScope.version`
- `scope_dropped_attributes_count` — Maps the tag to `InstrumentationScope.dropped_attributes_count` (not an attribute)
- `scope_schema_url` — Maps the tag to `ScopeMetrics.schema_url` (not an attribute)
- `scope.*` (other) — Strips the prefix from the tag and adds the tag to `InstrumentationScope.attributes[]`

Comment on lines +60 to +61
This is the expected behavior when round-tripping OTLP metrics through Vector, but may be surprising for metrics
from non-OTLP sources that coincidentally use these prefixes.
Contributor

What would surprise the user? That they see those metrics from non-OTLP sources in the OTLP resource and scope structures (arrays?)


**Known limitations:**

- Metric attribute types are preserved during OTLP→Vector→OTLP roundtrip via a typed metadata sidecar.
Contributor

Suggested change
- Metric attribute types are preserved during OTLP→Vector→OTLP roundtrip via a typed metadata sidecar.
- Metric attribute types are preserved during an OTLP→Vector→OTLP round-trip with a typed metadata sidecar.


- Metric attribute types are preserved during OTLP→Vector→OTLP roundtrip via a typed metadata sidecar.
All OTLP value kinds (`StringValue`, `BytesValue`, `IntValue`, `BoolValue`, `DoubleValue`, `ArrayValue`,
`KvlistValue`) are stored with their kind wrapper and reconstructed on encode. If a VRL transform mutates
Contributor

Suggested change
`KvlistValue`) are stored with their kind wrapper and reconstructed on encode. If a VRL transform mutates
`KvlistValue`) are stored with the kind's wrapper and reconstructed on encode. If a VRL transform changes the

All OTLP value kinds (`StringValue`, `BytesValue`, `IntValue`, `BoolValue`, `DoubleValue`, `ArrayValue`,
`KvlistValue`) are stored with their kind wrapper and reconstructed on encode. If a VRL transform mutates
metric tags, the sidecar is invalidated and all attributes fall back to `StringValue`.
- `start_time_unix_nano` is preserved for OTLP-sourced metrics via metadata stash. For native Vector
Contributor

Suggested change
- `start_time_unix_nano` is preserved for OTLP-sourced metrics via metadata stash. For native Vector
- `start_time_unix_nano` is preserved for OTLP-sourced metrics using metadata stash. For native Vector

`KvlistValue`) are stored with their kind wrapper and reconstructed on encode. If a VRL transform mutates
metric tags, the sidecar is invalidated and all attributes fall back to `StringValue`.
- `start_time_unix_nano` is preserved for OTLP-sourced metrics via metadata stash. For native Vector
incremental metrics, it is synthesized from `timestamp - interval_ms` when available, otherwise set to `0`.
Contributor

Suggested change
incremental metrics, it is synthesized from `timestamp - interval_ms` when available, otherwise set to `0`.
incremental metrics, `start_time_unix_nano` is set to `timestamp - interval_ms` when available, otherwise set to `0`.

szibis and others added 2 commits March 16, 2026 16:55
Co-authored-by: May Lee <may.lee@datadoghq.com>
Co-authored-by: May Lee <may.lee@datadoghq.com>

Labels

domain: ci (Anything related to Vector's CI environment)
domain: external docs (Anything related to Vector's external, public documentation)
editorial review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Source Host Metrics in OTEL format
Support sending OpenTelemetry metrics

4 participants