
fix(opentelemetry lib): decode missing scope, schema_url, and resource fields#24905

Open
szibis wants to merge 14 commits into vectordotdev:master from szibis:fix/otlp-decode-missing-fields


@szibis
Contributor

@szibis szibis commented Mar 12, 2026

Summary

The OTLP decode path silently drops several protobuf fields during conversion to Vector events, causing data loss and breaking round-trip fidelity (OTLP → Vector → OTLP). This PR fixes all missing fields and adds metadata preservation for lossless OTLP metric roundtrips.

1. Missing Fields

Before this PR

Field Logs Traces Metrics
scope.name ✅ (tag)
scope.version ✅ (tag)
scope.attributes ✅ (tags)
scope.dropped_attributes_count
ScopeX.schema_url
ResourceX.schema_url
resource.dropped_attributes_count

After this PR

Field Logs Traces Metrics
scope.name ✅ (tag)
scope.version ✅ (tag)
scope.attributes ✅ (tags)
scope.dropped_attributes_count ✅ (tag)
ScopeX.schema_url ✅ (tag)
ResourceX.schema_url ✅ (tag)
resource.dropped_attributes_count ✅ (tag)
start_time_unix_nano (metrics) ✅ (metadata)
Typed attribute preservation (metrics) ✅ (sidecar)

Field mapping

Logs (Legacy / Vector namespace):

  • ScopeLogs.schema_url → scope.schema_url / %opentelemetry.scope.schema_url
  • ResourceLogs.schema_url → schema_url / %opentelemetry.resources.schema_url
  • Resource.dropped_attributes_count → resource_dropped_attributes_count / %opentelemetry.resources.dropped_attributes_count

Traces (always at event root):

  • ScopeSpans.scope.* → scope.name, scope.version, scope.attributes, scope.dropped_attributes_count
  • ScopeSpans.schema_url → scope.schema_url
  • ResourceSpans.schema_url → schema_url
  • Resource.dropped_attributes_count → resource_dropped_attributes_count

Metrics (as tags, following existing resource.* / scope.* pattern):

  • scope_dropped_attributes_count, scope_schema_url, resource_schema_url, resource_dropped_attributes_count

2. start_time_unix_nano Preservation

OTLP metrics carry start_time_unix_nano on every data point, but Vector's metric model has only one timestamp. Previously this was silently dropped.

Now stored in EventMetadata at %vector.otlp.start_time_unix_nano. Only non-zero values are stored, since 0 means "not set" in OTLP.
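A minimal sketch of the stash/restore pair, skipping zero per the OTLP convention. It models event metadata as a plain `BTreeMap` keyed by a dotted path; `Metadata`, `START_TIME_PATH`, `stash_start_time`, and `restore_start_time` are hypothetical names, since Vector's real code writes into `EventMetadata` via the `path!` macro.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for Vector's event metadata: a flat map from a
/// dotted path to an integer value.
type Metadata = BTreeMap<&'static str, i64>;

/// Hypothetical key mirroring %vector.otlp.start_time_unix_nano.
const START_TIME_PATH: &str = "vector.otlp.start_time_unix_nano";

/// Decode side: stash the data point's start time, skipping 0 because OTLP
/// uses 0 to mean "not set".
fn stash_start_time(metadata: &mut Metadata, start_time_unix_nano: u64) {
    if start_time_unix_nano > 0 {
        // `as i64` reinterprets the bits, like the cast used in the PR.
        metadata.insert(START_TIME_PATH, start_time_unix_nano as i64);
    }
}

/// Encode side: restore the original value, or 0 ("not set") when absent.
fn restore_start_time(metadata: &Metadata) -> u64 {
    metadata.get(START_TIME_PATH).map_or(0, |&v| v as u64)
}
```

When the key is absent (a native metric, or an OTLP point with start time 0), the encoder falls back to 0, which matches the pre-PR behavior.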

3. Typed Metric Attribute Sidecar

OTLP attributes carry typed values (IntValue, BoolValue, DoubleValue), but Vector's MetricTags model stores everything as strings. Previously, all attributes became StringValue on re-encode.

Now stashed in EventMetadata at %vector.otlp.metric_sidecar:

| Sidecar field | Content |
|---|---|
| resource_attributes | VRL Object preserving original types via kv_list_into_value() |
| scope_attributes | VRL Object preserving original types |
| data_point_attributes | VRL Object preserving original types |
| scope_name | String |
| scope_version | String |
| resource_dropped_attributes_count | Integer |
| scope_dropped_attributes_count | Integer |
| tags_fingerprint | Hash of stringified tags for staleness detection |

Borrow-before-consume pattern: The sidecar borrows &Resource / &InstrumentationScope before build_metric_tags() consumes them, avoiding cloning entire structures.
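The borrow-before-consume ordering can be sketched with simplified types. `Resource`, `Sidecar`, `build_sidecar`, `build_metric_tags`, and `decode` below are hypothetical stand-ins (the real code works on the prost-generated OTLP structs and VRL values); the sketch only shows that the borrow must happen before the move, so the whole structure never needs cloning.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for the OTLP `Resource` message.
struct Resource {
    attributes: Vec<(String, i64)>,
    dropped_attributes_count: u32,
}

/// Simplified, hypothetical sidecar that keeps typed values.
struct Sidecar {
    resource_attributes: BTreeMap<String, i64>,
    resource_dropped_attributes_count: u32,
}

/// Step 1: read from a borrow; only the fields the sidecar needs are copied.
fn build_sidecar(resource: &Resource) -> Sidecar {
    Sidecar {
        resource_attributes: resource.attributes.iter().cloned().collect(),
        resource_dropped_attributes_count: resource.dropped_attributes_count,
    }
}

/// Step 2: take ownership to build stringified tags (types are lost here).
fn build_metric_tags(resource: Resource) -> BTreeMap<String, String> {
    resource
        .attributes
        .into_iter()
        .map(|(key, value)| (format!("resource.{key}"), value.to_string()))
        .collect()
}

/// Borrow first, then consume. Swapping the two `let` lines would not
/// compile, because `resource` is moved into `build_metric_tags`.
fn decode(resource: Resource) -> (Sidecar, BTreeMap<String, String>) {
    let sidecar = build_sidecar(&resource);
    let tags = build_metric_tags(resource);
    (sidecar, tags)
}
```

In this sketch the integer `3` survives in the sidecar while the tag holds the string `"3"`, which is the type loss the sidecar exists to undo.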

Staleness detection: The encode side (PR #24897) recomputes the fingerprint from current tags. If tags were mutated by transforms, the sidecar is ignored and the encoder falls back to string-based decomposition.
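A minimal sketch of fingerprint-based staleness detection, assuming the fingerprint is a hash over the key-ordered, stringified tags. The PR does not name the hash function; this sketch uses `std::collections::hash_map::DefaultHasher`, which is deterministic within one build but whose algorithm is not guaranteed across Rust releases, so it only suits decode and encode running in the same process. `tags_fingerprint` and `sidecar_is_fresh` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hash the tags in key order. `BTreeMap` iterates sorted by key, so the
/// result does not depend on insertion order.
fn tags_fingerprint(tags: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in tags {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

/// Encode side: trust the typed sidecar only if the current tags still hash
/// to the fingerprint recorded at decode time.
fn sidecar_is_fresh(stored_fingerprint: u64, current_tags: &BTreeMap<String, String>) -> bool {
    tags_fingerprint(current_tags) == stored_fingerprint
}
```

Any transform that adds, removes, or rewrites a tag changes the fingerprint, so the encoder falls back to string tags rather than emitting stale typed values.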

| Scenario | Before | After |
|---|---|---|
| OTLP roundtrip (no transforms) | All attrs → StringValue | Original types preserved |
| OTLP after tag-mutating transform | StringValue | StringValue (correct fallback) |
| start_time_unix_nano from OTLP | Lost (hardcoded 0) | Preserved |
| Native metric (no OTLP source) | String tags | Same (backward compatible) |

Related

Test plan

  • 31 unit tests for missing fields (12 logs, 11 traces, 8 metrics)
  • 5 unit tests for start_time_unix_nano preservation (4 metric types + zero-not-stored)
  • 5 unit tests for typed sidecar (typed resource attrs, typed dp attrs, scope metadata, fingerprint validity, empty sidecar omission)
  • Tests verify both presence of new fields and absence when empty/zero
  • Tests cover Legacy and Vector namespace for logs
  • Combined tests verify all new fields work together with existing fields
  • Integration test with OTLP collector for round-trip verification

@szibis szibis requested a review from a team as a code owner March 12, 2026 08:06
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026
- Add changelog fragment for vectordotdev#24905
- Document new log output fields in source CUE: scope.schema_url,
  schema_url (resource-level), resource_dropped_attributes_count
- Add comprehensive trace output field documentation to source CUE,
  including all span fields, scope fields, schema_url, and
  resource_dropped_attributes_count (previously undocumented)
@szibis szibis requested a review from a team as a code owner March 12, 2026 08:14
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Mar 12, 2026
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026


Extract scope.schema_url, resource schema_url, resource_dropped_attributes_count,
and scope.dropped_attributes_count in the native-to-OTLP encode path. These fields
are produced by the decode fix in vectordotdev#24905 — the encode now reads them when present
and falls back to defaults (empty/0) when absent, ensuring full round-trip fidelity
once vectordotdev#24905 merges while remaining backward-compatible before it does.

Also fixes schema_url mapping: root "schema_url" now correctly maps to
ResourceLogs/ResourceSpans.schema_url (resource level), while "scope.schema_url"
maps to ScopeLogs/ScopeSpans.schema_url (scope level).
szibis added a commit to szibis/vector that referenced this pull request Mar 12, 2026
… tags

Update decompose_metric_tags to handle 4 special tags as proto-level
structural fields rather than generic attributes:
- resource.dropped_attributes_count → Resource.dropped_attributes_count
- resource.schema_url → ResourceMetrics.schema_url
- scope.dropped_attributes_count → InstrumentationScope.dropped_attributes_count
- scope.schema_url → ScopeMetrics.schema_url

This ensures round-trip fidelity with fix/otlp-decode-missing-fields
(vectordotdev#24905) once merged, while remaining backward-compatible (graceful
defaults of 0 / empty string) before that PR merges.
@szibis szibis changed the title fix(opentelemetry): decode missing scope, schema_url, and resource fields fix(opentelemetry lib): decode missing scope, schema_url, and resource fields Mar 12, 2026
@pront
Member

pront commented Mar 12, 2026

Hi @szibis, you have quite a few OTEL PRs open: https://github.com/vectordotdev/vector/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Aopen+author%3Aszibis+

Can you please list the order in which you want me to review them here? Even better, I would mark all but one as draft so I can keep filtering them without need to exchange comments here.

@szibis
Contributor Author

szibis commented Mar 12, 2026


@pront Sorry about that. I discovered these gaps one at a time and wanted to avoid one big PR.

  1. fix(opentelemetry lib): decode missing scope, schema_url, and resource fields #24905: this PR, which decodes the missing fields and is the full OTLP format baseline for all later PRs.
  2. feat(opentelemetry sink): add automatic native log and trace to OTLP conversion #24621: automatic conversion of native logs and traces in the sink.
  3. feat(opentelemetry sink): add native metric to OTLP conversion #24897: builds on the native log and trace conversion to add automatic metric conversion.

cswatt
cswatt previously approved these changes Mar 12, 2026
@szibis szibis requested a review from pront March 12, 2026 18:44
Member

@pront pront left a comment


I think the new resources.schema_url / resources.dropped_attributes_count field handling introduces a backwards-compatibility issue with Vector namespace logs: they’re written into the same resources object that already contains arbitrary OTLP resource attributes, so valid incoming attributes can now be silently overwritten.

From my local test:

{
  "service.name": "checkout",
  "schema_url": "tenant-defined-value",
  "dropped_attributes_count": "user-payload"
}

With resource metadata:

{
  "schema_url": "https://resource.schema",
  "dropped_attributes_count": 7
}

The emitted event was:

{
  "otel_resources": {
    "service.name": "checkout",
    "schema_url": "https://resource.schema",
    "dropped_attributes_count": 7
  }
}

So the original resource attributes schema_url = "tenant-defined-value" and dropped_attributes_count = "user-payload" were lost.

Repro config:

data_dir: "/tmp/vector-pr-24905-data"

sources:
  otel:
    type: opentelemetry
    use_otlp_decoding: false
    log_namespace: true
    grpc:
      address: "127.0.0.1:43171"
    http:
      address: "127.0.0.1:43181"

transforms:
  expose_meta:
    type: remap
    inputs:
      - otel.logs
    source: |
      .otel_resources = %opentelemetry.resources
      .otel_scope = %opentelemetry.scope
      .otel_timestamp = %opentelemetry.timestamp

sinks:
  out:
    type: console
    inputs:
      - expose_meta
    target: stdout
    encoding:
      codec: json

We should preserve %opentelemetry.resources as the raw user-supplied resource attributes.

Also, we have the same type of issue with Metrics: resource.* / scope.* tag collisions.

Generally, if a field was not literally present as a user attribute, it should not be inserted into the raw attribute map. We should not place synthetic or derived metadata into a namespace that is also used for raw user payload.
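One way to honor that rule is to keep raw attributes and protocol-level metadata in separate fields, so a synthetic value can never replace a user key. This sketch replays pront's example input; `DecodedResource` and `decode_resource` are hypothetical, and the string-only attribute map simplifies OTLP's typed values.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded resource: raw attributes and protocol metadata live
/// in separate fields, so neither can overwrite the other.
#[derive(Debug)]
struct DecodedResource {
    /// Raw user-supplied resource attributes, inserted as-is.
    attributes: BTreeMap<String, String>,
    /// From `ResourceLogs.schema_url`; `None` when empty.
    schema_url: Option<String>,
    /// From `Resource.dropped_attributes_count`; `None` when zero.
    dropped_attributes_count: Option<u32>,
}

fn decode_resource(
    attributes: BTreeMap<String, String>,
    schema_url: &str,
    dropped_attributes_count: u32,
) -> DecodedResource {
    DecodedResource {
        // Never merged with metadata, so user keys like "schema_url" survive.
        attributes,
        schema_url: (!schema_url.is_empty()).then(|| schema_url.to_string()),
        dropped_attributes_count: (dropped_attributes_count > 0)
            .then_some(dropped_attributes_count),
    }
}
```

With this layout, a user attribute named `schema_url` and the protocol-level `schema_url` coexist instead of one silently replacing the other.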

@szibis
Contributor Author

szibis commented Mar 13, 2026


@pront All fixed

@szibis szibis requested review from cswatt and pront March 13, 2026 21:03
@szibis
Contributor Author

szibis commented Mar 14, 2026

Local Integration & Edge Case Testing

Set up a local test pipeline: a Vector OTLP source on ports 4317/4318, piped through an OTLP sink to a second Vector instance on 4319/4320 for full roundtrip testing. Used grpcurl with proto files from lib/opentelemetry-proto/src/proto/opentelemetry-proto/ to send payloads. I initially tried HTTP with curl, but Vector's OTLP HTTP endpoint only accepts application/x-protobuf, not JSON, so gRPC was the only option.

That's how I caught the actual bug — passthrough output showed time_unix_nano = 0 stored as Unix epoch (1970-01-01) instead of Null. Per OTLP spec, 0 means "timestamp not set", so this was silently corrupting data. Fixed with a nanos_to_value() helper that treats 0 as unset and guards against u64→i64 overflow past year 2262.
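A minimal sketch of the helper's decision logic, assuming it can be modeled as `u64 -> Option<i64>`; the real `nanos_to_value()` returns a VRL `Value`, with `Null` for unset. The comment says only that it "guards against" overflow, so treating out-of-range values as unset here is an assumption, not necessarily what the PR does. `nanos_to_opt` is a hypothetical name.

```rust
/// Hypothetical sketch of nanosecond decoding for OTLP timestamps.
fn nanos_to_opt(nanos: u64) -> Option<i64> {
    if nanos == 0 {
        // OTLP: 0 means "timestamp not set", not 1970-01-01.
        None
    } else if nanos > i64::MAX as u64 {
        // Past about 2262-04-11 the value does not fit in signed nanoseconds;
        // reject it instead of wrapping to a negative number (assumption).
        None
    } else {
        Some(nanos as i64)
    }
}
```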

Wrote 15 unit tests in spans.rs, ran with cargo test -p opentelemetry-proto --lib spans — all green, clippy clean. Tests cover the fix plus edge cases like all SpanKind values, multiple events/links/spans, status codes, unicode, and overflow.

Performance

Ran 10k trace events through the roundtrip pipeline in batches of 100 via grpcurl — ~5k events/s, zero errors or panics. These numbers mostly reflect grpcurl client overhead — a compiled gRPC client would be significantly faster. The important takeaway is stability: no crashes, no data corruption, no memory issues under sustained load.

@szibis
Contributor Author

szibis commented Mar 14, 2026

Note on testing methodology: For the local integration and performance tests, all 3 PRs (#24905, #24621, #24897) were merged locally into a single test branch (test/otlp-all-prs-merged) and compiled into one binary. This let me test the full decode→encode→decode roundtrip across all signal types together, which is how they'll actually run in production. The unit tests in each PR are self-contained and run independently on their respective branches.

szibis added 5 commits March 15, 2026 15:40
…elds

The OTLP decode path drops several protobuf fields during conversion to
Vector events. This causes silent data loss and breaks round-trip fidelity
when events are later re-encoded to OTLP format.

Fields now decoded:

Logs:
- ScopeLogs.schema_url → scope.schema_url / %opentelemetry.scope.schema_url
- ResourceLogs.schema_url → schema_url / %opentelemetry.resources.schema_url
- Resource.dropped_attributes_count → resource_dropped_attributes_count

Traces:
- ScopeSpans.scope (name, version, attributes, dropped_attributes_count)
- ScopeSpans.schema_url → scope.schema_url
- ResourceSpans.schema_url → schema_url
- Resource.dropped_attributes_count → resource_dropped_attributes_count

Metrics:
- scope.dropped_attributes_count → tag
- ScopeMetrics.schema_url → scope.schema_url tag
- ResourceMetrics.schema_url → resource.schema_url tag
- Resource.dropped_attributes_count → resource.dropped_attributes_count tag

Closes vectordotdev#24904
Relates to vectordotdev#15500
- Add changelog fragment for vectordotdev#24905
- Document new log output fields in source CUE: scope.schema_url,
  schema_url (resource-level), resource_dropped_attributes_count
- Add comprehensive trace output field documentation to source CUE,
  including all span fields, scope fields, schema_url, and
  resource_dropped_attributes_count (previously undocumented)
…ions

Remove redundant .clone() calls in metrics tag building (format! only
borrows), eliminate Value clone for observed_timestamp by keeping it as
DateTime<Utc> (Copy), and remove unnecessary resource.clone() where self
is already consumed by value. Add inline documentation for intentional
Legacy vs Vector namespace path asymmetry on schema_url and
resource_dropped_attributes_count fields.
… resources overwrite

In Vector namespace, the resources insert (kv_list_into_value for
attributes) overwrites the entire "resources" metadata key. Moving
resource_schema_url insert after the resources insert ensures it is
not lost.

Also:
- Add Vector namespace combined test to verify schema_url survives
  alongside resource attributes
- Reformat changelog to 80-100 char lines
…nt passing

Replace the repeated 5-argument pattern (resource, scope, metric_name,
scope_schema_url, resource_schema_url) across all convert_* methods
with a single MetricContext struct. This also removes the per-function
clone boilerplate since ctx is moved into each closure directly.
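The MetricContext refactor can be sketched roughly as follows. The field set loosely mirrors the five arguments listed in the commit message (with `resource` and `scope` simplified to pre-built tags and a name), but the types, the tag names `scope.name`, `scope_schema_url`, and `resource_schema_url`, and the `convert_gauge` signature are assumptions for illustration, not the PR's code.

```rust
use std::collections::BTreeMap;

type Tags = BTreeMap<String, String>;

/// Hypothetical bundle of the values shared by every data point in one
/// ScopeMetrics block (previously five separate arguments).
struct MetricContext {
    resource_tags: Tags,
    scope_name: Option<String>,
    metric_name: String,
    scope_schema_url: String,
    resource_schema_url: String,
}

impl MetricContext {
    /// Shared tags plus one data point's attributes.
    fn tags_for(&self, dp_attributes: &[(String, String)]) -> Tags {
        let mut tags = self.resource_tags.clone();
        if let Some(name) = &self.scope_name {
            tags.insert("scope.name".to_string(), name.clone());
        }
        // Empty schema URLs are omitted rather than emitted as empty tags.
        if !self.scope_schema_url.is_empty() {
            tags.insert("scope_schema_url".to_string(), self.scope_schema_url.clone());
        }
        if !self.resource_schema_url.is_empty() {
            tags.insert("resource_schema_url".to_string(), self.resource_schema_url.clone());
        }
        for (key, value) in dp_attributes {
            tags.insert(key.clone(), value.clone());
        }
        tags
    }
}

/// One hypothetical convert_* function: `ctx` is moved into the closure once
/// instead of each argument being cloned at the top of the function.
fn convert_gauge(
    ctx: MetricContext,
    points: Vec<(Vec<(String, String)>, f64)>,
) -> Vec<(String, Tags, f64)> {
    points
        .into_iter()
        .map(move |(attrs, value)| (ctx.metric_name.clone(), ctx.tags_for(&attrs), value))
        .collect()
}
```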
szibis added 3 commits March 15, 2026 15:41
Per OTLP spec, time_unix_nano == 0 means the timestamp is unset/unknown.
Previously all 5 metric types (Sum, Gauge, Histogram, ExponentialHistogram,
Summary) converted 0 to Some(epoch), which is semantically incorrect.
Now returns None when time_unix_nano is 0, consistent with the existing
log decode behavior.
Move resource_schema_url and resource_dropped_attributes_count to
their own metadata paths instead of nesting them under the "resources"
namespace which holds user-supplied resource attributes.

For logs (Vector namespace): metadata now stored at flat paths like
%opentelemetry.resource_schema_url instead of
%opentelemetry.resources.schema_url, preventing collision when users
have resource attributes named "schema_url" or
"dropped_attributes_count".

For metrics: metadata tags now use underscore-separated names
(resource_schema_url, scope_dropped_attributes_count) instead of
dot-separated (resource.schema_url, scope.dropped_attributes_count)
to avoid colliding with user attribute tags that follow the
"resource.{key}" / "scope.{key}" format.

Also simplifies test section comment separators per review feedback.
…e tests

Per OTLP spec, time_unix_nano == 0 means "unset". Previously spans
decoded 0 as epoch (1970-01-01T00:00:00Z). This adds a nanos_to_value()
helper that returns Value::Null for 0 and safely handles u64→i64
overflow (year 2262+).

Adds 15 new tests covering:
- Zero timestamp decode (span start/end + span events)
- u64::MAX overflow protection
- All span kinds (0, 3, 4, 5)
- Multiple events and links per span
- Multiple spans per scope
- Status variants (unset, error, missing)
- Trace state preservation
- Unicode span names
- Invalid start > end timestamps
@szibis
Contributor Author

szibis commented Mar 15, 2026

#24905 (OTLP decode refactor)

#24621 (OTLP encode logs/traces)

#24897 (OTLP encode metrics)

Each PR rebases on top of the previous one after merge.

@pront After each merge I will do the cleanup and rebase to make this flow as easy as possible for the Vector team.

@szibis szibis force-pushed the fix/otlp-decode-missing-fields branch from 0c07143 to cdb8c90 Compare March 15, 2026 14:41
szibis added 2 commits March 15, 2026 16:35
Stash start_time_unix_nano from OTLP data points into metric metadata
at %vector.otlp.start_time_unix_nano during decode. This enables the
encode path to restore the original value on roundtrip instead of
hardcoding 0. Only non-zero values are stored (zero means "not set"
in OTLP).

All 5 metric types updated: Sum, Gauge, Histogram, ExpHistogram, Summary.
…idecar

Preserve original OTLP attribute types (IntValue, BoolValue, DoubleValue)
during metric decode by storing them as VRL Values in EventMetadata at
%vector.otlp.metric_sidecar.

The sidecar includes resource/scope/data-point attributes as typed VRL
objects, scope metadata fields, and a tags fingerprint for staleness
detection on the encode side.
szibis added a commit to szibis/vector that referenced this pull request Mar 15, 2026
Read the typed attribute sidecar from %vector.otlp.metric_sidecar
(stashed by PR vectordotdev#24905 during decode) and emit original OTLP types
(IntValue, BoolValue, DoubleValue) instead of StringValue.

Fingerprint-based staleness detection ensures correctness: if tags
were mutated by transforms, the sidecar is ignored and the encoder
falls back to string-based tag decomposition.
szibis added 2 commits March 15, 2026 18:20
Replace kv_list_into_value() with pb_value_to_typed_value() in
build_otlp_sidecar_data(). Each attribute value is now stored as a
single-key Object named after the OTLP variant (e.g. {"int_value": 42},
{"string_value": "x"}, {"bytes_value": <bytes>}). This preserves the
StringValue/BytesValue distinction and handles ArrayValue/KvlistValue
recursively, so the encoder can reconstruct the exact protobuf variant.
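The kind-wrapper encoding can be sketched with a simplified `AnyValue` enum and a stand-in `Value` type. Both types and `to_kind_wrapped` are hypothetical; the wrapper keys `int_value`, `string_value`, and `bytes_value` come from the commit message, while `bool_value`, `double_value`, `array_value`, and `kvlist_value` are assumed to follow the OTLP `AnyValue` field names.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical mirror of the OTLP `AnyValue` oneof.
#[derive(Debug, Clone, PartialEq)]
enum AnyValue {
    String(String),
    Bool(bool),
    Int(i64),
    Double(f64),
    Bytes(Vec<u8>),
    Array(Vec<AnyValue>),
    Kvlist(Vec<(String, AnyValue)>),
}

/// Hypothetical stand-in for a VRL value. Strings and raw bytes share one
/// variant, which is why the OTLP variant name must be recorded separately.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bytes(Vec<u8>),
    Boolean(bool),
    Integer(i64),
    Float(f64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Wrap each value in a single-key object named after its OTLP variant,
/// recursing into arrays and key-value lists.
fn to_kind_wrapped(value: &AnyValue) -> Value {
    let (kind, inner) = match value {
        AnyValue::String(s) => ("string_value", Value::Bytes(s.as_bytes().to_vec())),
        AnyValue::Bool(b) => ("bool_value", Value::Boolean(*b)),
        AnyValue::Int(i) => ("int_value", Value::Integer(*i)),
        AnyValue::Double(d) => ("double_value", Value::Float(*d)),
        AnyValue::Bytes(b) => ("bytes_value", Value::Bytes(b.clone())),
        AnyValue::Array(items) => (
            "array_value",
            Value::Array(items.iter().map(to_kind_wrapped).collect()),
        ),
        AnyValue::Kvlist(pairs) => (
            "kvlist_value",
            Value::Object(
                pairs
                    .iter()
                    .map(|(k, v)| (k.clone(), to_kind_wrapped(v)))
                    .collect(),
            ),
        ),
    };
    Value::Object(BTreeMap::from([(kind.to_string(), inner)]))
}
```

Without the wrapper, `"x"` as a StringValue and `b"x"` as a BytesValue would both become the same VRL bytes value; with it, the encoder can read the key to pick the protobuf variant.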
…code tests

Verify the kind-wrapper approach correctly stores BytesValue (distinct
from StringValue), ArrayValue (with recursively wrapped elements), and
KvlistValue (nested key-value structure) in the metric sidecar.
@pront
Member

pront commented Mar 16, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4889ee8f19


if start_time_unix_nano > 0 {
    metric.metadata_mut().value_mut().insert(
        path!("vector", "otlp", "start_time_unix_nano"),
        Value::Integer(start_time_unix_nano as i64),
    );
}


P2: Preserve full fixed64 start_time_unix_nano range

start_time_unix_nano is an OTLP fixed64, but this code stores it as Value::Integer(start_time_unix_nano as i64), which wraps values above i64::MAX into negative numbers. In those inputs (e.g., far-future but still valid fixed64 timestamps), the decoded metadata no longer contains the original start time, so downstream consumers of %vector.otlp.start_time_unix_nano will read a corrupted value instead of the source timestamp.


Contributor Author

@szibis szibis Mar 19, 2026


The as i64 cast is safe here: i64::MAX covers timestamps through year 2262. The bit pattern also round-trips correctly, since the encode side casts back to u64. This matches chrono's timestamp_nanos(i64) convention used throughout Vector.
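Both points can be checked with plain integer casts. In Rust, `as` between `u64` and `i64` reinterprets the bits, so store-then-restore is lossless for every `u64`; values above `i64::MAX` (about 2262-04-11T23:47:16Z) do, however, read as negative numbers to anything that inspects the stored integer directly, for example a VRL transform. `store`, `restore`, and `MAX_I64_NANOS` are hypothetical helpers for this sketch.

```rust
/// Largest nanosecond timestamp an i64 can hold (about 2262-04-11T23:47:16Z).
const MAX_I64_NANOS: u64 = i64::MAX as u64;

/// Hypothetical decode-side store: a same-width `as` cast keeps the bits.
fn store(nanos: u64) -> i64 {
    nanos as i64
}

/// Hypothetical encode-side restore: the inverse bit reinterpretation.
fn restore(stored: i64) -> u64 {
    stored as u64
}
```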

@pront
Copy link
Member

pront commented Mar 16, 2026

Checks are failing. I recommend adding the following to your local dev env: https://github.com/vectordotdev/vector/blob/master/CONTRIBUTING.md?plain=1#L119-L160

… expectations

- Run cargo fmt to fix formatting diffs across logs.rs, metrics.rs, spans.rs
- Add authors field to changelog fragment
- Update 9 source tests to account for new sidecar metadata and schema_url fields
- Apply ======== separator style to test section headers per review feedback
@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Mar 19, 2026

Labels

domain: external docs Anything related to Vector's external, public documentation domain: opentelemetry domain: sources Anything related to the Vector's sources

Development

Successfully merging this pull request may close these issues.

OpenTelemetry source: trace decode drops scope, schema_url fields
