[opentelemetry source] Update mapping of OpenTelemetry log data model to LogEvent #15500

@spencergilbert

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

cc @jszwedko

Per spec, the OpenTelemetry log model is as follows:

Here is the list of fields in a log record:

| Field Name | Description |
| --- | --- |
| Timestamp | Time when the event occurred. |
| ObservedTimestamp | Time when the event was observed. |
| TraceId | Request trace id. |
| SpanId | Request span id. |
| TraceFlags | W3C trace flag. |
| SeverityText | The severity text (also known as log level). |
| SeverityNumber | Numerical value of the severity. |
| Body | The body of the log record. |
| Resource | Describes the source of the log. |
| InstrumentationScope | Describes the scope that emitted the log. |
| Attributes | Additional information about the event. |

This list doesn't account for every field available when decoding OTLP-encoded log messages, which look like this (in a pseudo JSON/Rust representation):

ResourceLogs {
  resource: Option<{
    // Note we want to treat this like a `Value::Object` with `String` keys
    // and any `Value` as the value, rather than an `Array` of key values.
    attributes: [
      String, value::Value
    ],
    dropped_attributes_count: u32
  }>,
  scope_logs: [
    {
      scope: Option<{
        name: String,
        version: String,
        attributes: [
          String, value::Value
        ],
        dropped_attributes_count: u32,
      }>,
      log_records: [
        {
          time_unix_nano: u64,
          observed_time_unix_nano: u64,
          severity_number: Option<i32>,
          severity_text: Option<String>,
          body: Option<value::Value>,
          attributes: Option<[
            String, value::Value
          ]>,
          dropped_attributes_count: u32,
          flags: Option<u32>,
          // We `hex::encode` the following two fields
          trace_id: Option<Vec<u8>>,
          span_id: Option<Vec<u8>>,
        }
      ],
      schema_url: String
    }
  ],
  // This "schema_url" applies to the data in the "resource" field.
  schema_url: String
}
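The note about treating attributes as a `Value::Object` rather than an `Array` of key/value pairs could be sketched roughly as below. This is only an illustration: the `Value` enum and `KeyValue` struct are trimmed, hypothetical stand-ins rather than Vector's or the OTLP crate's real types, and "last duplicate key wins" is an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for Vector's `value::Value`; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bytes(String),
    Integer(i64),
    Object(BTreeMap<String, Value>),
}

// Hypothetical stand-in for one decoded OTLP attribute (a key/value pair).
struct KeyValue {
    key: String,
    value: Value,
}

// Collapse the list of key/value pairs into a single object keyed by `String`.
// Assumption: if a key repeats, the last occurrence overwrites earlier ones.
fn attributes_to_object(attributes: Vec<KeyValue>) -> Value {
    let mut object = BTreeMap::new();
    for kv in attributes {
        object.insert(kv.key, kv.value);
    }
    Value::Object(object)
}
```

With this shape, a downstream lookup is a plain key access on the object instead of a scan over an array of pairs.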

As of Vector v0.26.0, with the Legacy log namespace and the default log schema, we map the following fields:

| OpenTelemetry Log | Vector LogEvent |
| --- | --- |
| resource.attributes | .resources |
| scope_logs.log_records.[].attributes | .attributes |
| scope_logs.log_records.[].body | .message |
| scope_logs.log_records.[].trace_id | .trace_id |
| scope_logs.log_records.[].span_id | .span_id |
| scope_logs.log_records.[].severity_text | .severity_text |
| scope_logs.log_records.[].severity_number | .severity_number |
| scope_logs.log_records.[].flags | .flags |
| scope_logs.log_records.[].observed_time_unix_nano | .observed_timestamp |
| scope_logs.log_records.[].time_unix_nano | .timestamp |
| scope_logs.log_records.[].dropped_attributes_count | .dropped_attributes_count |
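The `.trace_id` and `.span_id` values above are the hex-encoded form of the raw ID bytes (the source uses `hex::encode` for this). As a rough, std-only illustration of what that encoding produces, assuming lowercase output with two characters per byte, the helper below (a hypothetical name, not part of Vector) does the same thing:

```rust
// Std-only stand-in for `hex::encode`: lowercase hex, two characters per byte.
fn encode_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for byte in bytes {
        // `{:02x}` pads each byte to two lowercase hex digits.
        out.push_str(&format!("{:02x}", byte));
    }
    out
}
```

An 8-byte span ID therefore becomes a 16-character string, and a 16-byte trace ID a 32-character string.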

Attempted Solutions

Users can remap the fields we do insert from the decoded protobuf to their liking, but we drop a number of fields that the OTLP payload includes.

Proposal

LogNamespace::Vector

| OpenTelemetry Log | Vector LogEvent |
| --- | --- |
| resource.attributes | %opentelemetry.resource.attributes |
| resource.dropped_attributes_count | %opentelemetry.resource.dropped_attributes_count |
| schema_url | %opentelemetry.resource.schema_url |
| scope_logs.scope.name | %opentelemetry.scope.name |
| scope_logs.scope.version | %opentelemetry.scope.version |
| scope_logs.scope.attributes | %opentelemetry.scope.attributes |
| scope_logs.scope.dropped_attributes_count | %opentelemetry.scope.dropped_attributes_count |
| scope_logs.schema_url | %opentelemetry.scope.schema_url |
| scope_logs.log_records.[].time_unix_nano | %opentelemetry.time |
| scope_logs.log_records.[].observed_time_unix_nano | %opentelemetry.observed_time |
| scope_logs.log_records.[].severity_number | %opentelemetry.severity_number |
| scope_logs.log_records.[].severity_text | %opentelemetry.severity_text |
| scope_logs.log_records.[].body | . |
| scope_logs.log_records.[].attributes | %opentelemetry.attributes |
| scope_logs.log_records.[].dropped_attributes_count | %opentelemetry.dropped_attributes_count |
| scope_logs.log_records.[].flags | %opentelemetry.flags |
| scope_logs.log_records.[].trace_id | %opentelemetry.trace_id |
| scope_logs.log_records.[].span_id | %opentelemetry.span_id |
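One consequence of this mapping is that the nested payload gets flattened: assuming each log record becomes its own event, the enclosing resource and scope fields are copied onto every record. A rough sketch of that fan-out follows. All the types are heavily trimmed, hypothetical stand-ins, and the flat string-keyed metadata map is only for illustration, not how Vector stores metadata.

```rust
use std::collections::BTreeMap;

// Hypothetical, heavily trimmed stand-ins for the decoded OTLP types.
struct LogRecord {
    body: String,
    severity_text: String,
}

struct ScopeLogs {
    scope_name: String,
    log_records: Vec<LogRecord>,
}

struct ResourceLogs {
    schema_url: String,
    scope_logs: Vec<ScopeLogs>,
}

// Hypothetical event: the body plus metadata keyed by the proposed paths.
struct Event {
    body: String,
    metadata: BTreeMap<String, String>,
}

// Emit one event per log record, copying resource and scope fields onto each.
fn flatten(resource_logs: ResourceLogs) -> Vec<Event> {
    let mut events = Vec::new();
    for scope_logs in resource_logs.scope_logs {
        for record in scope_logs.log_records {
            let mut metadata = BTreeMap::new();
            metadata.insert(
                "opentelemetry.resource.schema_url".to_string(),
                resource_logs.schema_url.clone(),
            );
            metadata.insert(
                "opentelemetry.scope.name".to_string(),
                scope_logs.scope_name.clone(),
            );
            metadata.insert(
                "opentelemetry.severity_text".to_string(),
                record.severity_text,
            );
            events.push(Event { body: record.body, metadata });
        }
    }
    events
}
```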

An outstanding question is how to handle the *time_unix_nano fields. I believe we want to convert them from Unix nanoseconds into Value::Timestamp, since otherwise operations in remap will be tedious, but there could be arguments for keeping them as-is.
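To make the conversion concrete, here is a minimal std-only sketch. It uses `std::time::SystemTime` as a stand-in for `Value::Timestamp`, and the function name is hypothetical. Treating `0` as "no timestamp" is an assumption on my part, based on the OTLP proto comments describing a zero value as unknown or missing.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Convert a `*_unix_nano` field into a point in time.
// Assumption: 0 means "unknown or missing", so it maps to `None`.
fn unix_nano_to_timestamp(nanos: u64) -> Option<SystemTime> {
    if nanos == 0 {
        return None;
    }
    // `checked_add` returns `None` instead of panicking if the platform
    // cannot represent the resulting time.
    UNIX_EPOCH.checked_add(Duration::from_nanos(nanos))
}
```

Keeping the raw `u64` instead would leave this arithmetic to every remap program that wants to compare or format the time.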

We could backport some of these fields into LogNamespace::Legacy events. The main point of conflict is the subfields of the resource object. The options I see:

  • Make a breaking change to Legacy so that .resource(s) is an object holding attributes, dropped_attributes_count, and schema_url.
  • Keep .resources as-is and add the more awkwardly named .resource_schema_url and .resource_dropped_attributes_count alongside it.
  • Take a middle ground where the attributes are either duplicated in both .resources and .resource.attributes, or a source config option chooses where they're written (to allow for non-breaking upgrades).
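The config-driven middle ground might look something like the sketch below. Nothing here exists in Vector today: the enum, its variant names, and the function are all hypothetical, and the only paths used are the `.resources` and `.resource.attributes` ones discussed above.

```rust
// Hypothetical source option choosing where resource attributes are written.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ResourceAttributesTarget {
    // Current behavior: attributes written directly to `.resources`.
    Flat,
    // Breaking layout: attributes nested under `.resource.attributes`.
    Nested,
    // Duplicate into both locations to ease migration.
    Both,
}

// Return the event paths that the resource attributes would be written to.
fn resource_attribute_paths(target: ResourceAttributesTarget) -> Vec<&'static str> {
    match target {
        ResourceAttributesTarget::Flat => vec![".resources"],
        ResourceAttributesTarget::Nested => vec![".resource.attributes"],
        ResourceAttributesTarget::Both => vec![".resources", ".resource.attributes"],
    }
}
```

Defaulting such an option to the flat layout would keep upgrades non-breaking, at the cost of carrying the extra setting.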

References

No response

Version

vector 0.26.0 (x86_64-apple-darwin c6b5bc2 2022-12-05)
