@drewelliott

Summary

Adds native OpenTelemetry Protocol (OTLP) output support to gNMIc, enabling direct export of telemetry data to OTLP-compatible backends.

  • New OTLP output plugin with direct OTLP/gRPC export
  • Full metric conversion (gauges, counters, histograms) from gNMI to OTLP
  • Support for custom resource attributes and semantic conventions
  • Comprehensive test coverage included

🤖 Generated with Claude Code

@google-cla

google-cla bot commented Dec 4, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Implements native OTLP/gRPC output, enabling direct gNMIc → OTel Collector
integration without intermediate components (NATS/Prometheus/Kafka).

Features:
- gRPC transport with TLS support
- Automatic metric type detection (Counter/Gauge based on path heuristics)
- Proper gNMI path → OTLP metric name conversion (slash/hyphen to underscore)
- Configurable subscription name prepending for vendor-specific prefixes
- Event.Values iteration supporting any value key structure
- Comprehensive validation of OTLP message structure
- Response PartialSuccess checking for rejection detection
- Configurable batching with worker pools
- Retry logic with exponential backoff
- Prometheus metrics for observability
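
The automatic metric type detection listed above could look roughly like the following. This is a Python sketch for illustration only, not the plugin's Go code; the function name `guess_metric_type` and the keyword list `COUNTER_HINTS` are assumptions, and the actual heuristics in the PR may differ.

```python
import re

# Hypothetical keyword list; the plugin's real heuristics may differ.
COUNTER_HINTS = ("counters", "octets", "packets", "pkts", "errors", "discards")


def guess_metric_type(path: str) -> str:
    """Return "counter" if a path element looks monotonic, else "gauge"."""
    # Drop gNMI keys such as [name=ethernet-1/1] so their contents
    # (which may contain slashes) do not affect the element split.
    keyless = re.sub(r"\[[^\]]*\]", "", path.lower())
    elements = [e for e in keyless.split("/") if e]
    for element in elements:
        if any(hint in element for hint in COUNTER_HINTS):
            return "counter"
    return "gauge"
```

A path-based heuristic like this is cheap but can misclassify; anything it does not recognize falls back to a gauge, which is the safer default for a non-monotonic value.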

Configuration options:
- endpoint: OTLP collector endpoint (required)
- protocol: "grpc" or "http" (default: grpc)
- timeout: Request timeout (default: 10s)
- batch-size: Metrics per batch (default: 1000)
- interval: Max time before sending batch (default: 5s)
- num-workers: Worker pool size (default: 1)
- max-retries: Retry attempts (default: 3)
- append-subscription-name: Prepend subscription name to metrics (default: false)
- strings-as-attributes: Convert string values to gauge with attribute (default: false)
- metric-prefix: Global prefix for all metrics (optional)
- resource-attributes: Static resource attributes (optional)
- tls: TLS configuration (optional)
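
As an illustration of how `max-retries` might interact with exponential backoff, here is a minimal Python sketch (not the plugin's Go implementation). The base delay, growth factor, and cap are hypothetical values that are not part of the configuration above, and it assumes `max-retries` counts retries after the initial attempt.

```python
import time


def backoff_delays(max_retries=3, base=0.5, factor=2.0, cap=10.0):
    """Delay before each retry: base, base*factor, ... capped at `cap`.

    base, factor and cap are illustrative assumptions.
    """
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def send_with_retry(send, batch, max_retries=3, sleep=time.sleep):
    """Call send(batch); on failure retry up to max_retries times."""
    delays = backoff_delays(max_retries)
    last_error = None
    for attempt in range(max_retries + 1):  # initial attempt + retries
        try:
            return send(batch)
        except Exception as err:  # a real exporter would filter retryable errors
            last_error = err
            if attempt < max_retries:
                sleep(delays[attempt])
    raise last_error
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting.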

Example configuration:
outputs:
  otlp:
    type: otlp
    endpoint: otel-collector:4317
    protocol: grpc
    batch-size: 1000
    num-workers: 2
    append-subscription-name: true
    strings-as-attributes: true
    resource-attributes:
      telemetry.source: "gnmi"

Metric naming:
- [prefix_][subscription_]path_with_underscores
- Example: nvos_interfaces_interface_state_counters_in_octets
- Uses only a-z, 0-9, _ and ., all of which are valid in OpenTelemetry instrument names
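
The naming scheme above can be sketched in Python as follows. This is an illustration, not the plugin's Go code: the function name `metric_name` is hypothetical, treating `nvos` in the example as a subscription name is an assumption, and dropping gNMI keys from the path is an extra step added here for safety.

```python
import re


def metric_name(path: str, prefix: str = "", subscription: str = "") -> str:
    """Build [prefix_][subscription_]path_with_underscores."""
    # Assumption: keys like [name=eth0] are not part of the metric name.
    path = re.sub(r"\[[^\]]*\]", "", path)
    parts = [p for p in (prefix, subscription, path) if p]
    name = "_".join(parts).lower()
    name = re.sub(r"[/\-]+", "_", name)       # slash/hyphen to underscore
    name = re.sub(r"[^a-z0-9_.]", "_", name)  # keep only a-z, 0-9, _, .
    return re.sub(r"_+", "_", name).strip("_")
```

For example, `metric_name("/interfaces/interface/state/counters/in-octets", subscription="nvos")` yields `nvos_interfaces_interface_state_counters_in_octets`, matching the example above.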

This commit syncs all OTLP output improvements from the pylon-platform
fork to the upstream GitHub repository:

1. Add add-event-tags-as-attributes config option
   - Exports event tags as OTLP attributes so they appear as Prometheus labels downstream
   - Critical for Panoptes integration with standard label names

2. Fix critical event.Values handling
   - Accept ANY value key format from gNMI devices
   - Improves compatibility with diverse network equipment

3. Comprehensive OTLP validation and error handling
   - PartialSuccess response handling
   - Data point validation before export
   - Debug logging for troubleshooting

4. Metric naming improvements
   - Convert path slashes to underscores
   - Prepend subscription name when configured
   - Ensures valid Prometheus metric names
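
To make the PartialSuccess handling in item 3 concrete: in OTLP, an export response may carry a partial-success message with a `rejected_data_points` count and an `error_message`, and a non-empty message with zero rejections is treated as a warning. The Python sketch below uses a plain dataclass as a stand-in for the protobuf message; `classify_response` and its return values are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PartialSuccess:
    """Stand-in for the OTLP partial-success message fields."""
    rejected_data_points: int = 0
    error_message: str = ""


def classify_response(partial: Optional[PartialSuccess]) -> Tuple[str, str]:
    """Map an export response to ("ok" | "warning" | "rejected", detail)."""
    if partial is None:
        return ("ok", "")  # field unset: everything was accepted
    if partial.rejected_data_points > 0:
        detail = f"{partial.rejected_data_points} data points rejected"
        if partial.error_message:
            detail += f": {partial.error_message}"
        return ("rejected", detail)
    if partial.error_message:
        return ("warning", partial.error_message)  # accepted, with a note
    return ("ok", "")
```

Separating the three outcomes lets the output log warnings without counting them as failed exports.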

All changes maintain backward compatibility and are production-tested
on the NVIDIA Pylon platform with 15-pod gNMIc clusters.

@drewelliott force-pushed the drew/sync-otlp-improvements branch from 45e1e97 to ea4f8d7 on December 4, 2025 19:52