
feat(datadog_metrics sink): switch series v2 endpoint to zstd compression#24956

Draft
vladimir-dd wants to merge 3 commits into master from vladimir-dd/metrics-v2-zstd

Conversation


@vladimir-dd vladimir-dd commented Mar 18, 2026

Summary

Rationale: Switch the Series v2 endpoint (/api/v2/series) from zlib to zstd compression.

  • Add DatadogMetricsCompression enum (Zlib/Zstd) in config.rs with compressor(), content_encoding(), and max_compressed_size() methods
  • Add compression() method on DatadogMetricsEndpoint: Series v2 → Zstd, Series v1 and Sketches → Zlib
  • Add max_compressed_size(n) for each scheme: Zlib uses the DEFLATE stored-block worst-case formula; Zstd mirrors the ZSTD_compressBound C macro
  • Propagate content_encoding through DatadogMetricsRequest and the request builder instead of hardcoding "deflate"
  • Make DatadogMetricsEncoder::new() infallible — production limits from payload_limits() are always valid; remove CreateError and validate_payload_size_limits
  • Move with_payload_limits() to #[cfg(test)]; fix reset_state() to create the correct compressor for the endpoint on each reset
  • Fix max_uncompressed_header_len to accept an endpoint argument — only Series v1 has a JSON envelope; all other endpoints return 0
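The shape of the new enum can be sketched as follows. This is a hypothetical reconstruction from the bullet points above, not Vector's exact code: the 16 KiB stored-block size, the 5-byte block header, and the 6-byte zlib header/trailer constants are assumptions, while the Zstd arm mirrors the public `ZSTD_COMPRESSBOUND` macro from zstd.h.

```rust
/// Sketch of the compression scheme described in the PR summary.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DatadogMetricsCompression {
    Zlib,
    Zstd,
}

impl DatadogMetricsCompression {
    /// Content-Encoding header value matching the compressor.
    const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zlib => "deflate",
            Self::Zstd => "zstd",
        }
    }

    /// Worst-case compressed size for `n` input bytes.
    fn max_compressed_size(self, n: usize) -> usize {
        match self {
            // DEFLATE stored-block worst case (assumed constants): a 5-byte
            // block header per 16 KiB chunk, plus a 2-byte zlib header and a
            // 4-byte Adler-32 trailer.
            Self::Zlib => {
                let blocks = (n + 16 * 1024 - 1) / (16 * 1024);
                n + blocks.max(1) * 5 + 6
            }
            // Mirrors the ZSTD_COMPRESSBOUND macro from zstd.h:
            // srcSize + (srcSize >> 8) + extra margin below 128 KiB.
            Self::Zstd => {
                let margin = if n < 128 * 1024 { (128 * 1024 - n) >> 11 } else { 0 };
                n + (n >> 8) + margin
            }
        }
    }
}

fn main() {
    for scheme in [DatadogMetricsCompression::Zlib, DatadogMetricsCompression::Zstd] {
        println!(
            "{:?}: encoding={}, bound(64 KiB)={}",
            scheme,
            scheme.content_encoding(),
            scheme.max_compressed_size(64 * 1024)
        );
    }
}
```

Making the scheme a property of the endpoint (rather than a hardcoded header string) is what guarantees Content-Encoding and compressor can never drift apart.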

Tests added:

  • max_compressed_size_is_upper_bound: empirically validates both Zlib and Zstd formulas are true upper bounds using incompressible (Xorshift64) data, and are not overly conservative (slack ≤ 1% + 64 bytes)
  • encode_series_v2_breaks_out_when_limit_reached_compressed: verifies the hot-path compressed-limit check works correctly for the zstd path
  • encoding_check_for_payload_limit_edge_cases_v2: proptest that any Series v2 payload decompresses cleanly with zstd and stays within configured limits
  • Renamed encoding_check_for_payload_limit_edge_cases to encoding_check_for_payload_limit_edge_cases_v1 to make the scope explicit
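The incompressible-input generator named in the first test could look like the sketch below. The Xorshift64 name comes from the test above; the seed, buffer size, and `fill` helper are illustrative assumptions, and the actual compression step of the test is omitted.

```rust
/// Minimal Xorshift64 PRNG; pseudorandom bytes are effectively
/// incompressible, which is what makes them useful for validating that
/// max_compressed_size is a true worst-case upper bound.
struct Xorshift64(u64);

impl Xorshift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Fill a buffer with pseudorandom bytes, 8 at a time.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

fn main() {
    // Seed must be nonzero or the generator stays at zero forever.
    let mut rng = Xorshift64(0x9E37_79B9_7F4A_7C15);
    let mut buf = vec![0u8; 64];
    rng.fill(&mut buf);
    assert!(buf.iter().any(|&b| b != 0));
    println!("generated {} incompressible bytes", buf.len());
}
```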

Correctness verification:

  • Zlib path (V1, Sketches): unchanged — compression() returns Zlib, wired to zlib_default() compressor and "deflate" header, same as before
  • Limit check equivalence: old compressed_written + n + block_overhead(n) > limit equals new compressed_len + max_compressed_size(n) > limit since max_compressed_size(n) = n + block_overhead(n)
  • validate_payload_size_limits removal is safe: finish() remains the final safeguard; production limits are always valid
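The limit-check equivalence in the second bullet is purely algebraic and can be checked mechanically. The `block_overhead` body below is an illustrative stand-in, not Vector's exact formula; the point is only that when max_compressed_size(n) = n + block_overhead(n), the old and new predicates agree for every n.

```rust
// Illustrative stand-in for the per-block overhead function; any function
// would do, since the equivalence depends only on the identity
// max_compressed_size(n) = n + block_overhead(n).
fn block_overhead(n: usize) -> usize {
    (n / (16 * 1024) + 1) * 5
}

fn max_compressed_size(n: usize) -> usize {
    n + block_overhead(n)
}

fn main() {
    let (compressed_len, limit) = (1_000usize, 5_000usize);
    for n in [0, 1, 100, 4_096, 65_536] {
        // Old hot-path check vs. new check: identical by construction.
        let old = compressed_len + n + block_overhead(n) > limit;
        let new = compressed_len + max_compressed_size(n) > limit;
        assert_eq!(old, new);
    }
    println!("old and new limit checks agree");
}
```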

Vector configuration

sinks:
  datadog_metrics:
    type: datadog_metrics
    inputs: [...]
    default_api_key: "${DD_API_KEY}"
    series_api_version: v2  # now correctly uses zstd

How did you test this PR?

  • Unit tests: all 51 datadog metrics tests pass (cargo test --no-default-features --features sinks-datadog_metrics)
  • max_compressed_size_is_upper_bound empirically confirms both Zlib and Zstd bound formulas at stored-block boundaries (16 KB for zlib, 128 KB for zstd) and at various sizes up to 500 KB
  • Proptest encoding_check_for_payload_limit_edge_cases_v2 fuzzes compressed and uncompressed limits for the V2/zstd path
  • make check-clippy passes with no warnings

Change Type

  • New feature

Is this a breaking change?

  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 18, 2026
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch 13 times, most recently from 5924ee2 to c4c80b6 on March 18, 2026 19:22
- Add `DatadogMetricsCompression` enum (Zlib/Zstd) to `config.rs` with a
  `compression()` method on `DatadogMetricsEndpoint`; Series v2 maps to Zstd,
  v1 and Sketches map to Zlib
- Series v2 (/api/v2/series) now uses zstd; v1 and Sketches continue using zlib
- Remove hardcoded "deflate" Content-Encoding; propagate `content_encoding`
  through `DatadogMetricsRequest` and `DatadogMetricsRequestBuilder`
- Make `DatadogMetricsEncoder::new` infallible; move `with_payload_limits`
  to a test-only impl block; remove `CreateError` and `validate_payload_size_limits`
- Fix `max_uncompressed_header_len` to take an endpoint parameter (only Series v1
  has a JSON envelope; v2 and Sketches use protobuf with no envelope)
- Fix `try_compress_buffer` worst-case estimate to use `max_compressed_size`,
  which dispatches between the deflate stored-block formula and the
  ZSTD_compressBound formula
- Add proptests for both V1 (zlib) and V2 (zstd) encoding paths, with ranges
  proportional to each endpoint's real API limits

Rationale: Series v2 uses protobuf + zstd compression while v1 and Sketches use
zlib. The previous code hardcoded "deflate" for all endpoints. The new
DatadogMetricsCompression enum makes the compression scheme a first-class property
derived from the endpoint, ensuring Content-Encoding always matches the compressor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vladimir-dd vladimir-dd force-pushed the vladimir-dd/metrics-v2-zstd branch from c4c80b6 to fa052b6 on March 18, 2026 19:28
@vladimir-dd vladimir-dd changed the title from "feat(datadog_metrics sink): add zstd compression for series v2 endpoint" to "feat(datadog_metrics sink): switch series v2 endpoint to zstd compression" on Mar 18, 2026
- Add changelog fragment for zstd compression on Series v2 endpoint
- Remove misplaced doc comment that had landed above generate_series_metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vladimir-dd
@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76fb1c59bd


impl DatadogMetricsCompression {
    pub(super) const fn content_encoding(self) -> &'static str {
        match self {
            Self::Zlib => "deflate",
            Self::Zstd => "zstd",
        }
    }
}


P1: Send Datadog v2 metrics with the zstd1 encoding token

For Series v2, content_encoding() now emits "zstd", but Datadog’s generated Metrics v2 client docs for submit_metrics list allowed content_encoding values as deflate, zstd1, and gzip (see MetricsAPI#submit_metrics allowable values at https://datadoghq.dev/datadog-api-client-ruby/DatadogAPIClient/V2/MetricsAPI.html). If the intake strictly validates this enum, all v2 requests from this change will carry an invalid Content-Encoding header and be rejected with 4xx responses, causing metric loss.

