@nina9753 nina9753 commented Oct 29, 2025

What does this PR do?

This PR adds batch message handling for GCP Pub/Sub. It addresses a specific problem: the Pub/Sub client library automatically batches messages from multiple independent requests, which causes async context loss. This batching is transparent to the caller and happens both on the publisher side and inside the topic.

When multiple HTTP requests each call topic.publishMessage(), each call returns to the user immediately. Behind the scenes, the Pub/Sub client queues these messages and, some time later, makes ONE batched gRPC call with ALL of the messages together. By the time that batched gRPC call happens, the original HTTP requests have completed and their async contexts are gone. We are left with either no active span or the wrong span active, which leads to mis-parented spans. If we only instrumented at the gRPC level, we would have no way to determine which trace each message came from.
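To make the failure mode concrete, here is a minimal sketch that reproduces it without Pub/Sub. Everything in it is a hypothetical stand-in: `requestContext` plays the role of the tracer's active-span storage, `publishMessage` only enqueues, and `flushBatch` plays the role of the deferred batched gRPC call. It is not the client library's code.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

// Hypothetical stand-ins, not the real tracer or Pub/Sub client.
const requestContext = new AsyncLocalStorage()
const queue = []

// Stand-in for topic.publishMessage(): it only enqueues and returns.
function publishMessage (data) {
  queue.push({ data, attributes: {} })
}

// Stand-in for the deferred batched gRPC call. It runs outside every request,
// so there is no request context left to read.
function flushBatch () {
  const active = requestContext.getStore()
  const batch = queue.splice(0)
  return {
    size: batch.length,
    activeTraceAtFlush: active ? active.traceId : undefined
  }
}

// Two independent "HTTP requests", each with its own trace.
requestContext.run({ traceId: 'trace-a' }, () => publishMessage('one'))
requestContext.run({ traceId: 'trace-b' }, () => publishMessage('two'))

// The flush happens after both requests have finished.
const flushResult = flushBatch()
console.log(flushResult) // { size: 2, activeTraceAtFlush: undefined }
```

Both messages end up in one batch, and neither `trace-a` nor `trace-b` is visible at flush time.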

To handle this, we introduced dual wrapping:

High-Level Wrapping (Topic.publishMessage / Topic.publish):

  • Wraps the user-facing Topic API that developers call
  • Captures trace context immediately when the user calls the API (while their request span is still active)
  • Injects this context into message attributes before batching occurs
  • Mirrors how we handle message.ack(): capture the context early, before batching
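A minimal sketch of the high-level wrap, under stated assumptions: the `Topic` class here is a toy that only queues messages, `activeSpan` stands in for the tracer's scope, and the attribute names are illustrative rather than the exact keys this PR writes.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

// Hypothetical stand-in for the tracer's active-span storage.
const activeSpan = new AsyncLocalStorage()

// Hypothetical minimal Topic: it only queues messages for a later batch.
class Topic {
  constructor () { this.queue = [] }
  publishMessage (message) { this.queue.push(message) }
}

// Wrap the user-facing method so the context is captured at call time,
// while the caller's request span is still active.
function wrapPublishMessage (original) {
  return function publishMessageWithContext (message) {
    const span = activeSpan.getStore()
    if (span) {
      // Copy instead of mutating the caller's object.
      // The attribute names are assumptions for illustration.
      message = {
        ...message,
        attributes: {
          ...message.attributes,
          'x-datadog-trace-id': span.traceId,
          'x-datadog-parent-id': span.spanId
        }
      }
    }
    return original.call(this, message)
  }
}

Topic.prototype.publishMessage = wrapPublishMessage(Topic.prototype.publishMessage)

const topic = new Topic()
activeSpan.run({ traceId: '111', spanId: '222' }, () => {
  topic.publishMessage({ data: Buffer.from('hello') })
})
console.log(topic.queue[0].attributes)
```

Because the context now travels inside the message itself, it survives however long the message waits in the batch queue.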

Low-Level Wrapping (PublisherClient.publish):

  • Wraps the internal gRPC client call that actually sends to Pub/Sub
  • Creates the pubsub.request span representing the network operation
  • Reads trace contexts from message attributes (stored by high-level wrap)
  • Implements span linking when multiple traces are batched together
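The low-level side can be sketched as follows. This is an illustration only: `extractContext` and `planBatchSpan` are hypothetical helpers, the attribute names are assumptions, and the returned object is a plain description of the span rather than a real tracer span.

```javascript
// Hypothetical helper: read one message's trace context from its attributes.
// The attribute names are assumptions for illustration.
function extractContext (attributes = {}) {
  const traceId = attributes['x-datadog-trace-id']
  const spanId = attributes['x-datadog-parent-id']
  return traceId && spanId ? { traceId, spanId } : null
}

// Decide the parent and the span links for one batched publish call:
// the first message supplies the parent, messages 2-N supply the links.
function planBatchSpan (messages) {
  const [first, ...rest] = messages
  const parent = first ? extractContext(first.attributes) : null
  const links = rest
    .map(message => extractContext(message.attributes))
    .filter(Boolean) // skip messages that were published with no active span
  return { name: 'pubsub.request', parent, links }
}

const batchPlan = planBatchSpan([
  { data: 'one', attributes: { 'x-datadog-trace-id': '111', 'x-datadog-parent-id': '11' } },
  { data: 'two', attributes: { 'x-datadog-trace-id': '222', 'x-datadog-parent-id': '22' } },
  { data: 'three', attributes: {} }
])
console.log(batchPlan)
```

Here the batch span is parented to trace `111` and linked to trace `222`; the third message carries no context and contributes no link.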

Producer Plugin Enhancements

The producer plugin now handles batched messages as follows:

  • Extracts the parent context from the first message's attributes
  • Creates span links to all other messages in the batch (messages 2-N)
  • Injects batch metadata into all messages: batch size, position, and batch span IDs
  • Adds proper span link metadata for APM UI visualization
  • Supports 128-bit trace IDs for proper linking
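A rough sketch of the metadata injection and the 128-bit trace ID handling. The concrete key names under `_dd.batch.*` and `_dd.pubsub_request.*` are hypothetical (only the prefixes appear in this PR's commit notes), and both helper functions are invented for illustration. The trace ID helper assumes the high 64 bits arrive as 16 hex characters, as carried in `_dd.p.tid`, and the low 64 bits arrive as a decimal string.

```javascript
// Combine the high 64 bits (hex) with the low 64 bits (decimal string)
// into one 32-character hex trace ID suitable for a span link.
function toFullTraceId (lowDecimal, highHex = '0000000000000000') {
  return highHex + BigInt(lowDecimal).toString(16).padStart(16, '0')
}

// Stamp every message with its place in the batch and the span that sent it.
// Key names are hypothetical; Pub/Sub attribute values must be strings.
function injectBatchMetadata (messages, batchSpan) {
  messages.forEach((message, index) => {
    message.attributes = {
      ...message.attributes,
      '_dd.batch.size': String(messages.length),
      '_dd.batch.index': String(index),
      '_dd.pubsub_request.trace_id': batchSpan.traceId,
      '_dd.pubsub_request.span_id': batchSpan.spanId
    }
  })
  return messages
}

const fullTraceId = toFullTraceId('1234567890', '640cfd8d00000000')
const stamped = injectBatchMetadata(
  [{ data: 'one' }, { data: 'two', attributes: { custom: 'kept' } }],
  { traceId: fullTraceId, spanId: '42' }
)
console.log(fullTraceId) // 640cfd8d0000000000000000499602d2
console.log(stamped[1].attributes)
```

Existing user attributes are preserved, and every value is written as a string.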

The next PR in this stack is #6415

Motivation

Plugin Checklist

Additional Notes

Screenshot 1: an inferred span for the push subscription HTTP POST request sent from a Pub/Sub topic to the Cloud Run service.
Screenshot 2: an example of a full push distributed trace in which one Cloud Run service triggers another through a push subscription.

github-actions bot commented Oct 29, 2025

Overall package size

Self size: 13.43 MB
Deduped: 113.62 MB
No deduping: 128.64 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.3.0 | 20.73 MB | 20.74 MB |
| @datadog/pprof | 5.12.0 | 11.19 MB | 11.57 MB |
| @datadog/native-iast-taint-tracking | 4.1.0 | 9.01 MB | 9.02 MB |
| @opentelemetry/resources | 1.30.1 | 557.67 kB | 7.71 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.4 | 2.95 MB | 5.83 MB |
| @datadog/wasm-js-rewriter | 5.0.1 | 2.82 MB | 3.53 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api-logs | 0.208.0 | 199.48 kB | 1.42 MB |
| @opentelemetry/api | 1.9.0 | 1.22 MB | 1.22 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.15.0 | 127.66 kB | 856.24 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| @datadog/openfeature-node-server | 0.2.0 | 118.51 kB | 437.19 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| pprof-format | 2.2.1 | 163.06 kB | 163.06 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| @isaacs/ttlcache | 2.1.2 | 90.79 kB | 90.79 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |
| escape-string-regexp | 5.0.0 | 3.66 kB | 3.66 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

pr-commenter bot commented Oct 29, 2025

Benchmarks

Benchmark execution time: 2025-12-01 20:46:02

Comparing candidate commit f7af73c in PR branch nina.rei/SVLS-7168/gcp-pubsub-batch-plugin with baseline commit 80b7ddf in branch nina.rei/SVLS-7168/gcp-push-pubsub-plugin.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 289 metrics, 31 unstable metrics.

codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 26.78571% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.22%. Comparing base (80b7ddf) to head (f7af73c).

| Files with missing lines | Patch % | Lines |
|------|---------|-----------|
| ...atadog-instrumentations/src/google-cloud-pubsub.js | 6.12% | 46 Missing ⚠️ |
| ...datadog-plugin-google-cloud-pubsub/src/producer.js | 55.00% | 18 Missing ⚠️ |
| ...oogle-cloud-pubsub/src/pubsub-push-subscription.js | 0.00% | 18 Missing ⚠️ |

Additional details and impacted files
@@                              Coverage Diff                              @@
##           nina.rei/SVLS-7168/gcp-push-pubsub-plugin    #6782      +/-   ##
=============================================================================
- Coverage                                      84.44%   84.22%   -0.23%     
=============================================================================
  Files                                            507      513       +6     
  Lines                                          21768    21895     +127     
=============================================================================
+ Hits                                           18381    18440      +59     
- Misses                                          3387     3455      +68     

@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-pubsub-batch-plugin branch from d7f0d5e to 3fdb180 Compare October 29, 2025 20:48
datadog-datadog-prod-us1 bot commented Oct 29, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

cypress@14.5.4 commonJS instruments tests with the APM protocol (old agents) from integration-tests/cypress/cypress.spec.js (Datadog)
expected 'fail' to equal 'pass'

🧪 3 Tests failed

express-mongo-sanitize with express-mongo-sanitize >=1.0.0 (1.0.0) middleware without subscriptions it continues working with sanitization request from without subscriptions (Datadog)
Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
- undefined

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
...
express-mongo-sanitize with express-mongo-sanitize >=1.0.0 (1.0.0) middleware with subscriptions it continues working with sanitization request from with subscriptions (Datadog)
Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
- undefined

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
...
express-mongo-sanitize with express-mongo-sanitize >=1.0.0 (1.0.0) middleware with subscriptions subscription is called with expected parameters with sanitization request from with subscriptions (Datadog)
Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
- undefined

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected

+ 'paramvalue'
...
This comment will be updated automatically if new data arrives.
🔗 Commit SHA: f7af73c

@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-pubsub-batch-plugin branch 2 times, most recently from d76ed3b to f638388 Compare November 4, 2025 21:44
@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-pubsub-batch-plugin branch 2 times, most recently from 8e3014c to 10eb97c Compare November 18, 2025 17:42
@nina9753 nina9753 marked this pull request as ready for review November 19, 2025 18:22
@nina9753 nina9753 requested review from a team as code owners November 19, 2025 18:22
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@nina9753 nina9753 changed the title [SVLS-7168] Create GCP PubSub Batch Publish Plugin [SVLS-7168]Support GCP PubSub Batch Publish Nov 19, 2025
@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-push-pubsub-plugin branch from 7862105 to 505588b Compare December 1, 2025 15:19
@nina9753 nina9753 requested review from a team as code owners December 1, 2025 15:19
@nina9753 nina9753 requested review from BridgeAR and removed request for a team December 1, 2025 15:19
@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-pubsub-batch-plugin branch from a67bc15 to ed3cc6d Compare December 1, 2025 15:27
- Collect span links from messages 2-N (first becomes parent)
- Extract parent context from first message trace context
- Create pubsub.request span with span links metadata
- Inject batch metadata into all messages (_dd.pubsub_request.*, _dd.batch.*)
- Add 128-bit trace ID support (_dd.p.tid)
- Add operation tag for batched vs single requests
- Add ack context map to preserve trace context across batched acknowledges
- Update producer to use batchSpan._startTime for accurate publish time
- Add explicit parent span support in client plugin
- Wrap Message.ack() to store context before batched gRPC acknowledge
- Update Subscription.emit to properly handle storage context
- Sync auto-load improvements from Branch 1
Ensures pubsub.delivery spans are properly tagged with the correct
integration name for test validation and proper categorization.
@nina9753 nina9753 force-pushed the nina.rei/SVLS-7168/gcp-pubsub-batch-plugin branch from ccc9ae5 to f7af73c Compare December 1, 2025 20:37
addHook({ name: '@google-cloud/pubsub', versions: ['>=1.2'], file: 'build/src/subscriber.js' }, (obj) => {
  const Message = obj.Message

  if (Message && Message.prototype && Message.prototype.ack) {
Suggested change
- if (Message && Message.prototype && Message.prototype.ack) {
+ if (Message?.prototype?.ack) {

return ctx.parentStore
}

_extractSpanLink (attrs) {