Conversation

@SomilJain0112 (Contributor) commented Oct 11, 2025

Which problem is this PR solving?

#6467

Description of the changes

HTTP Response Format Change (Breaking Change):
The /api/v3/traces HTTP endpoints now return a JSON array to support true streaming while maintaining valid JSON format:

Before:

{
  "result": {
    "resourceSpans": [...]
  }
}

After:

[
  {
    "result": {
      "resourceSpans": [...]
    }
  }
]

Implementation Details (sketched below):

  • Falls back to buffered response if streaming not available
  • Writes JSON array incrementally: [ → trace → , → trace → ]
  • Flushes after each trace for immediate network delivery
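
For illustration, here is a minimal sketch of that loop (not the actual PR code; streamTraces, the iterator shape, and the use of encoding/json instead of jsonpb are assumptions):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// streamTraces writes a JSON array incrementally ("[" → trace → "," → trace → "]"),
// flushing after each element so the client receives data immediately.
func streamTraces(w http.ResponseWriter, traces func(yield func(any) bool)) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		// Streaming not available: fall back to one buffered response.
		var all []any
		traces(func(t any) bool { all = append(all, t); return true })
		json.NewEncoder(w).Encode(all)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, "[")
	first := true
	traces(func(t any) bool {
		if !first {
			fmt.Fprint(w, ",")
		}
		first = false
		json.NewEncoder(w).Encode(t) // one trace per write
		flusher.Flush()              // push this chunk onto the wire now
		return true
	})
	fmt.Fprint(w, "]")
}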

How was this change tested?

Tested locally and added test cases as well.


SomilJain0112 requested a review from a team as a code owner October 11, 2025 20:09
dosubot bot added the enhancement label Oct 11, 2025
Comment on lines +179 to +181
if len(traces) == 0 {
return true
}
Contributor

The current handling of empty trace arrays may lead to ambiguity in the API response. When all iterations return empty arrays, tracesFound remains false and a 404 error is returned. This approach doesn't distinguish between "no traces exist for this query" and "traces exist but contain no data." Consider adding a flag that indicates whether the query matched any traces at all, separate from whether those traces contain spans. This would provide more accurate feedback to API consumers about whether their query parameters matched anything in the system.

Spotted by Graphite Agent
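
A minimal sketch of the suggested disambiguation (hypothetical names, not the PR code):

package main

import (
	"fmt"
	"net/http"
)

// writeStatus distinguishes "nothing matched the query" from
// "traces matched but contained no span data".
func writeStatus(w http.ResponseWriter, queryMatched, spansFound bool) {
	switch {
	case !queryMatched:
		// The query parameters matched nothing in the system.
		http.Error(w, "no traces matched the query", http.StatusNotFound)
	case !spansFound:
		// Traces matched but carried no spans: an empty result, not an error.
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, "[]")
	default:
		w.WriteHeader(http.StatusOK) // normal case: stream the traces (elided)
	}
}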


Member

This is a reasonable callout. Can we add a test for this use case?

@codecov bot commented Oct 11, 2025

Codecov Report

❌ Patch coverage is 76.81159% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.50%. Comparing base (528476f) to head (bf9c4a6).

Files with missing lines | Patch % | Lines
cmd/query/app/apiv3/http_gateway.go | 76.81% | 12 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7555      +/-   ##
==========================================
- Coverage   96.59%   96.50%   -0.10%     
==========================================
  Files         384      384              
  Lines       19404    19469      +65     
==========================================
+ Hits        18744    18789      +45     
- Misses        477      492      +15     
- Partials      183      188       +5     
Flag Coverage Δ
badger_v1 8.83% <ø> (ø)
badger_v2 1.74% <ø> (ø)
cassandra-4.x-v1-manual 12.56% <ø> (ø)
cassandra-4.x-v2-auto 1.73% <ø> (ø)
cassandra-4.x-v2-manual 1.73% <ø> (ø)
cassandra-5.x-v1-manual 12.56% <ø> (ø)
cassandra-5.x-v2-auto 1.73% <ø> (ø)
cassandra-5.x-v2-manual 1.73% <ø> (ø)
clickhouse 1.66% <ø> (ø)
elasticsearch-6.x-v1 16.76% <ø> (ø)
elasticsearch-7.x-v1 16.79% <ø> (ø)
elasticsearch-8.x-v1 16.94% <ø> (ø)
elasticsearch-8.x-v2 1.74% <ø> (ø)
elasticsearch-9.x-v2 1.74% <ø> (ø)
grpc_v1 10.76% <ø> (ø)
grpc_v2 1.74% <ø> (ø)
kafka-3.x-v1 10.26% <ø> (ø)
kafka-3.x-v2 1.74% <ø> (ø)
memory_v2 1.74% <ø> (ø)
opensearch-1.x-v1 16.84% <ø> (ø)
opensearch-2.x-v1 16.84% <ø> (ø)
opensearch-2.x-v2 1.74% <ø> (ø)
opensearch-3.x-v2 1.74% <ø> (ø)
query 1.74% <ø> (ø)
tailsampling-processor 0.50% <ø> (ø)
unittests 95.41% <76.81%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.


@github-actions bot commented Oct 11, 2025

Metrics Comparison Summary

Total changes across all snapshots: 53

Detailed changes per snapshot

summary_metrics_snapshot_cassandra

📊 Metrics Diff Summary

Total Changes: 53

  • 🆕 Added: 53 metrics
  • ❌ Removed: 0 metrics
  • 🔄 Modified: 0 metrics

🆕 Added Metrics

  • http_server_request_body_size_bytes (18 variants)
View diff sample
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_request_duration_seconds (17 variants)
View diff sample
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.005",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.01",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.025",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.05",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.075",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_request_duration_seconds{http_request_method="GET",http_response_status_code="503",le="0.1",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...
  • http_server_response_body_size_bytes (18 variants)
View diff sample
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="+Inf",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="0",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="100",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="1000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="10000",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
+http_server_response_body_size_bytes{http_request_method="GET",http_response_status_code="503",le="25",network_protocol_name="http",network_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_schema_url="",otel_scope_version="0.63.0",server_address="localhost",server_port="13133",url_scheme="http"}
...


@yurishkuro (Member)

There was a previous attempt to fix this issue and it was blocked. How is your approach different from that one?

Signed-off-by: Somil Jain <somiljain896@gmail.com>
@SomilJain0112 (Contributor Author)

Hi @yurishkuro,
Let me explain what's different from #6479.

  1. I use Flush instead of manual transfer encoding; Go's HTTP server handles the chunking automatically.
  2. In #6479, gzip buffered the output internally, defeating the point of streaming; instead I set "Content-Encoding: identity".
  3. In [WIP] Implement proper payload chunked encoding in HTTP api_v3 #6479, fully formed JSON chunks were concatenated, which produced invalid JSON; instead I use NDJSON (newline-delimited JSON), shown below.
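
For illustration, NDJSON framing looks like this: each line is a complete JSON document that a client can parse independently (contents elided, matching the response shape above):

{"result":{"resourceSpans":[...]}}
{"result":{"resourceSpans":[...]}}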

Test coverage is 83.87 percent; I don't think it can be increased further. Please correct me if I am wrong.

@SomilJain0112 (Contributor Author)

Hi @yurishkuro, please review this PR as well!


w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("Content-Encoding", "identity")
Member

what is "identity" encoding?

Contributor Author

"identity" means no compression or other transformation is applied; I set it so data is sent to the client immediately instead of being buffered by a compression layer.

Member

If I am not mistaken, we have a compression handler defined at the server level. How would that work with this?

Contributor Author

No compression middleware exists for the API endpoints.
-> Verified in cmd/query/app/server.go (lines 188-196): the handler chain has no compression.


marshaler := jsonpb.Marshaler{}
if err := marshaler.Marshal(w, response); err != nil {
h.Logger.Error("Failed to marshal trace chunk", zap.Error(err))
Member

What happens to the output stream if we just log the error and exit?

Contributor Author

If an error occurs before streaming starts, the handler returns a 500. If an error occurs mid-stream, the client receives incomplete data (typically surfacing as "connection closed unexpectedly") and the error is also logged in the server logs.
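
A sketch of the two failure modes (illustrative; headerWritten is a hypothetical flag tracking whether any bytes were already sent):

package main

import (
	"net/http"

	"go.uber.org/zap"
)

func handleMarshalError(w http.ResponseWriter, logger *zap.Logger, err error, headerWritten bool) {
	if !headerWritten {
		// Nothing sent yet: we can still report a proper 500 to the client.
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// The 200 status and earlier chunks are already on the wire; all we can
	// do is log and stop writing, leaving the client a truncated body.
	logger.Error("failed to marshal trace chunk", zap.Error(err))
}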

return false
}

flusher.Flush()
Member

So if I understand correctly you serialize each chunk and flush it to the client. What happens to content-length header in this case? And how does the client know that there are multiple chunks to be received?

Contributor Author

The HTTP client reads the body until EOF by default. The client doesn't need a Content-Length: since we never set one and flush incrementally, Go's HTTP server switches to chunked transfer encoding automatically and signals the end of the body when the handler returns.
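
For example, here is a sketch of how a Go client could consume the stream by decoding until EOF (the URL and query parameter are illustrative):

package main

import (
	"encoding/json"
	"errors"
	"io"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:16686/api/v3/traces?query.service_name=foo")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// json.Decoder reads one JSON value at a time, so it handles a stream
	// of concatenated or newline-delimited documents until the server closes.
	dec := json.NewDecoder(resp.Body)
	for {
		var chunk map[string]any
		if err := dec.Decode(&chunk); errors.Is(err, io.EOF) {
			break // end of stream: all chunks received
		} else if err != nil {
			log.Fatal(err)
		}
		log.Printf("received chunk with %d top-level keys", len(chunk))
	}
}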

yield([]ptrace.Traces{trace2}, nil)
})).Once()

r, err := http.NewRequest(http.MethodGet, "/api/v3/traces/1", http.NoBody)
Member

Please elaborate what and how you're testing here. All asserts seem to happen against the recorder w, but what about the client side? How is the client supposed to handle a chunked response? If the only thing you're doing is avoiding writing a single huge payload from the server to a client connection, then yes, it protects the server against keeping too much state in memory, but it doesn't really help the client to process the results in a streaming fashion; it still needs to read the whole cumulative payload.

Also, what happens to the adjusters in the query service? I suspect we still have to load the complete trace into memory to adjust it. That doesn't mean there's no benefit to streaming - we can at least chunk up the stream on individual traces rather than on ALL traces in the response coming as a single payload.

Contributor Author

The client still receives data chunk by chunk because we are streaming, and the adjusters also run per trace inside the iterator; that is why query service v2 uses iter.Seq2, which is designed for streaming.
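
A sketch of consuming such an iterator one batch at a time (the element type mirrors the test snippet above, but the helper itself is hypothetical):

package main

import (
	"iter"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

// consume drains a streaming iterator batch by batch; nothing here requires
// the full result set to be held in memory at once.
func consume(seq iter.Seq2[[]ptrace.Traces, error], handle func(ptrace.Traces)) error {
	for traces, err := range seq {
		if err != nil {
			return err // surface the first storage error
		}
		for _, t := range traces {
			handle(t) // adjust, serialize, and flush this trace, then move on
		}
	}
	return nil
}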

Member

If every chunk is fully formed JSON, then the combined result will not be valid JSON. The tests need to include the client side to showcase how it is meant to handle the response.

Contributor Author

Hi @yurishkuro, I made the relevant changes in the test file you mentioned, please have a look!
Thanks!

@SomilJain0112 (Contributor Author)

Hi @yurishkuro,
I have addressed all the questions above; thanks for such an insightful review. Please let me know if I've missed anything or if any changes are still needed.
Thanks again!

Signed-off-by: Somil Jain <somiljain896@gmail.com>
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Signed-off-by: Somil Jain <somiljain896@gmail.com>
SomilJain0112 force-pushed the feat/implement-chunk-encoding-in-http-api-v3 branch from 97748d2 to 6bcf029 on October 18, 2025 07:23
@SomilJain0112 (Contributor Author)

Hi @yurishkuro, I have resolved the comments. Please review.

@SomilJain0112 (Contributor Author)

Hi @yurishkuro,
You made two more comments.

1st comment -> "this is cheating. The client is receiving JSON and is expected to parse it. The way you implemented this in the server is you're sending multiple fully formed JSON chunks, which makes the overall payload invalid JSON. Try adding json.parse of the body here."

Fix -> Made changes so that the client receives overall valid JSON, and validated the same in the test case.
2nd comment -> "if I am not mistaken we have a compression handler defined at the server level. How would that work with this?"

Fix -> No compression middleware exists for the API endpoints; verified in cmd/query/app/server.go (lines 188-196) that the handler chain has no compression.

I've addressed both comments, kindly review those changes.
Please let me know if I am going wrong anywhere.
Thanks!

Signed-off-by: Somil Jain <somiljain896@gmail.com>
SomilJain0112 force-pushed the feat/implement-chunk-encoding-in-http-api-v3 branch from 3f0dd6b to a37de9d on October 20, 2025 13:32
]
}
]
[
Member

You cannot change how the existing API works.

Member

It would be a breaking change.

Contributor Author

Yeah, I was concerned about this too, but I can't think of an alternative way to do it. Can you guide me, please? @yurishkuro

@yurishkuro (Member) left a comment

I don't think this is overall the right approach. All you're doing is using a lower-level TCP stream and allowing the server to write a single large response in smaller chunks. Some very advanced client with an online JSON parser might be able to use it, but for most clients there's no benefit; they still have to load all chunks as a single payload before parsing it.

You are not using the chunked encoding feature of the HTTP protocol, which at least informs the client of the logical message boundaries. Nor are you following the grpc-web protocol in how it supports streaming APIs.

@SomilJain0112
Copy link
Contributor Author

I don't think this is overall the right approach. All you're doing is using a lower-level TCP stream and allowing the server to write a single large response in smaller chunks. Some very advanced client with an online JSON parser might be able to use it, but for most clients there's no benefit; they still have to load all chunks as a single payload before parsing it.

You are not using the chunked encoding feature of the HTTP protocol, which at least informs the client of the logical message boundaries. Nor are you following the grpc-web protocol in how it supports streaming APIs.

@yurishkuro Thank you for the feedback. I understand the current approach doesn't provide real streaming benefits.

Could you help me understand the best path forward?

  1. Should I implement newline-delimited JSON, where each trace is a separate JSON object on its own line? This would also be a breaking change.
  2. Should I follow the grpc-web framing protocol for proper message boundaries? This too would be a breaking change.

Or something else? What do you suggest?

@yurishkuro (Member)

I would investigate the grpc-web approach first. This /api/v3 was originally built with the grpc-web framework, but later we removed that dependency and reimplemented it manually. That doesn't mean we can't still support the same capabilities. In particular, in idl/proto/api_v3/query_service.proto we have the GRPCGatewayWrapper type specifically reserved for multi-part responses; we just need to understand / match it with how grpc-web would send them.
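
For reference, grpc-web frames each message with a 1-byte flag (0x00 for a data frame, 0x80 for the trailers frame) followed by a 4-byte big-endian length, which is how a client finds message boundaries. A minimal framing sketch, not tied to the Jaeger code:

package main

import (
	"encoding/binary"
	"io"
)

// writeGrpcWebDataFrame writes one grpc-web DATA frame: flag byte,
// big-endian uint32 length, then the serialized message bytes.
func writeGrpcWebDataFrame(w io.Writer, msg []byte) error {
	var header [5]byte
	header[0] = 0x00 // 0x00 = data frame; 0x80 marks the trailers frame
	binary.BigEndian.PutUint32(header[1:], uint32(len(msg)))
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	_, err := w.Write(msg)
	return err
}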

Signed-off-by: Somil Jain <somiljain896@gmail.com>
SomilJain0112 force-pushed the feat/implement-chunk-encoding-in-http-api-v3 branch from 9a91a07 to b6a6b0e on October 23, 2025 17:24
@SomilJain0112 (Contributor Author)

Hi @yurishkuro,
My latest implementation takes care of backward compatibility as well as streaming, and addresses all of your comments.
Kindly review the changes.
Thanks!

cp "./cmd/query/query-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-query${FILE_EXTENSION}"
cp "./cmd/collector/collector-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-collector${FILE_EXTENSION}"
cp "./cmd/ingester/ingester-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-ingester${FILE_EXTENSION}"
cp "./examples/hotrod/hotrod-${PLATFORM}" "${PACKAGE_STAGING_DIR}/example-hotrod${FILE_EXTENSION}"
Member

Not related to this PR.

// We should have multiple trace objects
assert.GreaterOrEqual(t, nonEmptyLines, 1, "Should have at least 1 trace")
assert.Contains(t, body, "foobar") // First trace span name
assert.Contains(t, body, "second-span") // Second trace span name
Member

The test needs to demonstrate how a real client would consume the data. They are not going to search for substrings; they are going to parse each chunk.
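
For example, assuming NDJSON framing, the test could decode every chunk the way a real client would (a sketch; assertChunksParse is a hypothetical helper):

package main

import (
	"bufio"
	"encoding/json"
	"strings"
	"testing"

	"github.com/stretchr/testify/require"
)

// assertChunksParse verifies that every non-empty line in the body is an
// independently valid JSON document, instead of searching for substrings.
func assertChunksParse(t *testing.T, body string) {
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // skip blank separator lines
		}
		var chunk map[string]any
		require.NoError(t, json.Unmarshal([]byte(line), &chunk))
		require.Contains(t, chunk, "result")
	}
	require.NoError(t, scanner.Err())
}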

}

w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
Member

Why? Add a comment.


w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("Transfer-Encoding", "chunked")
Member

This is not going to be backwards compatible. We need to introduce a feature flag that would enable returning chunked encoding instead of a single payload. Or make it a request parameter (which is worse, since the API is defined by a gRPC service, which doesn't need such a parameter). So I would go with a feature gate.
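
A sketch of such a gate using the OpenTelemetry Collector's featuregate package (the gate ID and description are hypothetical):

package main

import "go.opentelemetry.io/collector/featuregate"

// Disabled by default (StageAlpha), so the existing single-payload behavior
// stays unchanged unless the operator opts in.
var chunkedResponsesGate = featuregate.GlobalRegistry().MustRegister(
	"jaeger.query.chunkedAPIv3Responses", // hypothetical gate ID
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("Stream /api/v3/traces responses as chunks instead of a single payload"),
)

func useChunkedEncoding() bool {
	return chunkedResponsesGate.IsEnabled()
}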

@SomilJain0112 (Contributor Author)

Hey @yurishkuro,
The idea of introducing a feature flag to ensure backward compatibility hadn't crossed my mind.
I have added the feature flag and used parseNDJSON in the test cases, just as a real client would.
Kindly review, thanks!
