Implemented chunked encoding #7555
Conversation
```go
if len(traces) == 0 {
	return true
}
```
The current handling of empty trace arrays may lead to ambiguity in the API response. When all iterations return empty arrays, tracesFound remains false and a 404 error is returned. This approach doesn't distinguish between "no traces exist for this query" and "traces exist but contain no data." Consider adding a flag that indicates whether the query matched any traces at all, separate from whether those traces contain spans. This would provide more accurate feedback to API consumers about whether their query parameters matched anything in the system.
Spotted by Graphite Agent
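A minimal sketch of the suggested separation, with hypothetical names and `[]string` standing in for `[]ptrace.Traces`:

```go
package main

import "iter"

// streamTraces sketches the suggested flag: distinguish "the query matched
// nothing" (404) from "the query matched traces that contained no data" (200).
func streamTraces(seq iter.Seq2[[]string, error]) (status int) {
	queryMatched := false // at least one result batch was returned
	tracesFound := false  // at least one batch was non-empty

	for traces, err := range seq {
		if err != nil {
			return 500
		}
		queryMatched = true
		if len(traces) > 0 {
			tracesFound = true
			// ... marshal and flush this batch ...
		}
	}
	if !queryMatched {
		return 404 // nothing matched the query parameters at all
	}
	if !tracesFound {
		return 200 // matched, but the traces contained no spans
	}
	return 200
}
```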
this is a reasonable callout. Can we add a test for this use case?
Codecov Report: ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #7555 +/- ##
==========================================
- Coverage 96.59% 96.50% -0.10%
==========================================
Files 384 384
Lines 19404 19469 +65
==========================================
+ Hits 18744 18789 +45
- Misses 477 492 +15
- Partials 183 188 +5
Metrics Comparison Summary
Total changes across all snapshots: 53
Detailed changes per snapshot: summary_metrics_snapshot_cassandra
📊 Metrics Diff Summary (Total Changes: 53)
🆕 Added Metrics
View diff sample (condensed): histogram buckets added for three metric families, all carrying the labels {http_request_method="GET", http_response_status_code="503", network_protocol_name="http", network_protocol_version="1.1", otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp", otel_scope_schema_url="", otel_scope_version="0.63.0", server_address="localhost", server_port="13133", url_scheme="http"}:
+http_server_request_body_size_bytes (buckets le="0" through le="+Inf")
+http_server_request_duration_seconds (buckets le="0.005" through le="+Inf")
+http_server_response_body_size_bytes (buckets le="0" through le="+Inf")
there was a previous attempt to fix this issue and it was blocked. How is your approach different from that one?
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Hi @yurishkuro,
Test coverage is 83.87%; I don't think it can be increased further. Please correct me if I am wrong.
Hi @yurishkuro, please review this PR as well!
cmd/query/app/apiv3/http_gateway.go
Outdated
| w.Header().Set("Content-Type", "application/json") | ||
| w.Header().Set("X-Content-Type-Options", "nosniff") | ||
| w.Header().Set("Content-Encoding", "identity") |
what is "identity" encoding?
We send the data immediately without compression; "identity" means no encoding transformation is applied to the body, which is why that header is used.
if I am not mistaken we have a compression handler defined at the server level. How would that work with this?
No compression middleware exists for API endpoints
-> Verified in cmd/query/app/server.go (lines 188-196) - handler chain has no compression
```go
marshaler := jsonpb.Marshaler{}
if err := marshaler.Marshal(w, response); err != nil {
	h.Logger.Error("Failed to marshal trace chunk", zap.Error(err))
```
what happens to the output stream if we just log error and exit?
If the error occurs before streaming starts, the handler returns a 500 error. If it occurs during streaming, the client receives incomplete data and typically sees something like "connection closed unexpectedly", and the error is also logged in the server logs.
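A sketch of that behavior with hypothetical names: the status line can only be sent before the first write, so a mid-stream failure can only be logged while the client sees a truncated body:

```go
package main

import (
	"log"
	"net/http"
)

// writeChunk is a stand-in for jsonpb-marshaling one trace chunk into w.
type writeChunk func(w http.ResponseWriter) error

func streamChunks(w http.ResponseWriter, chunks []writeChunk) {
	started := false
	for _, write := range chunks {
		if err := write(w); err != nil {
			if !started {
				// Before streaming starts we can still report HTTP 500.
				http.Error(w, "failed to marshal trace chunk", http.StatusInternalServerError)
			} else {
				// Mid-stream the status is already sent: log and abort,
				// leaving the client with an incomplete body.
				log.Printf("failed to marshal trace chunk: %v", err)
			}
			return
		}
		started = true
		if f, ok := w.(http.Flusher); ok {
			f.Flush() // push this chunk to the client immediately
		}
	}
}
```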
```go
	return false
}

flusher.Flush()
```
So if I understand correctly you serialize each chunk and flush it to the client. What happens to content-length header in this case? And how does the client know that there are multiple chunks to be received?
By default the HTTP client reads the body until EOF; with chunked transfer encoding no Content-Length is sent, and the chunk framing is handled automatically, so the client does not need to know the length in advance.
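For illustration, this is standard net/http behavior and the endpoint URL is hypothetical: the Go client strips the chunk framing transparently, reports ContentLength as -1, and reads simply continue until EOF:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical Jaeger query endpoint; the transport de-chunks the
	// response, so Read sees only payload bytes.
	resp, err := http.Get("http://localhost:16686/api/v3/traces?service=foo")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println(resp.ContentLength) // -1: length unknown for chunked responses

	buf := make([]byte, 4096)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			fmt.Printf("received %d bytes\n", n)
		}
		if err == io.EOF {
			break // terminal zero-length chunk was received
		}
		if err != nil {
			panic(err)
		}
	}
}
```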
```go
yield([]ptrace.Traces{trace2}, nil)
})).Once()

r, err := http.NewRequest(http.MethodGet, "/api/v3/traces/1", http.NoBody)
```
Please elaborate what and how you're testing here. All asserts seem to happen against the recorder w, but what about the client side? How is the client supposed to handle a chunked response? If the only thing you're doing is just avoiding writing a single huge payload from the server to a client connection, then yes it protects the server against keeping too much state in memory, but it doesn't really help the client to process the results in a streaming fashion, it still needs to read the whole cumulative payload.
Also, what happens to the adjusters in the query service? I suspect we still have to load the complete trace into memory to adjust it. That doesn't mean there's no benefit to streaming - we can at least chunk up the stream on individual traces rather than on ALL traces in the response coming as a single payload.
The client still receives the data chunk by chunk because we are streaming, and the adjusters run per trace inside the iterator; that is why query service v2 uses iter.Seq2, which is designed for streaming.
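A sketch of that per-trace pattern, with a stand-in trace type and a no-op function in place of the real query-service adjusters:

```go
package main

import "iter"

type trace struct{ spans []string } // stand-in for ptrace.Traces

// adjust stands in for the query-service adjusters (clock skew, etc.).
func adjust(t trace) trace { return t }

// adjusted wraps a trace stream so each trace is adjusted as it flows
// through, keeping only the current trace in memory.
func adjusted(src iter.Seq2[trace, error]) iter.Seq2[trace, error] {
	return func(yield func(trace, error) bool) {
		for t, err := range src {
			if err != nil {
				yield(trace{}, err)
				return
			}
			if !yield(adjust(t), nil) {
				return // consumer stopped early
			}
		}
	}
}
```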
Even if each chunk is fully-formed JSON, the combined result will not be valid JSON. The tests need to include the client side to showcase how it is meant to handle the response.
Hi @yurishkuro, I made the relevant changes in the test file you mentioned; please have a look!
Thanks!
Hi @yurishkuro,
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Force-pushed from 97748d2 to 6bcf029
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Hi @yurishkuro, I have resolved the comments; please review.
Hi @yurishkuro, I addressed both comments; kindly review those changes.
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Force-pushed from 3f0dd6b to a37de9d
```go
	]
}
]
[
```
you cannot change how the existing API works.
it would be a breaking change.
Yeah, I was concerned about this too, but I can't think of an alternate way to do it. Can you guide me, please? @yurishkuro
I don't think this is overall the right approach. All you're doing is using a lower-level TCP stream and allowing the server to write a single large response in smaller chunks. Some very advanced client with an online JSON parser might be able to use it, but for most clients there's no benefit; they still have to load all chunks as a single payload before parsing it.
You are not using chunked encoding feature of HTTP protocol that at least informs the client of the logical message boundaries. Nor are you following the grpc-web protocol in how it supports the streaming APIs.
@yurishkuro Thank you for the feedback. I understand the current approach doesn't provide real streaming benefits. Could you help me understand the best path forward?
Or something else? What do you suggest?
I would investigate the grpc-web approach first. This /api/v3 was originally built with the grpc-web framework, but later we removed that dependency and reimplemented it manually. It doesn't mean we can't still support the same capabilities. In particular, in
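For context, as general grpc-web protocol knowledge rather than anything from this PR: grpc-web frames every message with a 1-byte flag (0x00 for data, 0x80 for trailers) and a 4-byte big-endian length, which is what gives the client explicit message boundaries. A minimal sketch of that framing:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// writeGrpcWebFrame emits one grpc-web frame: flag byte, 4-byte big-endian
// payload length, then the payload itself.
func writeGrpcWebFrame(buf *bytes.Buffer, flag byte, payload []byte) {
	buf.WriteByte(flag)
	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(payload)))
	buf.Write(length[:])
	buf.Write(payload)
}

func main() {
	var buf bytes.Buffer
	writeGrpcWebFrame(&buf, 0x00, []byte(`{"result":{}}`)) // one data message
	writeGrpcWebFrame(&buf, 0x80, []byte("grpc-status: 0\r\n"))
	fmt.Printf("% x\n", buf.Bytes())
}
```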
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Force-pushed from 9a91a07 to b6a6b0e
Hi @yurishkuro,
| cp "./cmd/query/query-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-query${FILE_EXTENSION}" | ||
| cp "./cmd/collector/collector-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-collector${FILE_EXTENSION}" | ||
| cp "./cmd/ingester/ingester-${PLATFORM}" "${PACKAGE_STAGING_DIR}/jaeger-ingester${FILE_EXTENSION}" | ||
| cp "./examples/hotrod/hotrod-${PLATFORM}" "${PACKAGE_STAGING_DIR}/example-hotrod${FILE_EXTENSION}" |
not related to this PR
```go
// We should have multiple trace objects
assert.GreaterOrEqual(t, nonEmptyLines, 1, "Should have at least 1 trace")
assert.Contains(t, body, "foobar")      // First trace span name
assert.Contains(t, body, "second-span") // Second trace span name
```
the test needs to demonstrate how a real client would consume the data. They are not going to search for substrings, they are going to parse each chunk.
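A sketch of what such a client-side check could look like, assuming each chunk is a complete JSON value (the body below is hypothetical): json.Decoder consumes the stream one value at a time rather than searching substrings:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

func main() {
	// Hypothetical streamed body: two complete JSON values back to back.
	body := strings.NewReader(
		`{"result":{"resourceSpans":[]}}` + "\n" +
			`{"result":{"resourceSpans":[]}}` + "\n")

	// Each Decode call reads exactly one JSON value, which is how a real
	// client would process the chunks as they arrive.
	dec := json.NewDecoder(body)
	count := 0
	for {
		var chunk map[string]any
		if err := dec.Decode(&chunk); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		count++
	}
	fmt.Println("decoded chunks:", count) // 2
}
```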
cmd/query/app/apiv3/http_gateway.go
Outdated
```go
}

w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
```
why? add comment
cmd/query/app/apiv3/http_gateway.go
Outdated
```go
w.Header().Set("Content-Type", "application/json")
w.Header().Set("X-Content-Type-Options", "nosniff")
w.Header().Set("Transfer-Encoding", "chunked")
```
this is not going to be backwards compatible. We need to introduce a feature flag that would enable returning chunked encoding instead of single payload. Or make it a request parameter (which is worse since the API is defined by gRPC service, which doesn't need such parameter). So I would go with a feature gate.
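A sketch of such a gate, assuming the OpenTelemetry Collector featuregate package and a hypothetical gate ID, defaulting to off so the single-payload behavior stays the default:

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/collector/featuregate"
)

// chunkedTracesGate uses a hypothetical gate ID; alpha gates default to off.
var chunkedTracesGate = featuregate.GlobalRegistry().MustRegister(
	"jaeger.apiv3.chunkedTraces",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("Stream /api/v3 trace responses as chunks instead of a single payload"),
)

func writeTraces(w http.ResponseWriter) {
	if chunkedTracesGate.IsEnabled() {
		// opt-in path: stream trace by trace, flushing each chunk
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		return
	}
	// default path: buffer and write the response as a single payload
	w.WriteHeader(http.StatusOK)
}
```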
… tests to test real client
Signed-off-by: Somil Jain <somiljain896@gmail.com>
Hey @yurishkuro,
Which problem is this PR solving?
#6467
Description of the changes
HTTP Response Format Change (Breaking Change):
The /api/v3/traces HTTP endpoints now return a JSON array to support true streaming while maintaining valid JSON format.
Before: { "result": { "resourceSpans": [...] } }
After: [ { "result": { "resourceSpans": [...] } } ]
Implementation Details:
The array is streamed incrementally, one trace at a time: [ → trace → , → trace → ]. A sketch of this write loop is shown below.
How was this change tested?
Tested locally and added test cases as well.
Checklist
jaeger: make lint test
jaeger-ui: npm run lint and npm run test