docs: update blob size limit error troubleshooting page #4379
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```
There are multiple strategies you can use to avoid this error:

1. Use compression with a [custom payload codec](/payload-codec) for large payloads.
1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3.
```
"Recommended" is a bit strong given that it's in pre-release. But if you add a pre-release note, that could address gaps in language support and stability.
I'll keep this as the first item but remove (Recommended) for now. We can add it back either during Public Preview or GA.
```
title: Troubleshoot the blob size limit error
sidebar_label: Blob size limit error
description: The BlobSizeLimitError occurs when a Workflow's payload exceeds the 2 MB request limit or the 4 MB Event History transaction limit set by Temporal. Reduce blob size via compression or batching.
description: |
```
Hmm, I know you aren't adding this, so no need to act now, but I wonder if anything about this is still accurate. If it's referring to server-side errors, I think we throw `GrpcMessageTooLarge` in the latter case and `BadAttributes` in the former (but we can check with eng to be sure). At least that's what I discussed with @simvlad and @jmaeagle99, but maybe I'm confused about what this means.
Update: according to https://github.com/search?q=org%3Atemporalio%20BlobSizeLimitError&type=code, I think "BlobSizeLimitError" is the actual size limit rather than the error that's thrown...
If it exceeds 2 MB, we terminate the Workflow with one of the `Bad*` errors, like the following:

```
BadScheduleActivityAttributes: ScheduleActivityTaskCommandAttributes.Input exceeds size limit.
```

But this is specific to large payloads in the Activity input; in other cases we also throw `WORKFLOW_TASK_FAILED_CAUSE_BAD_UPDATE_WORKFLOW_EXECUTION_MESSAGE`.

As for `GrpcMessageTooLarge`, it would surface on the client and look like this:

```
rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5767664 vs. 4194304)
```
```
- [Python: Large payload storage](/develop/python/data-handling/large-payload-storage)

2. Break larger batches of commands into smaller batch sizes:
2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue,
```
Suggested change:

```diff
-2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. This addresses the immediate issue,
+2. Use compression with a [custom Payload Codec](/payload-codec) for large payloads. But even if this addresses the immediate issue,
```
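A codec like the one discussed above compresses payload bytes before they are sent to the Temporal Service. A minimal sketch of the compression step using only stdlib `zlib` (this is not the actual `PayloadCodec` interface; the function names and limit check are illustrative):

```python
import zlib

PAYLOAD_LIMIT = 2 * 1024 * 1024  # 2 MB request payload limit

def compress_payload(data: bytes) -> bytes:
    """Compress payload bytes. A real codec would also tag the payload's
    metadata with the encoding so the decode side can reverse it."""
    return zlib.compress(data, level=9)

def decompress_payload(data: bytes) -> bytes:
    return zlib.decompress(data)

# A highly repetitive 4 MiB payload compresses to well under the 2 MB limit.
payload = b"row,of,csv,data\n" * 262144  # 16 B * 262144 = 4 MiB
compressed = compress_payload(payload)
assert len(payload) > PAYLOAD_LIMIT
assert len(compressed) < PAYLOAD_LIMIT
assert decompress_payload(compressed) == payload
```

Note that compression only buys headroom; as the suggestion above says, if blob sizes keep growing the same limit will eventually be hit again.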
```
1. (Recommended) Use [External Storage](/external-storage) to offload large payloads to an object store like S3.
   Currently available in the [Python SDK](/develop/python/data-handling/large-payload-storage). When a payload exceeds a size threshold,
   a storage driver uploads it to your external store and replaces it with a small reference token in the Event History.
   Your Workflow and Activity code doesn't need to change. Even if your payloads are within the limit today, consider
```
nit: "Even if your payloads are within the limit today" -- the user probably wouldn't be in this doc if that were the case. =) But I like the advice in general.
I was thinking they may have multiple Workflows. They ran into this error on one of them, but this could nudge them to refactor others even if those aren't reporting errors now.
```
   implementing External Storage if their size could grow over time.

- This addresses the immediate issue of the blob size limit; however, if blob sizes continue to grow this problem can arise again.
For SDK-specific guides, see:
```
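The driver behavior described in the diff above (upload past a size threshold, record only a reference token in the Event History) is the classic claim check pattern. A minimal sketch with an in-memory dict standing in for an object store like S3; all names here are hypothetical, and the real Python SDK driver does this transparently:

```python
import uuid

SIZE_THRESHOLD = 128 * 1024  # illustrative threshold, e.g. 128 KiB

object_store: dict[str, bytes] = {}  # stand-in for S3/GCS

def offload(payload: bytes) -> bytes:
    """Return the payload itself if small, else upload it and return a
    small reference token that gets recorded in the Event History."""
    if len(payload) <= SIZE_THRESHOLD:
        return payload
    key = f"payloads/{uuid.uuid4()}"
    object_store[key] = payload
    return b"ref:" + key.encode()

def resolve(data: bytes) -> bytes:
    """Fetch the real payload back when a reference token is seen."""
    if data.startswith(b"ref:"):
        return object_store[data[4:].decode()]
    return data

big = b"x" * (1024 * 1024)       # 1 MiB, above the threshold
token = offload(big)
assert len(token) < 100          # only the token hits the Event History
assert resolve(token) == big     # payload round-trips via the store
assert offload(b"tiny") == b"tiny"  # small payloads pass through
```

The key property is that Event History only ever stores the short token, so payload growth no longer pushes against the 2 MB or 4 MB limits.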
I think it's too early to wipe the old advice: (a) because it's pre-release with limited language support, and (b) since we don't provide the ability to create payload handles yet, the advice to "Pass references to the stored payloads within the Workflow instead of the actual data." is still valid no matter what.
Maybe we just list these as alternatives?
```
2. Retrieve the payloads from the object store when needed during execution.
2. Introduce brief pauses or sleeps between batches.
```
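The batching advice in the diff above amounts to chunking the command list so no single Workflow Task (and its response) carries too many payloads at once. A minimal chunking helper; in a real Workflow you would also pause between batches, e.g. with the Python SDK's `workflow.sleep`:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield successive fixed-size slices of the item list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 1000 commands processed 100 at a time instead of all at once.
commands = list(range(1000))
chunks = list(batches(commands, 100))
assert len(chunks) == 10
assert all(len(c) == 100 for c in chunks)
assert [x for c in chunks for x in c] == commands
```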
```
## Workflow termination due to oversized response
```
Actually, I tried it again. In the case where the Workflow Task response exceeds the 4 MB gRPC limit, the Workflow keeps retrying with:

```json
{
  "cause": "WORKFLOW_TASK_FAILED_CAUSE_GRPC_MESSAGE_TOO_LARGE",
  "failure": {
    "message": "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5243837 vs. 4194304)",
    "applicationFailureInfo": {
      "type": "GrpcMessageTooLargeError"
    }
  }
}
```

So this heading is not correct: the Workflow is not terminated, it keeps retrying.
Keep only substantive content changes, remove line-wrapping noise.
- Split into two sections: payload size limit and gRPC message size limit
- Fix incorrect claim that gRPC oversized response terminates the Workflow
- Add error message examples for both limit types
- Add External Storage and claim check pattern as resolution
- Clarify gRPC limit applies to Client-Service and Worker-Service communication
- Note payload size limit is configurable on self-hosted
```
See the [gRPC Message Too Large error reference](/references/errors#grpc-message-too-large) for more details.

### Error messages

- `WORKFLOW_TASK_FAILED_CAUSE_GRPC_MESSAGE_TOO_LARGE`: When a Workflow Worker completes a Workflow Task, it sends all the commands the Workflow produced (such as Activity schedules and their inputs) back to the Temporal Service. If that response exceeds 4 MB, the SDK catches the gRPC error and sends a failed Workflow Task response with this cause. Because replay produces the same oversized response, the Workflow gets stuck in a retry loop that isn't visible in the Event History.
```
There is a new error message, analogous to this one, called `WORKFLOW_TASK_FAILED_CAUSE_PAYLOADS_TOO_LARGE`.
The SDK will get the payload limits from the server and intentionally fail workflow tasks if it contains payloads that are over the payload size limit. This enables the workflow to be retried instead of failing it so that other solutions (such as external storage) may be applied to alleviate the problem and allow the workflow to continue.
Huh, I'm a little confused by this. What makes this error different from the other payload errors (in the above section)? Why does this allow retries while other payload errors terminate the workflow per Vlad's testing?
Or do you mean this is a new error message we are adding together with the release of external storage?
It's a new error code. Prior to adding this code and the new behavior to the SDK, if the SDK submitted payloads that were greater than 2 MB but less than 4 MB (the gRPC limit), the server would fail the Workflow (not just the Workflow Task, but the entire Workflow). The SDK does a best-effort size check on the worker and intentionally fails the Workflow Task instead of uploading the large payload that would fail the Workflow.
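The worker-side check described above can be sketched: compare each payload against the server-advertised limit and deliberately fail the Workflow Task rather than submit payloads the server would reject, so the Workflow stays alive and can be retried. All names here are illustrative stand-ins, not SDK internals:

```python
PAYLOAD_SIZE_LIMIT = 2 * 1024 * 1024  # limit the SDK learns from the server

class PayloadsTooLargeError(Exception):
    """Illustrative stand-in for failing the Workflow Task with
    WORKFLOW_TASK_FAILED_CAUSE_PAYLOADS_TOO_LARGE."""

def check_payloads(payloads: list[bytes]) -> None:
    for i, p in enumerate(payloads):
        if len(p) > PAYLOAD_SIZE_LIMIT:
            # Fail only the task; the Workflow remains running and can
            # retry once e.g. External Storage is configured.
            raise PayloadsTooLargeError(
                f"payload {i} is {len(p)} bytes, over {PAYLOAD_SIZE_LIMIT}"
            )

check_payloads([b"ok"])  # small payloads pass through
failed_task = False
try:
    # 3 MB: over the 2 MB payload limit, under the 4 MB gRPC cap
    check_payloads([b"x" * (3 * 1024 * 1024)])
except PayloadsTooLargeError:
    failed_task = True
assert failed_task
```

This is why the new code allows retries while the older `Bad*` errors terminated the Workflow: the rejection now happens on the worker before the oversized payload ever reaches the server.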
Summary
Split from #4333 (PR 3 of 3).
Test plan