How to handle large tool responses? #33156
Lin-jun-xiang asked this question in Q&A (unanswered)
I'm currently developing an agent where the tool response can sometimes be extremely large (tens of thousands of tokens).
Right now, I always add it directly to the conversation, but that makes the next round of dialogue very slow, because the entire history, including a massive number of tool tokens, has to be fed back to the LLM. That said, it's still better than not storing the tool response in the history at all. What suggestions do you have for storing and using these long-context tool responses?
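For concreteness, here is a minimal sketch of what "add it directly to the conversation" means in my case. The model name and the dummy `search_logs` tool are placeholders for my actual setup, and `langchain_core` / `langchain_openai` are assumed:

```python
# Sketch of the current approach: the raw tool output (possibly tens of
# thousands of tokens) is stored verbatim as a ToolMessage, and the whole
# history is re-sent on every turn. Model name and tool are placeholders.
from langchain_core.messages import HumanMessage, ToolMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def search_logs(query: str) -> str:
    """Dummy tool standing in for one that returns a huge payload."""
    return "log line about X\n" * 20_000


llm = ChatOpenAI(model="gpt-4o").bind_tools([search_logs])  # placeholder model

history = [HumanMessage("Find everything about X in the logs")]
ai_msg = llm.invoke(history)
history.append(ai_msg)

for call in ai_msg.tool_calls:                     # run each requested tool call
    raw_output = search_logs.invoke(call["args"])  # full, uncompressed output
    history.append(ToolMessage(content=raw_output, tool_call_id=call["id"]))

# The next turn re-feeds the entire history, huge ToolMessage included,
# which is what makes every later round slow and expensive.
reply = llm.invoke(history + [HumanMessage("Now summarize the errors")])
```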
I first tried using the ReAct agent and adding only the agent's responses to the history, without the tool responses. That runs quickly, but the agent isn't intelligent enough in multi-turn conversations. So I switched to also adding the tool responses as ToolMessages to the conversation history. That makes the agent a bit smarter, but it results in extremely long response delays and massive costs.
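Roughly what that second variant looks like, using LangGraph's prebuilt ReAct agent; the tool body and model name are again placeholders:

```python
# Sketch of the second variant: a prebuilt ReAct agent whose full message
# history (ToolMessages included) is carried between turns.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def search_logs(query: str) -> str:
    """Dummy tool standing in for one that returns a huge payload."""
    return "log line about X\n" * 20_000


agent = create_react_agent(ChatOpenAI(model="gpt-4o"), tools=[search_logs])

history: list = []  # persisted across turns


def chat(user_input: str) -> str:
    global history
    result = agent.invoke({"messages": history + [("user", user_input)]})
    history = result["messages"]  # every ToolMessage is kept: smarter, but slow and costly
    return result["messages"][-1].content
```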
Additionally, I've tried summarizing and compressing oversized tool responses with an LLM before adding them to the history, but the compression step itself takes a long time, which significantly increases the overall latency.
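That compression attempt looks roughly like this: a second model summarizes any tool output above a size threshold before it is stored as a ToolMessage. The threshold value and the summarizer model are placeholders; the extra LLM call per oversized output is exactly where the added latency comes from:

```python
# Sketch of the compression attempt: summarize oversized tool outputs with a
# second LLM before storing them as ToolMessages. Threshold and summarizer
# model are placeholders for my real setup.
from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI

summarizer = ChatOpenAI(model="gpt-4o-mini")  # placeholder: a smaller/cheaper model
MAX_CHARS = 8_000                             # placeholder threshold


def compress_tool_output(raw_output: str, tool_call_id: str) -> ToolMessage:
    """Summarize the tool output only when it exceeds the threshold."""
    if len(raw_output) <= MAX_CHARS:
        return ToolMessage(content=raw_output, tool_call_id=tool_call_id)

    summary = summarizer.invoke(
        "Summarize the following tool output, preserving all concrete facts, "
        "identifiers, and numbers a later question might need:\n\n" + raw_output
    ).content
    return ToolMessage(content=summary, tool_call_id=tool_call_id)
```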