Long-Running Tool FunctionResponse Routing Bypasses Custom Agent Orchestration #3986

@selimseker

Summary

When a sub-agent within a custom orchestrator agent calls a LongRunningFunctionTool, the FunctionResponse is routed directly to the sub-agent, bypassing the parent custom agent's _run_async_impl method entirely. This prevents custom agents from implementing proper workflow orchestration and continuation logic after long-running tool pauses.

Our Architecture

We have a custom orchestrator agent (DomainAgent) that manages multi-step workflows:

DomainAgent (custom orchestrator)
├── Router Agent (LlmAgent)
├── Planner Agent (LlmAgent)
└── Step Executor Agent (LlmAgent) ← calls long-running tool

Flow:

  1. DomainAgent._run_async_impl() orchestrates the workflow.
  2. It calls the router, then the planner, and then executes multiple steps sequentially.
  3. The Step Executor agent calls ask_clarifications (a LongRunningFunctionTool).
  4. The frontend collects user input and sends a FunctionResponse.
  5. Problem: the Runner routes the response directly to the Step Executor and never re-enters DomainAgent._run_async_impl().

The Issue

In runners.py, the _find_agent_to_run() method (lines 1003-1055) determines which agent handles the next message:

def _find_agent_to_run(self, session: Session, root_agent: BaseAgent) -> BaseAgent:
    # If the last event is a function response, send to agent that made the function call
    event = find_matching_function_call(session.events)
    if event and event.author:
        return root_agent.find_agent(event.author)  # ← Returns sub-agent directly
    # ... rest of logic

What happens:

  1. Step Executor calls long-running tool → event.author = "step_executor"
  2. Frontend sends FunctionResponse via run_async(invocation_id=..., new_message=FunctionResponse)
  3. _find_agent_to_run() finds the function call event with author="step_executor"
  4. Returns Step Executor agent directly
  5. Runner calls step_executor.run_async()
  6. DomainAgent's orchestration logic is completely bypassed
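
To make the routing above concrete, here is a minimal, self-contained simulation. It does not import ADK: Agent, Event, find_matching_call and find_agent_to_run are hypothetical stand-ins that only mirror the rule quoted from runners.py, so treat it as a sketch of the behavior rather than the actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Stand-in for an ADK agent: a name plus a list of sub-agents."""
    name: str
    sub_agents: list[Agent] = field(default_factory=list)

    def find_agent(self, name: str) -> Optional[Agent]:
        # Depth-first search over this agent and its descendants.
        if self.name == name:
            return self
        for child in self.sub_agents:
            found = child.find_agent(name)
            if found is not None:
                return found
        return None


@dataclass
class Event:
    """Stand-in for a session event; only the fields the routing rule reads."""
    author: str
    function_call_id: Optional[str] = None      # set on a function-call event
    function_response_id: Optional[str] = None  # set on a function-response event


def find_matching_call(events: list[Event]) -> Optional[Event]:
    # If the last event is a function response, return the earlier event
    # whose function call carries the same id; otherwise return None.
    if not events or events[-1].function_response_id is None:
        return None
    wanted = events[-1].function_response_id
    for event in reversed(events[:-1]):
        if event.function_call_id == wanted:
            return event
    return None


def find_agent_to_run(events: list[Event], root: Agent) -> Agent:
    # Same shape as the quoted rule: route to the author of the matching call.
    call = find_matching_call(events)
    if call and call.author:
        return root.find_agent(call.author)
    return root  # simplified fallback; the real method has more logic here


root = Agent("domain_agent", [Agent("router"), Agent("planner"), Agent("step_executor")])
events = [
    Event(author="step_executor", function_call_id="call-1"),
    Event(author="user", function_response_id="call-1"),
]
selected = find_agent_to_run(events, root)
print(selected.name)  # step_executor, not domain_agent
```

Because the lookup is keyed on the author of the function-call event, the orchestrator is never a candidate, no matter how the agent tree is nested.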

Expected Behavior

The DomainAgent should be able to:

  1. Detect that a FunctionResponse is being processed
  2. Update workflow state
  3. Continue orchestration (e.g., move to next workflow step)
  4. Properly manage the entire workflow lifecycle
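
As a rough illustration of what this would look like if the parent were re-entered, the sketch below uses plain async generators instead of ADK classes. Everything in it is an assumption made for illustration: the dict-shaped events and messages, the state keys (index, paused, done), and the DomainAgent.run and StepExecutor.run signatures.

```python
import asyncio


class StepExecutor:
    """Hypothetical sub-agent: pauses on steps that need user input."""

    async def run(self, step, answer=None):
        if step.get("needs_input") and answer is None:
            # Emulates calling a long-running tool and pausing.
            yield {"type": "function_call", "name": "ask_clarifications", "step": step["name"]}
            return
        yield {"type": "result", "step": step["name"], "answer": answer}


class DomainAgent:
    """Hypothetical orchestrator that is re-entered when the response arrives."""

    def __init__(self, steps):
        self.steps = steps
        self.executor = StepExecutor()

    async def run(self, state, new_message=None):
        answer = None
        # 1. Detect that a function response is being processed.
        if new_message and new_message.get("type") == "function_response":
            # 2. Update workflow state.
            answer = new_message["response"]
            state["paused"] = False
        # 3. Continue orchestration from the step that was interrupted.
        while state["index"] < len(self.steps):
            step = self.steps[state["index"]]
            async for event in self.executor.run(step, answer):
                if event["type"] == "function_call":
                    state["paused"] = True
                yield event
            if state["paused"]:
                return  # wait for the frontend to send the response
            answer = None  # the answer applies to the resumed step only
            state["index"] += 1
        # 4. The parent owns the lifecycle, so it decides when the workflow is done.
        state["done"] = True


async def collect(agent, state, new_message=None):
    return [event async for event in agent.run(state, new_message)]


steps = [{"name": "gather"}, {"name": "clarify", "needs_input": True}, {"name": "finish"}]
agent = DomainAgent(steps)
state = {"index": 0, "paused": False, "done": False}

first = asyncio.run(collect(agent, state))
second = asyncio.run(collect(agent, state, {"type": "function_response", "response": "blue"}))
```

The first call stops at the clarify step with the workflow paused; the second call resumes that step with the user's answer and runs the remaining step. With the current routing the second call would never reach DomainAgent.run at all.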

Questions

  1. Is this the intended behavior? Should long-running tools always route directly to the calling agent?
  2. Is there a supported way for custom orchestrator agents to intercept FunctionResponses for sub-agents?
  3. Could ADK provide a hook (e.g., can_handle_sub_agent_function_response()) to let parent agents opt in to handling their sub-agents' function responses?
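
One possible shape for the hook in question 3 is sketched below. This is not an existing ADK API: the Agent stub, its parent attribute, and resolve_response_target are hypothetical, and the choice to pick the nearest opting-in ancestor (rather than the top-most one) is only one of several reasonable designs.

```python
class Agent:
    """Hypothetical agent stub with a parent pointer."""

    def __init__(self, name, sub_agents=()):
        self.name = name
        self.parent = None
        self.sub_agents = list(sub_agents)
        for child in self.sub_agents:
            child.parent = self

    def can_handle_sub_agent_function_response(self) -> bool:
        # Default: keep today's behavior and let the calling agent handle it.
        return False


class OrchestratorAgent(Agent):
    def can_handle_sub_agent_function_response(self) -> bool:
        # Opt in: this parent wants to be re-entered on sub-agent responses.
        return True


def resolve_response_target(calling_agent: Agent) -> Agent:
    # Walk up from the agent that made the function call and return the
    # nearest ancestor that opted in; fall back to the calling agent.
    ancestor = calling_agent.parent
    while ancestor is not None:
        if ancestor.can_handle_sub_agent_function_response():
            return ancestor
        ancestor = ancestor.parent
    return calling_agent


step_executor = Agent("step_executor")
domain_agent = OrchestratorAgent("domain_agent", [Agent("router"), Agent("planner"), step_executor])

plain_child = Agent("step_executor")
plain_root = Agent("root", [plain_child])

print(resolve_response_target(step_executor).name)  # domain_agent
print(resolve_response_target(plain_child).name)    # step_executor
```

Defaulting the hook to False would leave existing agent trees routed exactly as they are now, so only orchestrators that override it would see a change.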

Environment

  • ADK Version: [Latest from main]
  • Python Version: 3.12
  • Agent Type: Custom BaseAgent subclass with LlmAgent sub-agents

Additional Context

This limitation makes it difficult to implement:

  • Multi-step workflows with human-in-the-loop
  • Orchestrator agents that manage sub-agent lifecycles
  • Complex state machines where parent agents coordinate sub-agent interactions
  • Resume logic that requires parent-level context

Metadata

Labels

  • core: [Component] This issue is related to the core interface and implementation
  • needs-review: [Status] The PR is awaiting review from the maintainer
