Description
Summary
When a server crashes during tool execution and restarts, crash recovery emits an AgentErrorEvent. However, if run() is called again, the action is re-executed, resulting in both an AgentErrorEvent and an ObservationEvent for the same tool_call_id.
Root Cause
get_unmatched_actions() in state.py:454 only checks for ObservationEvent and UserRejectObservation:
    if isinstance(event, (ObservationEvent, UserRejectObservation)):
        observed_action_ids.add(event.action_id)
It does NOT check for AgentErrorEvent, so the action remains "unmatched" even after crash recovery emits an error for it.
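The matching rule above can be reproduced in isolation. The sketch below uses simplified stand-in dataclasses, not the real SDK classes; their fields are assumptions, chosen only to reflect what this issue states (that AgentErrorEvent carries tool_call_id but no action_id). The function is a hypothetical two-pass model of the rule, not a copy of state.py.

```python
from dataclasses import dataclass


# Stand-ins for the SDK event types. Field sets are assumptions for illustration.
@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class UserRejectObservation:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id, as described in this issue


def get_unmatched_actions(events):
    """Hypothetical model of the current rule: only observations match actions."""
    observed_action_ids = set()
    for event in events:
        if isinstance(event, (ObservationEvent, UserRejectObservation)):
            observed_action_ids.add(event.action_id)
    # AgentErrorEvent is never consulted, so it cannot "close" an action.
    return [
        e for e in events
        if isinstance(e, ActionEvent) and e.id not in observed_action_ids
    ]


events = [
    ActionEvent(id="a1", tool_call_id="X"),
    AgentErrorEvent(tool_call_id="X"),  # what crash recovery emits
]
pending = get_unmatched_actions(events)
print([a.tool_call_id for a in pending])  # ['X']
```

Even though an error was recorded for tool_call_id X, the action is still reported as pending, which is what lets the next run() re-execute it.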
Complete Control Flow
- ActionEvent created (tool_call_id=X)
- Server crashes during tool execution
- On restart, event_service.start() detects execution_status == RUNNING
- Crash recovery emits AgentErrorEvent(tool_call_id=X) - event_service.py:479-488
- Crash recovery sets execution_status = ERROR
- User calls run() again
- run() allows ERROR status to proceed - local_conversation.py:549-554:
    if self._state.execution_status in [IDLE, PAUSED, ERROR]:
        self._state.execution_status = RUNNING
- agent.step() calls get_unmatched_actions(), which returns the action (because AgentErrorEvent is not checked)
- agent.step() calls _execute_actions() on the "pending" action - agent.py:264-271
- Tool executes and emits ObservationEvent(tool_call_id=X)
- Result: BOTH AgentErrorEvent AND ObservationEvent for the same tool_call_id
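The end state of this flow is easy to check for mechanically, which may be useful as a regression test. The sketch below is a hypothetical helper, not part of the SDK: it models events as (kind, tool_call_id) records and reports any tool_call_id that received more than one response. Grouping AgentErrorEvent, ObservationEvent, and UserRejectObservation together as "responses" is an assumption made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # event class name, e.g. "AgentErrorEvent"
    tool_call_id: str


# Assumption: these kinds each count as a response to a tool call.
RESPONSE_KINDS = {"AgentErrorEvent", "ObservationEvent", "UserRejectObservation"}


def find_duplicate_responses(events):
    """Return {tool_call_id: [kinds]} for calls that got more than one response."""
    responses = defaultdict(list)
    for event in events:
        if event.kind in RESPONSE_KINDS:
            responses[event.tool_call_id].append(event.kind)
    return {tcid: kinds for tcid, kinds in responses.items() if len(kinds) > 1}


# Event log matching the flow above, plus one healthy call "Y" for contrast.
log = [
    Event("ActionEvent", "X"),
    Event("AgentErrorEvent", "X"),    # crash recovery
    Event("ObservationEvent", "X"),   # re-execution after run()
    Event("ActionEvent", "Y"),
    Event("ObservationEvent", "Y"),
]
duplicates = find_duplicate_responses(log)
print(duplicates)  # {'X': ['AgentErrorEvent', 'ObservationEvent']}
```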
Potential Fixes
- Add AgentErrorEvent to get_unmatched_actions() - But AgentErrorEvent does not have action_id, only tool_call_id. Would need to match by tool_call_id instead, or add action_id to AgentErrorEvent.
- Change crash recovery to NOT allow re-execution - Either use a different status that blocks run(), or do not emit AgentErrorEvent at all.
- Make get_unmatched_actions() also check AgentErrorEvent by tool_call_id - This would be a behavior change but might be the cleanest fix.
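As a rough illustration of the tool_call_id-based option, the sketch below extends a simplified matching function so that an AgentErrorEvent also closes the corresponding action. The dataclasses and the function name get_unmatched_actions_by_tool_call are hypothetical stand-ins, not the SDK's actual types or API, and the real method may differ in structure.

```python
from dataclasses import dataclass


# Stand-ins for the SDK event types. Field sets are assumptions for illustration.
@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class UserRejectObservation:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id, so matching must go through tool_call_id


def get_unmatched_actions_by_tool_call(events):
    """Treat an action as matched if it has an observation OR an error."""
    observed_action_ids = set()
    errored_tool_call_ids = set()
    for event in events:
        if isinstance(event, (ObservationEvent, UserRejectObservation)):
            observed_action_ids.add(event.action_id)
        elif isinstance(event, AgentErrorEvent):
            errored_tool_call_ids.add(event.tool_call_id)
    return [
        e for e in events
        if isinstance(e, ActionEvent)
        and e.id not in observed_action_ids
        and e.tool_call_id not in errored_tool_call_ids
    ]


events = [
    ActionEvent(id="a1", tool_call_id="X"),
    AgentErrorEvent(tool_call_id="X"),       # crash recovery closed X
    ActionEvent(id="a2", tool_call_id="Y"),  # genuinely pending
]
still_pending = get_unmatched_actions_by_tool_call(events)
print([a.id for a in still_pending])  # ['a2']
```

With this rule, the crashed action X is no longer re-executed on the next run(), while Y remains pending. The behavior change is that any AgentErrorEvent carrying a tool_call_id, not only one emitted by crash recovery, would mark its action as handled.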
Related Code
- openhands-agent-server/openhands/agent_server/event_service.py:470-488 - Crash recovery
- openhands-sdk/openhands/sdk/conversation/state.py:450-462 - get_unmatched_actions()
- openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:549-554 - ERROR status handling
- openhands-sdk/openhands/sdk/agent/agent.py:264-271 - Pending action execution