Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions .pr/review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# PR #2356 Review: execute_tool API endpoint

## Summary

This PR adds a `POST /api/conversations/{id}/execute_tool` endpoint that allows
executing tools (like the terminal) directly on a conversation without going
through the agent loop. This is primarily designed for `setup.sh` execution where
environment changes need to persist in the agent's terminal session.

## CI Fixes Applied

### 1. Pyright type errors in `event_service.py` (pre-commit)
**Problem**: `self._conversation` is typed `LocalConversation | None`. Pyright
cannot narrow the type inside a nested closure after the None-check.

**Fix**: Captured `self._conversation` in a local variable `conversation` before
defining the `_execute()` closure.

### 2. `Observation` ABC instantiation in `remote_conversation.py` (sdk-tests)
**Problem**: `Observation` is an abstract base class (ABC) and cannot be
instantiated directly. `BaseObservation.from_text()` raised a validation error.

**Fix**: Created a private concrete `_RemoteObservation` subclass inside
`execute_tool()` to hold the response text.

### 3. Stale test `test_remote_conversation_execute_tool_not_implemented`
**Problem**: The test expected `NotImplementedError` but the PR now implements
the method.

**Fix**: Replaced with `test_remote_conversation_execute_tool` that mocks the
API response and verifies the observation is correctly parsed.

### 4. Line-too-long issues (E501)
Fixed in `event_service.py` docstring and `models.py` field description.

## Functional Test Results

Started the agent server and tested the endpoint with:

### ✅ Successful tool execution
```
POST /api/conversations/{id}/execute_tool
{"tool_name": "terminal", "action": {"command": "echo hello world && pwd"}}

Response: {"observation": {"content": [{"text": "hello world\n/tmp/test_workspace"}], ...}, "is_error": false}
```

### ✅ Environment variable persistence (core use case)
```
# Set variable
{"tool_name": "terminal", "action": {"command": "export MY_SETUP_VAR=setup_complete"}}
# → success

# Read variable in subsequent call
{"tool_name": "terminal", "action": {"command": "echo $MY_SETUP_VAR"}}
# → "setup_complete" ← Variable persists across calls!
```

### ✅ Error handling: nonexistent tool
```
{"tool_name": "nonexistent_tool", "action": {"command": "echo test"}}
# → 400: "Tool 'nonexistent_tool' not found. Available tools: ['terminal', 'finish', 'think']"
```

### ✅ Error handling: nonexistent conversation
```
POST /api/conversations/00000000-0000-0000-0000-000000000000/execute_tool
# → 404: "Not Found"
```

## Code Review Notes

### Architecture
The layered approach is clean and consistent with existing patterns:
- **Router** → **ConversationService** → **EventService** → **LocalConversation**
- `_ensure_agent_ready()` handles lazy initialization of tools

### Minor observation
In `conversation_router.py`, the `except KeyError` followed by `except Exception`
is technically redundant (Exception already catches KeyError), but it serves as
documentation of expected error types and is harmless.

### Overall Assessment
The PR is well-structured and solves a real problem. The key insight —
executing through the agent's persistent terminal session rather than ephemeral
subprocesses — correctly addresses the `setup.sh` environment persistence issue.
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@
ConversationInfo,
ConversationPage,
ConversationSortOrder,
ExecuteToolRequest,
ExecuteToolResponse,
GenerateTitleRequest,
GenerateTitleResponse,
SendMessageRequest,
Expand Down Expand Up @@ -326,3 +328,37 @@ async def condense_conversation(
if not success:
raise HTTPException(status.HTTP_404_NOT_FOUND, detail="Conversation not found")
return Success()


@conversation_router.post(
"/{conversation_id}/execute_tool",
responses={
404: {"description": "Item not found"},
400: {"description": "Tool not found or execution error"},
},
)
async def execute_tool(
conversation_id: UUID,
request: ExecuteToolRequest,
conversation_service: ConversationService = Depends(get_conversation_service),
) -> ExecuteToolResponse:
"""Execute a tool directly on a conversation without going through the agent loop.

This is useful for pre-run setup operations like running setup scripts
through the agent's terminal tool so environment changes persist in the
agent's session.
"""
try:
result = await conversation_service.execute_tool(
conversation_id, request.tool_name, request.action
)
except KeyError as e:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
except Exception as e:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
if result is None:
raise HTTPException(status.HTTP_404_NOT_FOUND)
return ExecuteToolResponse(
observation=result,
is_error=result.get("is_error", False),
)
Original file line number Diff line number Diff line change
Expand Up @@ -445,6 +445,20 @@ async def ask_agent(self, conversation_id: UUID, question: str) -> str | None:
response = await event_service.ask_agent(question)
return response

async def execute_tool(
self, conversation_id: UUID, tool_name: str, action_dict: dict
) -> dict | None:
"""Execute a tool directly on a conversation.

Returns the observation dict, or None if the conversation was not found.
"""
if self._event_services is None:
raise ValueError("inactive_service")
event_service = self._event_services.get(conversation_id)
if event_service is None:
return None
return await event_service.execute_tool(tool_name, action_dict)

async def condense(self, conversation_id: UUID) -> bool:
"""Force condensation of the conversation history."""
if self._event_services is None:
Expand Down
40 changes: 40 additions & 0 deletions openhands-agent-server/openhands/agent_server/event_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -330,6 +330,46 @@ async def _run_with_error_handling():
# blocking on the full conversation execution.
loop.create_task(_run_with_error_handling())

async def execute_tool(self, tool_name: str, action_dict: dict) -> dict:
"""Execute a tool directly on the conversation.

Bypasses the agent loop. Useful for pre-run setup operations
like running .openhands/setup.sh through the agent's terminal
tool so environment changes persist in the agent's session.

Args:
tool_name: The name of the tool to execute (e.g., 'terminal')
action_dict: The action parameters as a dictionary

Returns:
The observation as a dictionary

Raises:
ValueError: If the service is inactive
KeyError: If the tool is not found
"""
if not self._conversation:
raise ValueError("inactive_service")

conversation = self._conversation
loop = asyncio.get_running_loop()

def _execute():
# Get the tool to resolve the action type
conversation._ensure_agent_ready()
tool = conversation.agent.tools_map.get(tool_name)
if tool is None:
available_tools = list(conversation.agent.tools_map.keys())
raise KeyError(
f"Tool '{tool_name}' not found. Available tools: {available_tools}"
)
# Validate the action dict against the tool's action type
action = tool.action_type.model_validate(action_dict)
observation = conversation.execute_tool(tool_name, action)
return observation.model_dump(mode="json")

return await loop.run_in_executor(None, _execute)

async def subscribe_to_events(self, subscriber: Subscriber[Event]) -> UUID:
subscriber_id = self._pub_sub.subscribe(subscriber)

Expand Down
22 changes: 22 additions & 0 deletions openhands-agent-server/openhands/agent_server/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -295,6 +295,28 @@ class AskAgentResponse(BaseModel):
response: str = Field(description="The agent's response to the question")


class ExecuteToolRequest(BaseModel):
"""Payload to execute a tool directly on a conversation."""

tool_name: str = Field(
description="The name of the tool to execute (e.g., 'terminal')"
)
action: dict[str, Any] = Field(
description="The action parameters to pass to the tool executor"
)


class ExecuteToolResponse(BaseModel):
"""Response from executing a tool."""

observation: dict[str, Any] = Field(
description="The observation returned by the tool execution"
)
is_error: bool = Field(
default=False, description="Whether the tool execution resulted in an error"
)


class BashEventBase(DiscriminatedUnionMixin, ABC):
"""Base class for all bash event types"""

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1210,24 +1210,50 @@ def condense(self) -> None:
def execute_tool(self, tool_name: str, action: "Action") -> "Observation":
"""Execute a tool directly without going through the agent loop.

Note: This method is not yet supported for RemoteConversation.
Tool execution for remote conversations happens on the server side
during the normal agent loop.
This sends the tool execution request to the agent server, which
executes it through the conversation's tool executor. This is useful
for pre-run setup operations like running setup scripts through the
agent's terminal tool so environment changes persist in the agent's
session.

Args:
tool_name: The name of the tool to execute
action: The action to pass to the tool executor

Returns:
The observation returned by the tool execution

Raises:
NotImplementedError: Always, as this feature is not yet supported
for remote conversations.
RuntimeError: If the tool execution fails on the server
"""
raise NotImplementedError(
"execute_tool is not yet supported for RemoteConversation. "
"Tool execution for remote conversations happens on the server side "
"during the normal agent loop. Use LocalConversation for direct "
"tool execution."
from openhands.sdk.tool.schema import Observation

payload = {
"tool_name": tool_name,
"action": action.model_dump(mode="json"),
}
response = _send_request(
self._client,
"POST",
f"/api/conversations/{self._id}/execute_tool",
json=payload,
)
result = response.json()
observation_data = result.get("observation", {})
is_error = result.get("is_error", False)

# Extract text from the observation data
text = observation_data.get("text", "")
if not text:
content = observation_data.get("content", [])
if content and isinstance(content, list):
text = content[0].get("text", "") if content else ""

# Use a concrete subclass since Observation is abstract
class _RemoteObservation(Observation):
pass

return _RemoteObservation.from_text(text=text, is_error=is_error)

def close(self) -> None:
"""Close the conversation and clean up resources.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,41 @@ async def git_diff(self, path: str | Path) -> GitDiff:
result = await self._execute(generator)
return result

async def execute_tool(
self,
conversation_id: str,
tool_name: str,
action: dict[str, Any],
timeout: float = 600.0,
) -> dict[str, Any]:
"""Execute a tool on a conversation through the agent server.

This calls the agent server's execute_tool endpoint, which runs the tool
through the conversation's tool executor (e.g., the persistent terminal
session). This is useful for pre-run setup operations like running
.openhands/setup.sh where environment changes need to persist.

Args:
conversation_id: The conversation ID (hex string)
tool_name: The name of the tool to execute (e.g., 'terminal')
action: The action parameters as a dictionary
timeout: Timeout in seconds

Returns:
The response dict with 'observation' and 'is_error' keys
"""
payload = {
"tool_name": tool_name,
"action": action,
}
response = await self.client.post(
f"/api/conversations/{conversation_id}/execute_tool",
json=payload,
timeout=timeout,
)
response.raise_for_status()
return response.json()

@property
def alive(self) -> bool:
"""Check if the remote workspace is alive by querying the health endpoint.
Expand Down
44 changes: 29 additions & 15 deletions tests/sdk/conversation/remote/test_remote_conversation.py
Original file line number Diff line number Diff line change
Expand Up @@ -1004,31 +1004,45 @@ def test_remote_conversation_host_url_normalization(self, mock_ws_client):
@patch(
"openhands.sdk.conversation.impl.remote_conversation.WebSocketCallbackClient"
)
def test_remote_conversation_execute_tool_not_implemented(self, mock_ws_client):
"""Test that execute_tool raises NotImplementedError for RemoteConversation."""
def test_remote_conversation_execute_tool(self, mock_ws_client):
"""Test that execute_tool calls the server endpoint."""
# Setup mocks
mock_client_instance = self.setup_mock_client()

conversation_id = str(uuid.uuid4())
mock_conv_response = self.create_mock_conversation_response(conversation_id)
mock_events_response = self.create_mock_events_response()

mock_client_instance.post.return_value = mock_conv_response
mock_client_instance.get.return_value = mock_events_response

mock_ws_instance = Mock()
mock_ws_client.return_value = mock_ws_instance

# Create conversation
conversation = RemoteConversation(agent=self.agent, workspace=self.workspace)

# Create a dummy action (using a simple mock)
# Mock the execute_tool API response
mock_execute_response = Mock()
mock_execute_response.status_code = 200
mock_execute_response.raise_for_status.return_value = None
mock_execute_response.json.return_value = {
"observation": {
"content": [{"text": "hello world", "kind": "text"}],
"is_error": False,
},
"is_error": False,
}

# Override request for execute_tool endpoint
original_side_effect = mock_client_instance.request.side_effect

def request_side_effect(method, url, **kwargs):
if method == "POST" and "execute_tool" in url:
return mock_execute_response
return original_side_effect(method, url, **kwargs)

mock_client_instance.request.side_effect = request_side_effect

# Create a mock action with model_dump
from unittest.mock import MagicMock

mock_action = MagicMock()
mock_action.model_dump.return_value = {"command": "echo hello"}

# Verify execute_tool raises NotImplementedError
with pytest.raises(NotImplementedError) as exc_info:
conversation.execute_tool("any_tool", mock_action)

assert "not yet supported for RemoteConversation" in str(exc_info.value)
result = conversation.execute_tool("terminal", mock_action)
assert result.text == "hello world"
assert result.is_error is False
Loading