fix(tools): add vision note to planning file editor#2358
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
API breakage checks (Griffe)Result: Passed |
Agent server REST API breakage checks (OpenAPI)Result: Failed Log excerpt (first 1000 characters) |
all-hands-bot
left a comment
There was a problem hiding this comment.
Review Summary
The change correctly mirrors the file_editor vision logic, but there are two important issues to address:
- Missing tests - file_editor has tests for vision enabled/disabled, but this PR doesn't add them
- Code duplication - The vision description building logic is now duplicated between two tools
While the implementation follows the established pattern from file_editor, the test coverage gap should be addressed before merging.
| description_lines = FILE_EDITOR_TOOL_DESCRIPTION.split("\n") | ||
| base_description = "\n".join(description_lines[:2]) | ||
| remaining_description = "\n".join(description_lines[2:]) | ||
|
|
||
| if conv_state.agent.llm.vision_is_active(): | ||
| file_editor_description = ( | ||
| f"{base_description}\n" | ||
| "* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, " | ||
| ".bmp), `view` displays the image content\n" | ||
| f"{remaining_description}" | ||
| ) | ||
| else: | ||
| file_editor_description = FILE_EDITOR_TOOL_DESCRIPTION | ||
|
|
||
| tool_description = f"{file_editor_description}\n\n{PLANNING_RESTRICTIONS}" |
There was a problem hiding this comment.
🟠 Important: Code Duplication
This vision description logic is now duplicated between file_editor/definition.py and planning_file_editor/definition.py. The same string splitting and conditional insertion happens in both places.
Problem: If the description format changes or the split point needs adjustment, both files must be updated. This violates DRY and increases maintenance burden.
Suggestion: Extract this into a shared helper function that both tools can use:
def build_tool_description_with_vision(
base_description: str,
vision_enabled: bool
) -> str:
"""Build tool description with optional vision note."""
if not vision_enabled:
return base_description
description_lines = base_description.split("\n")
base = "\n".join(description_lines[:2])
remaining = "\n".join(description_lines[2:])
return (
f"{base}\n"
"* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, "
".bmp), `view` displays the image content\n"
f"{remaining}"
)Then both tools can call this shared function. This would centralize the fragile string manipulation logic and make it easier to maintain.
Note: The [:2] split is fragile (magic number), but that's a pre-existing issue from file_editor - not introduced by this PR. Still worth addressing in a follow-up.
| else: | ||
| file_editor_description = FILE_EDITOR_TOOL_DESCRIPTION | ||
|
|
||
| tool_description = f"{file_editor_description}\n\n{PLANNING_RESTRICTIONS}" |
There was a problem hiding this comment.
🟠 Important: Missing Test Coverage
The file_editor tool has tests for vision behavior:
test_file_editor_tool_image_viewing_line_with_vision_enabled()test_file_editor_tool_image_viewing_line_with_vision_disabled()
This PR adds the same vision logic to planning_file_editor but doesn't add corresponding tests. Since this is a bug fix (#2357), tests would verify:
- The vision note appears when vision is enabled
- The vision note is absent when vision is disabled
- The PLANNING_RESTRICTIONS are still properly included in both cases
Suggestion: Add tests similar to the file_editor ones in tests/tools/planning_file_editor/test_planning_file_editor_tool.py:
def test_planning_file_editor_image_viewing_line_with_vision_enabled():
"""Test that image viewing line is included when LLM supports vision."""
# Similar to file_editor test but verify PLANNING_RESTRICTIONS too
def test_planning_file_editor_image_viewing_line_with_vision_disabled():
"""Test that image viewing line is excluded when LLM doesn't support vision."""
# Similar to file_editor test but verify PLANNING_RESTRICTIONS too
Summary
Fixes #2357
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:b8f9c70-pythonRun
All tags pushed for this build
About Multi-Architecture Support
b8f9c70-python) is a multi-arch manifest supporting both amd64 and arm64b8f9c70-python-amd64) are also available if needed