fix(tools): add vision note to planning file editor #2358

Draft
enyst wants to merge 1 commit into main from fix/planning-file-editor-vision-note

enyst commented Mar 8, 2026

Summary

  • include the vision image-view bullet in planning_file_editor description when vision is enabled
  • build description dynamically to mirror file_editor behavior

Fixes #2357


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b8f9c70-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b8f9c70-python \
  ghcr.io/openhands/agent-server:b8f9c70-python

All tags pushed for this build

ghcr.io/openhands/agent-server:b8f9c70-golang-amd64
ghcr.io/openhands/agent-server:b8f9c70-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b8f9c70-golang-arm64
ghcr.io/openhands/agent-server:b8f9c70-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b8f9c70-java-amd64
ghcr.io/openhands/agent-server:b8f9c70-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b8f9c70-java-arm64
ghcr.io/openhands/agent-server:b8f9c70-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b8f9c70-python-amd64
ghcr.io/openhands/agent-server:b8f9c70-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:b8f9c70-python-arm64
ghcr.io/openhands/agent-server:b8f9c70-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:b8f9c70-golang
ghcr.io/openhands/agent-server:b8f9c70-java
ghcr.io/openhands/agent-server:b8f9c70-python

About Multi-Architecture Support

  • Each variant tag (e.g., b8f9c70-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., b8f9c70-python-amd64) are also available if needed

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions bot commented Mar 8, 2026

API breakage checks (Griffe)

Result: Passed

Action log

github-actions bot commented Mar 8, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-08 06:49:30,190", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.12.0 -> 1.12.0).

Breaking REST API changes detected compared to baseline release:
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()

Action log

all-hands-bot left a comment

Review Summary

The change correctly mirrors the file_editor vision logic, but there are two important issues to address:

  1. Missing tests - file_editor has tests for vision enabled/disabled, but this PR doesn't add them
  2. Code duplication - The vision description building logic is now duplicated between two tools

While the implementation follows the established pattern from file_editor, the test coverage gap should be addressed before merging.

Comment on lines +127 to +141
description_lines = FILE_EDITOR_TOOL_DESCRIPTION.split("\n")
base_description = "\n".join(description_lines[:2])
remaining_description = "\n".join(description_lines[2:])

if conv_state.agent.llm.vision_is_active():
    file_editor_description = (
        f"{base_description}\n"
        "* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, "
        ".bmp), `view` displays the image content\n"
        f"{remaining_description}"
    )
else:
    file_editor_description = FILE_EDITOR_TOOL_DESCRIPTION

tool_description = f"{file_editor_description}\n\n{PLANNING_RESTRICTIONS}"

🟠 Important: Code Duplication

This vision description logic is now duplicated between file_editor/definition.py and planning_file_editor/definition.py. The same string splitting and conditional insertion happens in both places.

Problem: If the description format changes or the split point needs adjustment, both files must be updated. This violates DRY and increases maintenance burden.

Suggestion: Extract this into a shared helper function that both tools can use:

def build_tool_description_with_vision(
    base_description: str,
    vision_enabled: bool
) -> str:
    """Build tool description with optional vision note."""
    if not vision_enabled:
        return base_description
    
    description_lines = base_description.split("\n")
    base = "\n".join(description_lines[:2])
    remaining = "\n".join(description_lines[2:])
    
    return (
        f"{base}\n"
        "* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, "
        ".bmp), `view` displays the image content\n"
        f"{remaining}"
    )

Then both tools can call this shared function. This would centralize the fragile string manipulation logic and make it easier to maintain.

Note: The [:2] split is fragile (magic number), but that's a pre-existing issue from file_editor - not introduced by this PR. Still worth addressing in a follow-up.
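One way the shared helper could also give the split point a name is sketched below. This is only an illustration: the module layout, `VISION_NOTE`, `VISION_NOTE_INSERT_INDEX`, `planning_description`, and the sample strings are all hypothetical stand-ins, not names from the repository. For descriptions longer than two lines, inserting into the line list yields the same text as the `[:2]` / `[2:]` join shown above.

```python
# Hypothetical shared module; every name below is illustrative.
VISION_NOTE = (
    "* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, "
    ".bmp), `view` displays the image content"
)

# Names the "[:2]" magic number: how many leading lines precede the note.
VISION_NOTE_INSERT_INDEX = 2


def build_tool_description_with_vision(
    base_description: str, vision_enabled: bool
) -> str:
    """Insert the vision note after the leading lines when vision is enabled."""
    if not vision_enabled:
        return base_description
    lines = base_description.split("\n")
    lines.insert(VISION_NOTE_INSERT_INDEX, VISION_NOTE)
    return "\n".join(lines)


# Stand-in text, not the real tool constants.
SAMPLE_DESCRIPTION = "Sample editing tool\n* First bullet\n* Second bullet"
SAMPLE_RESTRICTIONS = "Sample planning restrictions"


def planning_description(vision_enabled: bool) -> str:
    """Compose the planning tool text from the shared helper plus restrictions."""
    editor_description = build_tool_description_with_vision(
        SAMPLE_DESCRIPTION, vision_enabled
    )
    return f"{editor_description}\n\n{SAMPLE_RESTRICTIONS}"
```

With this shape, each tool would pass its own base description and the result of `vision_is_active()`, and the planning tool would append its restrictions afterwards, so the split point lives in a single place.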

else:
    file_editor_description = FILE_EDITOR_TOOL_DESCRIPTION

tool_description = f"{file_editor_description}\n\n{PLANNING_RESTRICTIONS}"

🟠 Important: Missing Test Coverage

The file_editor tool has tests for vision behavior:

  • test_file_editor_tool_image_viewing_line_with_vision_enabled()
  • test_file_editor_tool_image_viewing_line_with_vision_disabled()

This PR adds the same vision logic to planning_file_editor but doesn't add corresponding tests. Since this is a bug fix (#2357), tests would verify:

  1. The vision note appears when vision is enabled
  2. The vision note is absent when vision is disabled
  3. The PLANNING_RESTRICTIONS are still properly included in both cases

Suggestion: Add tests similar to the file_editor ones in tests/tools/planning_file_editor/test_planning_file_editor_tool.py:

def test_planning_file_editor_image_viewing_line_with_vision_enabled():
    """Test that image viewing line is included when LLM supports vision."""
    # Similar to file_editor test but verify PLANNING_RESTRICTIONS too
    
def test_planning_file_editor_image_viewing_line_with_vision_disabled():
    """Test that image viewing line is excluded when LLM doesn't support vision."""
    # Similar to file_editor test but verify PLANNING_RESTRICTIONS too
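As a rough sketch of how those stubs might be filled in, the example below mocks a conversation state whose `agent.llm.vision_is_active()` returns a chosen value and checks the three points listed above. It is self-contained on purpose: `build_planning_description`, `make_conv_state`, and the sample constants are hypothetical stand-ins that mirror the logic in the diff, and a real test would call the actual planning_file_editor tool instead.

```python
from unittest.mock import MagicMock

# Stand-ins for the real constants; the actual text lives in the tool modules.
FILE_EDITOR_TOOL_DESCRIPTION = "Sample editor tool\n* First bullet\n* Second bullet"
PLANNING_RESTRICTIONS = "Sample planning restrictions"
IMAGE_NOTE = (
    "* If `path` is an image file (.png, .jpg, .jpeg, .gif, .webp, "
    ".bmp), `view` displays the image content"
)


def build_planning_description(conv_state) -> str:
    """Hypothetical stand-in mirroring the description logic shown in the diff."""
    lines = FILE_EDITOR_TOOL_DESCRIPTION.split("\n")
    if conv_state.agent.llm.vision_is_active():
        lines.insert(2, IMAGE_NOTE)
    return "\n".join(lines) + f"\n\n{PLANNING_RESTRICTIONS}"


def make_conv_state(vision: bool) -> MagicMock:
    """Build a mock whose agent.llm.vision_is_active() returns `vision`."""
    conv_state = MagicMock()
    conv_state.agent.llm.vision_is_active.return_value = vision
    return conv_state


def test_planning_file_editor_image_viewing_line_with_vision_enabled():
    description = build_planning_description(make_conv_state(vision=True))
    assert IMAGE_NOTE in description
    assert PLANNING_RESTRICTIONS in description


def test_planning_file_editor_image_viewing_line_with_vision_disabled():
    description = build_planning_description(make_conv_state(vision=False))
    assert IMAGE_NOTE not in description
    assert PLANNING_RESTRICTIONS in description
```

Asserting on `PLANNING_RESTRICTIONS` in both branches covers the third point in the list: inserting the vision note must not displace the planning-specific text.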

@enyst enyst marked this pull request as draft March 8, 2026 06:51

Development

Successfully merging this pull request may close these issues.

Planning file editor tool description missing vision image-view note
