fix(terminal): filter terminal query sequences from captured output#2334

Open
jpshackelford wants to merge 4 commits into main from fix/terminal-escape-filter-minimal

Conversation

@jpshackelford
Contributor

@jpshackelford jpshackelford commented Mar 6, 2026

Summary

Fixes #2244 - Filter terminal query sequences from captured PTY output to prevent visible escape code garbage when displayed.

Problem

When CLI tools like gh, npm, or other progress-indicator tools run inside the SDK's PTY, they send terminal query sequences as part of their spinner/progress UI:

  • \x1b[6n - DSR (Device Status Report) - cursor position query
  • \x1b]11;? - OSC 11 - background color query
  • \x1b[c - DA (Device Attributes) query

These queries get captured as part of the command output. When the output is later displayed to the user's terminal, the terminal processes these queries and responds, causing visible garbage like:

^[[38;1R^[]11;rgb:30fb/3708/41af^G

How to Reproduce

  1. Run any SDK example that executes terminal commands
  2. Execute a command that uses progress indicators: gh pr list --repo OpenHands/openhands
  3. Observe escape code garbage appearing in the output or corrupting the shell prompt after exit

Visual Example

Before fix:

$ gh pr list
Fetching PRs...^[[6n^[]11;?
#123  Fix bug    main
^[[38;1R

After fix:

$ gh pr list
Fetching PRs...
#123  Fix bug    main

Root Cause Analysis

The escape codes are IN the captured PTY output stream, not generated by terminal responses to the SDK's own queries. When gh (or similar tools) runs:

  1. gh sends \x1b[6n to query cursor position (for spinner positioning)
  2. This query is written to the PTY's stdout
  3. The SDK captures all PTY output, including the query
  4. When displayed, the user's terminal sees the query and responds
  5. The response appears as visible garbage

Solution

Add filter_terminal_queries() to strip terminal query sequences from captured output before returning from the terminal tool. This removes the queries at the source, so the user's terminal never sees them.

Filtered sequences:

  • DSR (\x1b[6n) - cursor position query
  • OSC 10/11/4 (\x1b]10;?, \x1b]11;?, \x1b]4;?) - color queries
  • DA/DA2 (\x1b[c, \x1b[>c) - device attributes
  • DECRQSS (\x1bP$q...\x1b\\) - terminal state queries

Preserved sequences:

  • ANSI colors (\x1b[31m, \x1b[0m, etc.)
  • Cursor movement (\x1b[H, \x1b[5A, etc.)
  • Text formatting (bold, underline, etc.)
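
The filtered and preserved lists above can be sketched as a small set of compiled byte patterns. This is only an illustration: the function name `filter_terminal_queries_sketch` and the exact regular expressions are assumptions, and the PR's actual `escape_filter.py` may differ. The sketch assumes OSC queries end with BEL or ST, and that a numeric index may follow OSC 4.

```python
import re

# Hypothetical patterns for illustration; not copied from the PR.
_QUERY_PATTERNS = [
    re.compile(rb"\x1b\[6n"),                                        # DSR: cursor position query
    re.compile(rb"\x1b\[>?0?c"),                                     # DA / DA2: device attributes
    re.compile(rb"\x1b\](?:10|11|4(?:;\d+)?);\?(?:\x07|\x1b\\)"),    # OSC 10/11/4 color queries
    re.compile(rb"\x1bP\$q[^\x1b]*\x1b\\"),                          # DECRQSS: terminal state query
]


def filter_terminal_queries_sketch(output: str) -> str:
    """Strip terminal query sequences; leave colors and cursor movement alone."""
    data = output.encode("utf-8", errors="surrogateescape")
    for pattern in _QUERY_PATTERNS:
        data = pattern.sub(b"", data)
    return data.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    sample = "\x1b[32mSuccess\x1b[0m\x1b[6n some text \x1b]11;?\x07"
    print(repr(filter_terminal_queries_sketch(sample)))
    # prints '\x1b[32mSuccess\x1b[0m some text '
```

Color codes such as `\x1b[31m` and cursor movement such as `\x1b[5A` match none of these patterns, so they pass through unchanged.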

Testing

# Run the unit tests
uv run pytest tests/tools/terminal/test_escape_filter.py -v

# Manual test with gh command
uv run python -c "
from openhands.tools.terminal.utils.escape_filter import filter_terminal_queries

# Simulated output with embedded queries
output = '\x1b[32mSuccess\x1b[0m\x1b[6n some text \x1b]11;?\x07'
filtered = filter_terminal_queries(output)
print(repr(filtered))
# Output: '\x1b[32mSuccess\x1b[0m some text '
"

Files Changed

File                                          Change
openhands-tools/.../utils/escape_filter.py    NEW - Filter implementation (83 lines)
openhands-tools/.../utils/__init__.py         NEW - Export function
openhands-tools/.../terminal_session.py       Apply filter in _get_command_output()
tests/tools/terminal/test_escape_filter.py    NEW - 15 unit tests

Design Decisions

  1. Filter at source: Apply filter where output is captured (terminal tool), not where it's displayed. This is simpler and more reliable.

  2. Byte-level regex: Use compiled regex patterns on bytes for accurate escape sequence matching.

  3. Preserve formatting: Only remove query sequences that trigger responses; keep colors and cursor movement intact.

  4. Minimal scope: This fix targets only the terminal tool output processing - no SDK-level changes needed.
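
Design decision 2 (byte-level matching) can be illustrated with a round trip through `surrogateescape`, the error handler that also appears in the diff excerpt quoted later in this review. The helper name `strip_dsr` and the sample bytes are hypothetical. The sketch shows that bytes which are not valid UTF-8 survive the encode, filter, decode cycle unchanged.

```python
import re

DSR_QUERY = re.compile(rb"\x1b\[6n")  # cursor position query only, for brevity


def strip_dsr(text: str) -> str:
    """Filter on bytes, then return a str that round-trips undecodable bytes."""
    data = text.encode("utf-8", errors="surrogateescape")
    data = DSR_QUERY.sub(b"", data)
    return data.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # \xff is not valid UTF-8; surrogateescape maps it to a lone surrogate in str.
    raw = b"caf\xc3\xa9 \xff\x1b[6n done"
    text = raw.decode("utf-8", errors="surrogateescape")
    cleaned = strip_dsr(text)
    print(cleaned.encode("utf-8", errors="surrogateescape"))
    # prints b'caf\xc3\xa9 \xff done'
```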

Fixes: #2244


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image                                  Docs / Tags
java     amd64, arm64   eclipse-temurin:17-jdk                      Link
python   amd64, arm64   nikolaik/python-nodejs:python3.12-nodejs22  Link
golang   amd64, arm64   golang:1.21-bookworm                        Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:39dc01c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-39dc01c-python \
  ghcr.io/openhands/agent-server:39dc01c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:39dc01c-golang-amd64
ghcr.io/openhands/agent-server:39dc01c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:39dc01c-golang-arm64
ghcr.io/openhands/agent-server:39dc01c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:39dc01c-java-amd64
ghcr.io/openhands/agent-server:39dc01c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:39dc01c-java-arm64
ghcr.io/openhands/agent-server:39dc01c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:39dc01c-python-amd64
ghcr.io/openhands/agent-server:39dc01c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:39dc01c-python-arm64
ghcr.io/openhands/agent-server:39dc01c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:39dc01c-golang
ghcr.io/openhands/agent-server:39dc01c-java
ghcr.io/openhands/agent-server:39dc01c-python

About Multi-Architecture Support

  • Each variant tag (e.g., 39dc01c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 39dc01c-python-amd64) are also available if needed

Filter terminal query sequences (DSR, OSC, DA, etc.) from captured PTY output
before returning from terminal tool. These queries cause the terminal to respond
when displayed, producing visible escape code garbage.

Root cause: CLI tools like `gh` send terminal queries as part of their
progress/spinner UI. When captured and displayed, the terminal processes
them and responds, causing visible garbage like `^[[38;1R`.

Solution: Add filter_terminal_queries() to strip query sequences while
preserving legitimate formatting codes (colors, bold, etc.).

Fixes: #2244

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

API breakage checks (Griffe)

Result: Passed

Action log

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-06 12:25:29,963", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.12.0 -> 1.12.0).

Breaking REST API changes detected compared to baseline release:
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Requires Eval Verification

Taste Rating: Code is clean and solves a real problem. However, this touches terminal output handling and needs eval verification before approval per repo guidelines.

Assessment:

  • ✅ Solves real problem (visible escape code garbage)
  • ✅ Simple, targeted solution (filter at source)
  • ✅ Comprehensive tests with good coverage
  • ⚠️ Touches terminal/stdout handling → flag for lightweight evals

The implementation is solid. Regex patterns are compiled, tests verify both removal and preservation, and the fix is applied at the right layer. Once evals confirm no regressions, this is ready to merge.

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Coverage

Coverage Report
File                                                 Stmts   Miss   Cover   Missing
openhands-tools/openhands/tools/terminal/terminal
   terminal_session.py                               189     66     65%     93, 99, 103–105, 132–133, 165, 180–181, 220–222, 227, 230–231, 235, 241, 244, 259–261, 266, 269–270, 274, 280, 283, 303, 305, 308, 310, 326, 341, 347, 356, 359, 393, 397, 400, 403–404, 410–411, 417, 420, 427–428, 434–435, 494–496, 500, 505, 510–511, 515–516, 519–522, 528–529, 532
TOTAL                                                19505   9873   49%

Change the OSC filter pattern from matching specific codes (10, 11, 4) to
matching any OSC query (sequences ending with ;? before terminator).

This is more future-proof and catches additional query types like:
- OSC 12 (cursor color)
- OSC 17 (highlight background)
- Any other OSC queries that follow the standard format

The pattern now matches: ESC ] Ps [;param] ;? TERMINATOR
Where ;? indicates it's a query, not a set operation.

Importantly, SET operations are preserved:
- OSC 0 (window title)
- OSC 8 (hyperlinks)
- OSC 7 (working directory)

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford
Contributor Author

Good catch! I've updated the OSC pattern to be more general.

Before: Matched only OSC codes 10, 11, 4
After: Matches any OSC sequence ending with ;? (the query marker)

The new pattern: ESC ] Ps [;param] ;? TERMINATOR

This catches all OSC queries:

  • ✅ OSC 10/11 (fg/bg color)
  • ✅ OSC 4 (palette)
  • ✅ OSC 12 (cursor color)
  • ✅ OSC 17 (highlight background)
  • ✅ Any future OSC query types

While preserving SET operations:

  • ✅ OSC 0 (window title) - no ;? = preserved
  • ✅ OSC 8 (hyperlinks) - no ;? = preserved
  • ✅ OSC 7 (working directory) - no ;? = preserved

Added 5 new tests to verify the behavior. See commit a499579.
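
The generalized shape described above (ESC ] Ps [;param] ;? TERMINATOR) could be expressed roughly as follows. This is a sketch under assumptions: the pattern and the name `strip_osc_queries` are illustrative and not taken from commit a499579, and the terminator is assumed to be BEL or ST.

```python
import re

# ESC ] <digits> [;param] ;? followed by BEL (\x07) or ST (ESC \).
OSC_QUERY = re.compile(rb"\x1b\]\d+(?:;[^;\x07\x1b]*)?;\?(?:\x07|\x1b\\)")


def strip_osc_queries(data: bytes) -> bytes:
    """Remove OSC queries; SET operations have no ';?' marker and are kept."""
    return OSC_QUERY.sub(b"", data)


if __name__ == "__main__":
    queries = [b"\x1b]11;?\x07", b"\x1b]12;?\x07", b"\x1b]4;1;?\x1b\\"]
    sets = [
        b"\x1b]0;My Title\x07",                               # OSC 0: window title
        b"\x1b]8;;https://example.com\x07link\x1b]8;;\x07",   # OSC 8: hyperlink
        b"\x1b]7;file://host/path\x07",                       # OSC 7: working directory
    ]
    for q in queries:
        print(q, "->", strip_osc_queries(q))
    for s in sets:
        print(s, "kept:", strip_osc_queries(s) == s)
```

One limitation of this sketch: a SET payload that itself ends in `;?` (for example a window title) would be treated as a query.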

Adds .pr/test_real_world.py that runs an agent with the gh command
to verify terminal query sequences are properly filtered.

Usage:
  LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

@jpshackelford
Contributor Author

Manual Testing Instructions

A real-world test script is available in .pr/test_real_world.py to verify the fix works correctly.

How to Run

# Clone and checkout the branch
git fetch origin fix/terminal-escape-filter-minimal
git checkout fix/terminal-escape-filter-minimal

# Run the test (uses All-Hands LLM proxy)
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

What the Test Does

  1. Creates an agent with the terminal tool
  2. Asks it to run gh pr list --repo OpenHands/openhands --limit 3
  3. The gh command sends terminal query sequences (DSR, OSC) as part of its spinner UI
  4. With the fix, these queries are filtered from the captured output

Success Criteria

Pass if:

  • NO visible escape codes like ^[[38;1R or rgb:30fb/3708/41af in the output
  • NO garbage appears on your shell prompt after the script exits
  • Colors from gh output are still visible (formatting preserved)

Fail if:

  • You see raw escape sequences in the terminal output
  • Garbage characters appear after the script completes

Without the Fix (for comparison)

To see the problem on main:

git checkout main
LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_API_KEY="$LLM_API_KEY" \
    uv run python .pr/test_real_world.py

You should see escape code garbage like ^[[6n or ]11;? in the output.

@jpshackelford
Contributor Author

jpshackelford commented Mar 6, 2026

I have confirmed that this solution works. Note that there is a change in what the agent displays: since we filter out OSC queries, the CLI does not render the gh spinner. I think this is an acceptable limitation. (Note that we aren't preventing the spinner from displaying; rather, when gh doesn't get back terminal query results, it elects not to display the spinner.)

Why the gh Spinner Doesn't Render Properly

The spinner animation in gh (and similar CLI tools) relies on terminal query sequences to function correctly. Here's why:

How Spinners Work

  1. Query cursor position: The spinner sends \x1b[6n (DSR) to ask "where is my cursor?"
  2. Receive response: The terminal responds with \x1b[row;colR
  3. Overwrite in place: Using the cursor position, the spinner moves back and overwrites itself with the next frame (⣾ → ⣽ → ⣻ → etc.)

What Our Filter Does

We filter out DSR queries (\x1b[6n) from the captured output because when they're displayed to the user's terminal, that terminal responds - and the response becomes visible garbage.

The Consequence

Without the cursor position query reaching the terminal:

  • The spinner never learns where it is
  • It can't move back to overwrite itself
  • Each spinner frame may appear on a new line, or the spinner may not animate at all
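
The query/response exchange described above can be made concrete with a small parser for the cursor position report. This is only a sketch of the message format (`\x1b[row;colR`); the function name is hypothetical and it does not represent how gh itself is implemented.

```python
import re
from typing import Optional, Tuple

# Cursor Position Report: the terminal's reply to the DSR query \x1b[6n.
CPR = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_position(response: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) from a cursor position report, or None if absent."""
    match = CPR.fullmatch(response)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # The visible garbage ^[[38;1R from the PR description is such a report.
    print(parse_cursor_position("\x1b[38;1R"))  # prints (38, 1)
    # No reply ever arrives when the query is filtered out.
    print(parse_cursor_position(""))            # prints None
```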

Why This Is Acceptable

  1. The command still works - gh pr list executes correctly and returns results
  2. Actual output is preserved - The PR list, colors, and formatting are intact
  3. Agent context - Spinners are for human feedback during waits; the agent doesn't need visual progress indicators
  4. The alternative is worse - Without filtering, you get ^[[6n^[[38;1R garbage polluting the output

Why Filter the Query, Not the Response?

The Response Problem

When a terminal query like \x1b[6n is displayed, the user's terminal:

  1. Processes the query
  2. Writes its response to stdin (e.g., \x1b[38;1R)

Filtering the response would require:

  1. Monitoring stdin continuously - Responses arrive asynchronously, potentially long after we've returned output to the agent. We'd need to constantly drain stdin throughout the entire session.
  2. Distinguishing responses from user input - If a user types while the agent is running, their keystrokes arrive on stdin too. How do we know \x1b[A is a terminal response vs. the user pressing the up arrow? We risk eating legitimate input.
  3. Racing against echo - By the time the response arrives on stdin, the terminal may have already echoed it to the display. The visible garbage (^[[38;1R) appears because the terminal echoes the response before we can intercept it. Filtering stdin doesn't prevent the visual pollution - the damage is already done.
  4. Complex terminal mode manipulation - Reliably reading stdin without blocking, while preserving terminal state and not corrupting user input, across different platforms, is hard. This is the path the original PR #2245 (fix(terminal): filter terminal query sequences from captured output) went down with flush_stdin(): 700+ lines of complexity, and it did not work reliably without the OSC filtering.

Why Filtering Queries Is Better

  1. Single point of control - Filter in _get_command_output() before output is returned
  2. No response is ever generated - If the query never reaches the display terminal, there's nothing to clean up
  3. No stdin complexity - No terminal modes, no race conditions, no risk of eating user input
  4. Deterministic and testable - Simple regex on captured output

@jpshackelford
Contributor Author

I think this is ready, except that we should probably test the CLI built against this version of the SDK to ensure that our approach here doesn't interfere with the TUI.

Perhaps the best course is to open a PR that will build the CLI against this branch of the SDK and recruit some users to use it for a day or two.

jpshackelford added a commit to OpenHands/OpenHands-CLI that referenced this pull request Mar 6, 2026
This CLI build uses the software-agent-sdk branch from PR #2334 which
includes the terminal escape filter fix for tools like gh, npm that use
spinner/progress UI.

SDK PR: OpenHands/software-agent-sdk#2334

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford
Contributor Author

It looks like testing this in the CLI is blocked until the breaking change in #2133 is dealt with in the CLI, unless we rebase this fix branch on v1.11.5.

@jpshackelford
Contributor Author

This PR was merged and should address the blocker on testing with OpenHands-CLI: OpenHands/OpenHands-CLI#587

@enyst enyst requested a review from xingyaoww March 9, 2026 18:03
@enyst
Collaborator

enyst commented Mar 9, 2026

@OpenHands Do a /codereview-roasted on this PR. Publish your review feedback as review on the PR, using your appropriate event in gh api. (not a comment, a review, you are allowed to review)

@openhands-ai

openhands-ai bot commented Mar 9, 2026

I'm on it! enyst can track my progress at all-hands.dev

Collaborator

@enyst enyst left a comment

🔴 Needs improvement

This is aimed at a real bug and the implementation stays pleasantly small, but there are two correctness holes here: the fix only applies to output coming back through TerminalSession, and the filter is stateless so split escape sequences can still leak through incremental updates. Since this also touches terminal/stdout handling, I’d want lightweight eval coverage after those are addressed.

Verdict: not ready as the claimed fix for #2244 yet.

Key insight: sanitizing after the stream has already been split into per-observation chunks is too late; either the sanitization needs carry-over state, or part of the fix has to live at the actual SDK terminal boundary rather than only in the terminal tool output path.


# Filter terminal query sequences that would cause the terminal to
# respond when displayed, producing visible garbage
command_output = filter_terminal_queries(command_output)
Collaborator

This only sanitizes bytes that flow back through the terminal tool. Issue #2244 also reproduces when the SDK process itself emits terminal queries (the minimal repro in the issue does that directly, and Rich capability detection is another example). Those paths never touch TerminalSession, so this patch doesn’t actually close the full bug it claims to fix. Either narrow the scope to PTY-emitted queries only, or handle the SDK-side leak at the conversation/visualizer boundary too.

# Convert to bytes for regex matching (escape sequences are byte-level)
output_bytes = output.encode("utf-8", errors="surrogateescape")

# Remove each type of query sequence
Collaborator

This filter is stateless, which means a query split across observations survives unchanged. For example, if one update ends with \x1b]11; and the next starts with ?\x07, neither call matches, but the client still receives the full OSC query once those chunks are rendered in sequence. That’s not a theoretical edge case here because long-running commands are surfaced incrementally. The fix needs carry-over state for incomplete escape sequences, or it needs to run before the output is sliced into deltas.
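
One way the carry-over state suggested here could look is sketched below. It is a hypothetical illustration, not code from the PR: the class name `StatefulQueryFilter`, the patterns, and the `max_pending` cap are all assumptions, and DECRQSS handling is omitted for brevity. The idea is to hold back a trailing, still-unterminated escape fragment and prepend it to the next chunk before filtering.

```python
import re

# Complete query sequences (DSR, DA/DA2, generic OSC query).
_QUERY = re.compile(
    r"\x1b\[6n"
    r"|\x1b\[>?0?c"
    r"|\x1b\]\d+(?:;[^;\x07\x1b]*)?;\?(?:\x07|\x1b\\)"
)
# A trailing CSI or OSC fragment with no terminator yet, or a lone ESC.
_PENDING_TAIL = re.compile(r"\x1b(?:\[[0-9;>]*|\][^\x07\x1b]*\x1b?)?\Z")


class StatefulQueryFilter:
    """Filters queries across chunk boundaries by carrying over partial tails."""

    def __init__(self, max_pending: int = 256) -> None:
        self._pending = ""
        self._max_pending = max_pending  # cap so an unterminated OSC cannot stall output

    def feed(self, chunk: str) -> str:
        data = _QUERY.sub("", self._pending + chunk)
        self._pending = ""
        tail = _PENDING_TAIL.search(data)
        if tail and len(data) - tail.start() <= self._max_pending:
            # Hold the incomplete fragment back until the next chunk arrives.
            self._pending = data[tail.start():]
            data = data[: tail.start()]
        return data

    def flush(self) -> str:
        """Return whatever is still held back, e.g. when the command exits."""
        pending, self._pending = self._pending, ""
        return pending


if __name__ == "__main__":
    f = StatefulQueryFilter()
    print(repr(f.feed("color \x1b]11;")))  # prints 'color '
    print(repr(f.feed("?\x07done")))       # prints 'done'
```

A split formatting sequence such as `\x1b[3` followed by `1m` is also held back briefly, then emitted intact once it turns out not to be a query.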

@openhands-ai

openhands-ai bot commented Mar 9, 2026

Final summary:

  • Reviewed PR fix(terminal): filter terminal query sequences from captured output #2334 against main, the PR description, and linked issue #2244.
  • Submitted a GitHub PR review using the proper COMMENT review event via gh api (not a regular comment).
  • Left two substantive inline concerns:
    1. the fix only sanitizes output in TerminalSession, so it does not fully cover the SDK-side leakage described in #2244
    2. the filter is stateless, so escape sequences split across incremental output chunks can still leak through

Checklist:

  • Performed a roasted code review of the PR
  • Published feedback as a review, not a comment
  • Focused on important correctness issues only
  • Made no code changes
  • No extraneous local modifications to push

Because I did not change repository files, there was nothing to push to the remote branch.

Development

Successfully merging this pull request may close these issues.

Terminal escape code responses leak to stdin, corrupting subsequent input

3 participants