
fix(terminal): filter terminal query sequences from captured output#2245

Open
jpshackelford wants to merge 10 commits into main from fix/stdin-escape-code-leak


@jpshackelford
Contributor

@jpshackelford jpshackelford commented Feb 28, 2026

Summary

Fixes #2244 - Filter terminal query sequences from captured PTY output to prevent visible escape code garbage.

Problem

When CLI tools like gh run inside the SDK's PTY, they send terminal query sequences (DSR, OSC 11, etc.) as part of their progress/spinner UI. These queries get captured as output and when displayed, the terminal processes them and responds, causing visible garbage like:

^[[38;1R^[]11;rgb:30fb/3708/41af^G

Root Cause Analysis

The diagnostic script .pr/diagnose_source.py confirmed the queries are IN the captured PTY output, not generated by terminal responses to Rich rendering. The gh command specifically writes:

  • \x1b[6n (DSR - cursor position query)
  • \x1b]11;? (OSC 11 - background color query)

Solution

Added filter_terminal_queries() in the terminal tool to strip query sequences that trigger terminal responses while preserving legitimate formatting codes (colors, bold, etc.):

  • Filtered: DSR, OSC 10/11/4, DA, DA2, DECRQSS
  • Preserved: ANSI colors, cursor movement, bold/formatting

Applied in _get_command_output() before returning to the visualizer.
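
As a rough illustration of the filtering idea described above, here is a minimal regex-based sketch. It is not the PR's filter_terminal_queries() implementation: the function name strip_terminal_queries and the exact patterns are assumptions, chosen to cover the sequence families named in the "Filtered" list.

```python
import re

# Assumed patterns for sequences that ask the terminal to reply.
# The real filter in the PR may use different or broader patterns.
_QUERY_PATTERNS = [
    r"\x1b\[\??[56]n",                           # DSR: status / cursor position query
    r"\x1b\](?:10|11|4;\d+);\?(?:\x07|\x1b\\)",  # OSC 10/11/4 colour queries (BEL or ST)
    r"\x1b\[>?0?c",                              # DA and DA2 device attribute queries
    r"\x1bP\$q[^\x1b]*\x1b\\",                   # DECRQSS: request status string
]
_QUERY_RE = re.compile("|".join(_QUERY_PATTERNS))


def strip_terminal_queries(text: str) -> str:
    """Remove query sequences; leave colours, cursor movement, etc. untouched."""
    return _QUERY_RE.sub("", text)


captured = "\x1b[6n\x1b]11;?\x07\x1b[32mok\x1b[0m"
cleaned = strip_terminal_queries(captured)  # only the SGR colour codes and "ok" remain
```

Because the patterns match only sequences that end in a query form (`n`, `c`, `;?`, or a DECRQSS string), SGR colour codes such as `\x1b[32m` and cursor movement such as `\x1b[2A` pass through unchanged.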

Manual Testing

Verified with uv run python .pr/test_real_world.py:

  • ✅ No visible escape codes in output
  • ✅ Colors from gh preserved
  • ✅ Clean shell prompt after exit

Files Changed

  • openhands-tools/.../utils/escape_filter.py (NEW) - Filter implementation
  • openhands-tools/.../terminal_session.py - Apply filter in output processing
  • tests/tools/terminal/test_escape_filter.py (NEW) - 16 tests
  • .pr/ - Diagnostic scripts for reviewers

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:704b4a2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-704b4a2-python \
  ghcr.io/openhands/agent-server:704b4a2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:704b4a2-golang-amd64
ghcr.io/openhands/agent-server:704b4a2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:704b4a2-golang-arm64
ghcr.io/openhands/agent-server:704b4a2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:704b4a2-java-amd64
ghcr.io/openhands/agent-server:704b4a2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:704b4a2-java-arm64
ghcr.io/openhands/agent-server:704b4a2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:704b4a2-python-amd64
ghcr.io/openhands/agent-server:704b4a2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:704b4a2-python-arm64
ghcr.io/openhands/agent-server:704b4a2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:704b4a2-golang
ghcr.io/openhands/agent-server:704b4a2-java
ghcr.io/openhands/agent-server:704b4a2-python

About Multi-Architecture Support

  • Each variant tag (e.g., 704b4a2-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 704b4a2-python-amd64) are also available if needed

@github-actions
Contributor

github-actions bot commented Feb 28, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

@github-actions
Contributor

github-actions bot commented Feb 28, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

Collaborator

@all-hands-bot all-hands-bot left a comment


🟡 Taste Rating: Acceptable - Core solution is sound and pragmatic, but has test quality issues and scope creep.

Key Insight: The flush_stdin() implementation correctly handles terminal I/O cleanup, but the tests need work and there's an unrelated change mixed in.

@github-actions
Contributor

github-actions bot commented Feb 28, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py | 342 | 21 | 93% | 282, 287, 315, 358, 376, 392, 454, 603–604, 607, 765, 773, 775, 786, 788–790, 815, 977, 984–985
openhands-sdk/openhands/sdk/conversation/visualizer/default.py | 147 | 16 | 89% | 81–82, 84–89, 91, 120, 140, 151, 162, 272, 309, 346
openhands-sdk/openhands/sdk/logger/logger.py | 177 | 35 | 80% | 35, 61, 66–69, 71–73, 136, 141–143, 146–147, 153–155, 162, 167–168, 250, 317–318, 324, 327, 329–330, 333–334, 374–375, 397, 409–410
openhands-tools/openhands/tools/terminal/terminal/terminal_session.py | 189 | 66 | 65% | 93, 99, 103–105, 132–133, 165, 180–181, 220–222, 227, 230–231, 235, 241, 244, 259–261, 266, 269–270, 274, 280, 283, 303, 305, 308, 310, 326, 341, 347, 356, 359, 393, 397, 400, 403–404, 410–411, 417, 420, 427–428, 434–435, 494–496, 500, 505, 510–511, 515–516, 519–522, 528–529, 532
TOTAL | 19748 | 6079 | 69% |

Add flush_stdin() function to prevent ANSI escape code responses from
terminal queries (like DSR cursor position requests) from leaking to
stdin. This prevents garbage characters appearing in the shell prompt
or corrupting subsequent input() calls in CLI applications.

The fix has four parts:
1. Add flush_stdin() function to the logger module that drains pending
   stdin data using non-blocking reads with termios
2. Call flush_stdin() after each agent step in LocalConversation.run()
3. Call flush_stdin() before rendering in DefaultConversationVisualizer
4. Register flush_stdin() with atexit for final cleanup

The function gracefully handles:
- Non-TTY environments (CI, piped commands)
- Windows (where termios is not available)
- Various error conditions (OSError, termios.error)

Fixes #2244

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford jpshackelford force-pushed the fix/stdin-escape-code-leak branch from a05c3c8 to 8730c48 Compare February 28, 2026 17:43
Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Note on API Breakage Check Comment

The API breakage comment above shows "Failed" but this is misleading and not blocking.

What's happening:

  • The Griffe check compares this PR against the last released version (v1.11.4), not main
  • PR #2243 (fix: override server_image default to None in DockerDevWorkspace) merged a server_image default change to main, but that change is not yet in a release
  • So Griffe sees a "breaking change" when comparing against v1.11.4, even though this PR doesn't introduce that change

Why it's not blocking:

  • The actual CI check shows ✅ passing (see PR checks above)
  • The workflow has continue-on-error: true for non-release PRs
  • This PR was rebased onto main and no longer contains the dev_workspace.py change

Resolution:

The comment will auto-correct once a new version is released that includes the server_image change from PR #2243.

- Replace useless Windows test that mocked the function being tested
  with a proper skip that documents Windows CI covers this path
- Replace weak import-only integration tests with actual behavioral tests:
  - Test that flush_stdin is called in visualizer.on_event()
  - Test that atexit handler is registered
- Consolidate importlib.util import at module level

Addresses review feedback on PR #2245

Co-authored-by: openhands <openhands@all-hands.dev>
@jpshackelford jpshackelford marked this pull request as draft February 28, 2026 19:14
@jpshackelford jpshackelford marked this pull request as ready for review February 28, 2026 19:14
@jpshackelford
Contributor Author

@juanmichelini This PR doesn't change prompts, but it does affect how OpenHands handles shell commands and tools, and I wouldn't want it to have any negative impact on our benchmarks. Not sure if you have a process for watching higher-risk PRs or doing lightweight evals on them. You'd definitely have a better eye than me for what could create issues...

Collaborator

@all-hands-bot all-hands-bot left a comment


🟡 Taste Rating: Acceptable - Pragmatic fix for a real terminal I/O issue.

Key Insight: Simple, defensive implementation that solves the escape code leak problem using standard Unix patterns. Clean error handling and proper resource cleanup.

Verdict: ✅ LGTM - The implementation is solid, well-documented, and handles edge cases correctly. All previous review concerns have been addressed.

@enyst
Collaborator

enyst commented Feb 28, 2026

@OpenHands Understand this PR and the linked issue it aims to solve, assess the issue yourself. Do a /codereview-roasted on this PR.

@openhands-ai

openhands-ai bot commented Feb 28, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Feb 28, 2026

Taste rating: RED (needs improvement)

This is a real bug (terminal DSR/OSC query responses landing in the TTY input buffer), and the general “drain pending stdin bytes” approach is pragmatic. But the current implementation has one outright correctness bug in the termios handling, plus a couple of design choices that risk eating legitimate user input and introducing surprising import-time side effects.


[CRITICAL ISSUES] (must fix)

  • [openhands-sdk/openhands/sdk/logger/logger.py, Lines 241–259] Termios restore is broken (shallow copy mutates old)

    • You do:
      • old = termios.tcgetattr(_sys.stdin) (a list where index 6 is a nested list of control chars)
      • new = list(old) (shallow copy)
      • then mutate new[6][VMIN] and new[6][VTIME]
    • Because new[6] is old[6], you’re also mutating old. That means your “restore” call tcsetattr(..., old) does not actually restore the original cc settings.
    • Concrete fix: deep copy the cc array (new = old.copy(); new[6] = old[6].copy()), or build the new attrs without sharing nested structures.
  • [openhands-sdk/openhands/sdk/conversation/visualizer/default.py, Lines 252–257] flush_stdin() before every render can drop real user input

    • This flush drains all pending stdin bytes. That includes not only terminal query responses, but also typeahead (user starts typing while the agent is running) and any escape sequences from user keypresses (arrow keys, alt combos).
    • In an interactive CLI, silently deleting user keystrokes is a nasty UX regression.
    • Safer design: flush only right before you call input() / read from stdin for a prompt, and keep the atexit flush for the “garbage after exit” case. Doing it on every event render is overkill and increases the chance of eating input.

[IMPROVEMENT OPPORTUNITIES] (should fix)

  • [openhands-sdk/openhands/sdk/logger/logger.py, Lines 263–266] Import-time global side effect (registering atexit)

    • Importing openhands.sdk.logger now always registers an exit handler. That’s not catastrophic, but it is a surprising side effect for a library import (and now DefaultConversationVisualizer imports the logger module too).
    • Consider registering the atexit handler in a more explicit “interactive/CLI setup” path, or at least ensuring it doesn’t pull in heavy deps unnecessarily.
  • [openhands-sdk/openhands/sdk/conversation/visualizer/default.py, Line 23] Pulling in the logger module just to flush stdin

    • openhands.sdk.logger imports the full logger stack (and litellm in logger.py). That’s a big hammer to swing from a visualizer module that otherwise just prints stuff.
    • Better separation: put flush_stdin() in a lightweight terminal/tty utility module with minimal imports, and have both logger + visualizer depend on that.
  • [tests/sdk/logger/test_flush_stdin.py, overall] Tests don’t actually prove the core behavior

    • Current tests mostly prove “returns 0 in a couple of early-exit/error cases” and that something calls the function.
    • Missing: a test that flush_stdin() actually drains bytes and returns the correct count.
      • On Linux/macOS you can do this with a pty pair: make the slave the “stdin” (isatty=True), write bytes to the master, and assert flush_stdin() consumes them.
    • Also missing: a test that would catch the shallow-copy bug above (verify the “restore” attrs equal the original tcgetattr result).
  • [tests/sdk/logger/test_flush_stdin.py, Lines 24–40] “termios unavailable” test is basically dead code on Unix

    • The docstring claims “simulate absence by patching the import mechanism”, but the test just skips on Unix.
    • If you want this covered on Unix too, patch builtins.__import__ (or similar) to raise ImportError for termios during the call.

[BREAKING CHANGE / BEHAVIOR RISK] (what might you break?)

  • [local_conversation.py, Line 628 + visualizer/default.py, Line 256] New flushing points can change interactive behavior
    • Even if the intent is correct, the “flush frequently” strategy risks deleting legitimate input, which is effectively a behavior-breaking change for interactive clients.

[SECURITY / CORRECTNESS] (only the real stuff)

  • No obvious “security exploit” here, but dropping user input is a correctness problem: it can cause mis-confirmations / failed prompts / confusing behavior in CLI tools.

VERDICT

Needs rework before I’d be comfortable calling this “done.” The overall approach is fine, but the termios restore bug is a hard correctness failure, and the flush-call placement (especially in the visualizer) is too aggressive for interactive UX.

Key insight: the hard part isn’t “drain stdin” — it’s drain only when it’s safe, and don’t accidentally leave the terminal in a modified state.

@enyst
Collaborator

enyst commented Feb 28, 2026

Not sure if you have a process for watching higher risk PRs or doing light weight evals on them.

We typically figure it out at review time, and yes, we used to run some ~50 instances evals on them. This is a good point... the reviewer agent doesn't know.

@OpenHands Read this #2245 (comment) and look up the workflow running the pr-review agent. Understand what prompt file the reviewer uses. Investigate the agent loop in the SDK, and add to the reviewer prompt to not approve PRs that change behavior in a way that risks evaluation performance, leave those to human maintainers.

@jpshackelford
Contributor Author

@enyst You must be using a different model for your roasted review : )

@openhands-ai

openhands-ai bot commented Feb 28, 2026

I'm on it! enyst can track my progress at all-hands.dev

@enyst
Collaborator

enyst commented Feb 28, 2026

@enyst You must be using a different model for your roasted review : )

Yes! That's why I do it! GPT-5.2, as safety net for over-happy Claudes. Granted, lately it probably exaggerates in the other direction, but that's why we haz a use for humans. 😅

@jpshackelford
Contributor Author

@enyst Give me a signal when you are done looking at it and I'll address the roasted issues and any others.

@openhands-ai

openhands-ai bot commented Feb 28, 2026

OpenHands encountered an error: Failed to send message to agent server: HTTP 503 error: no available server.

See the conversation for more information.

@enyst
Collaborator

enyst commented Feb 28, 2026

@enyst Give me a signal when you are done looking at it and I'll address the roasted issues and any others.

All good, the agent took the proposed reviewer agent changes to another PR.

Re: the roasted review. If you want, we could also reuse that conversation to ask it for a re-assessment of its own roasted review, sometimes it figures out where it exaggerated (or not).

Actually let me try it

@enyst
Collaborator

enyst commented Feb 28, 2026

(OpenHands-GPT-5.2. Note that it is very prompt-sensitive: once I use a word like "re-assessment", that strongly pushes it toward a different answer or framing. Still, the core stuff should(tm) be apparent.)


Re-evaluation of my review (section by section)

Taste rating

After thinking harder: still “needs improvement,” but closer to “yellow/acceptable with a couple fixes” than “red dumpster fire.” The core idea is fine; the main question is whether the implementation is safe and provably correct.


[CRITICAL ISSUES] — are they really critical?

1) Termios restore bug due to shallow copy

Claim: [logger.py L241–259] new = list(old) shares old[6], so mutating new[6][VMIN/VTIME] mutates old too, so the restore doesn’t restore.

Is it really the case? Yes. termios.tcgetattr() returns a list where cc is a nested list/array at index 6. A shallow copy (list(old) / old.copy()) will still share that nested cc. So new[6][...] = ... mutates old[6].

Is it as major as I made it sound? Slightly nuanced:

  • In the common case (terminal in canonical mode, ICANON on), VMIN/VTIME aren’t used, so leaving them at 0 typically won’t break basic input() behavior.
  • But in non-canonical situations (prompt_toolkit, curses-ish apps, anything that already turned off ICANON), you’ve now silently altered the terminal’s read behavior by “restoring” mutated settings. That’s the kind of thing that creates “random” terminal weirdness.

Conclusion: Still a real correctness bug and still worth treating as must-fix, mostly because it’s trivial to fix and avoids terminal-state corruption in edge-but-real environments.


2) “flush before render can drop real user input”

Claim: [visualizer default.py L252–257] flushing stdin before rendering can discard typeahead / real keystrokes.

Is it really the case? Yes. The function drains whatever bytes are pending on stdin. If the user types while output is happening (typeahead), those bytes are indistinguishable from terminal query responses at this level and will be eaten.

Is it as major as I made it sound? This one depends on product expectations:

  • If the SDK’s intended UX is “agent runs, user waits, then user types when prompted,” then losing typeahead is less likely, and some projects accept that tradeoff.
  • If users do type ahead (or you embed this in an app that does concurrent reading, or you’re in a rich TUI-ish environment), silently discarding input is a real UX regression.

Also: calling flush_stdin() before every visualizer render increases the odds of catching typeahead compared to calling it at a single “safe boundary” (like immediately before prompting).

Conclusion: The risk is real; I’d downgrade it from “definitely critical everywhere” to high-risk behavior change that needs justification or a narrower call site. The “right” fix depends on intended interaction model.


[IMPROVEMENT OPPORTUNITIES] — did I overstate them?

3) Import-time atexit side effect

Is it really bad? It’s a side effect, but lots of libraries do import-time setup. Here, the SDK already does import-time things (e.g., logging auto-config). So this is not catastrophic.

Severity after reflection: Minor-to-moderate. Worth noting, but not a blocker by itself.

4) Visualizer importing openhands.sdk.logger pulls in litellm etc.

Is it really the case? Yes: openhands.sdk.logger.__init__ imports from .logger, which imports litellm.

Is it major? Probably minor in this repo because litellm is already a core dependency for the SDK runtime. The cleanliness argument (“flush_stdin should live in a lightweight tty util”) still stands, but it’s not a correctness issue.


[TESTING GAPS] — did I demand too much?

5) Tests don’t prove actual draining / don’t catch the shallow-copy restore bug

Is it really the case? Yes:

  • The tests validate early returns and that calls happen.
  • They don’t validate “bytes were present and got drained,” nor do they validate that termios settings are restored correctly (which would catch the shallow-copy issue).

Is this blocking? I’d say not strictly blocking if maintainers are OK relying on manual reproduction + the small surface area of the function. But for a terminal/tty fix (notoriously environment-sensitive), a single robust PTY-based test would materially increase confidence.

Severity after reflection: Moderate (nice to have, but I wouldn’t hold up merging solely on this if the core code is fixed).


Bottom line (after deeper thought)

  • The termios shallow-copy/restore issue is still the one I’m most confident is objectively wrong and worth fixing.
  • The “flush on every render” concern is real, but its severity depends on whether the SDK promises to preserve typeahead / interactive keystrokes during agent output. I still think it’s the highest “behavior change” risk in the PR.
  • The other points (atexit registration, heavy import) are real but secondary.
  • The tests are thin for the main behavior, but that’s more about confidence than correctness.

If you want, I can also sanity-check where and when user input is actually read in the SDK flow to better judge whether “typeahead loss” is a real practical regression for typical clients, or mostly theoretical.

@enyst
Collaborator

enyst commented Feb 28, 2026

(HUMAN) Last note:

If users do type ahead (or you embed this in an app that does concurrent reading, or you’re in a rich TUI-ish environment), silently discarding input is a real UX regression.

We support user messages sent at any time, including during agent's run, for sure.

All yours!

@jpshackelford
Contributor Author

Manual Testing Results on macOS

Testing Approach

Tested this PR branch on macOS using the reproduction scripts from the linked gist.

Installation command:

uv run --with "openhands-sdk @ git+https://github.com/OpenHands/software-agent-sdk.git@fix/stdin-escape-code-leak#subdirectory=openhands-sdk" \
       --with openhands-tools python repro_real.py

Results

Test Criteria | Result
Garbage in shell prompt after exit | None observed
!!! LEAK DETECTED message | Not triggered
Escape codes visible inline during execution | ⚠️ Still present

Observed inline escape codes:

^[]11;rgb:30fb/3708/41af^G^[[57;1R

These are:

  • OSC 11 responses (terminal background color)
  • DSR cursor position responses
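
To make the two response types concrete, here is a small sketch that decodes the sequence observed above. In the caret notation, `^[` is ESC (0x1b) and `^G` is BEL (0x07). The function name describe_responses and its output wording are illustrative assumptions, not part of the PR.

```python
import re

# Cursor position report sent in reply to DSR (ESC [ 6 n): ESC [ row ; col R
_CPR_RE = re.compile(r"\x1b\[(\d+);(\d+)R")
# OSC 11 reply carrying the background colour, terminated by BEL or ST
_OSC11_RE = re.compile(
    r"\x1b\]11;rgb:([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)(?:\x07|\x1b\\)"
)


def describe_responses(raw: str) -> list[str]:
    """Return a human-readable description of each recognised response, in order."""
    found = []
    for m in _CPR_RE.finditer(raw):
        found.append((m.start(), f"cursor at row {m.group(1)}, column {m.group(2)}"))
    for m in _OSC11_RE.finditer(raw):
        r, g, b = m.groups()
        found.append((m.start(), f"background rgb:{r}/{g}/{b}"))
    return [text for _, text in sorted(found)]


observed = "\x1b]11;rgb:30fb/3708/41af\x07\x1b[57;1R"
print(describe_responses(observed))
```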

Assessment

The fix successfully addresses the main bug:

  • ✅ No stdin corruption affecting subsequent input() calls
  • ✅ No garbage characters leaking to shell prompt after script exits
  • flush_stdin() is consuming escape code responses before they can leak

Minor cosmetic issue remaining: Escape codes still appear inline in the terminal output between the Observation and Agent Action rendering. This appears to be a timing issue where terminal responses are echoed to stdout before flush_stdin() consumes them.

Next Steps

  1. This PR can be merged - it fixes the critical stdin leak bug as designed
  2. Consider follow-up enhancement: Add flush_stdin() calls at additional points to prevent inline escape code visibility
  3. The inline escape codes are cosmetic and do not affect functionality

@jpshackelford
Contributor Author

Manual Testing Results on macOS

Testing Approach

Tested this PR branch on macOS using the reproduction scripts from the linked gist.

Installation command:

uv run --with "openhands-sdk @ git+https://github.com/OpenHands/software-agent-sdk.git@fix/stdin-escape-code-leak#subdirectory=openhands-sdk" \
       --with openhands-tools python repro_real.py

Results

Test Criteria | Result
Garbage in shell prompt after exit | None observed
!!! LEAK DETECTED message | Not triggered
Escape codes visible inline during execution | ⚠️ Still present

Observed inline escape codes:

^[]11;rgb:30fb/3708/41af^G^[[57;1R

These are:

  • OSC 11 responses (terminal background color)
  • DSR cursor position responses

Assessment

The fix successfully addresses the main stdin leak bug:

  • ✅ No stdin corruption affecting subsequent input() calls
  • ✅ No garbage characters leaking to shell prompt after script exits

However, escape codes still appear inline in the terminal output between the Observation and Agent Action rendering.

Suggested Next Steps

The inline escape codes appear to be a timing issue - terminal responses are being echoed to stdout before flush_stdin() can consume them. To address this:

  1. Identify the source of terminal queries - The OSC 11 (background color) and DSR queries are likely coming from Rich's terminal capability detection. Adding logging or breakpoints around flush_stdin() calls could help pinpoint when queries are sent vs when responses arrive.

  2. Consider adding flush_stdin() before Rich renders - Currently the PR calls flush_stdin() after agent steps and before rendering in the visualizer. It may help to also call it immediately before Rich prints output, to catch any responses that arrived during processing.

  3. Alternative: suppress terminal queries - Rich can be configured to skip terminal capability detection. Setting TERM=dumb or using Rich's force_terminal=False might prevent the queries entirely, though this could affect output formatting.

The previous implementation used list(old) which creates a shallow copy.
Since old[6] (the cc array) is itself a list, both old[6] and new[6]
pointed to the same object. Modifying new[6][VMIN] and new[6][VTIME]
also corrupted old[6], making the restore at the end ineffective.

This fix uses a comprehension that copies nested lists, ensuring the
original termios settings are preserved for proper restoration.

Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Fix for termios shallow-copy bug

Based on the detailed analysis, I've pushed a fix for the termios shallow-copy issue identified in the review.

The Problem

The original code used list(old) which creates a shallow copy:

old = termios.tcgetattr(_sys.stdin)
new = list(old)  # ← Shallow copy
new[6][termios.VMIN] = 0  # ← Also modifies old[6]!

Since old[6] (the cc array) is itself a list, both old[6] and new[6] pointed to the same object. Modifying new[6][VMIN] also corrupted old[6], making the restore at the end ineffective.

Verified with a quick test:

>>> old[6] is new[6]
True  # Same object!
>>> new[6][termios.VMIN] = 99
>>> old[6][termios.VMIN]
99  # Oops, old is corrupted too

The Fix (commit ae64c17)

Changed to a proper deep copy using a comprehension that copies nested lists:

new = [item[:] if isinstance(item, list) else item for item in old]

Now old[6] and new[6] are separate objects, and the termios restore works correctly.
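
For context, a drain-and-restore routine built around that deep copy might look like the sketch below. It is an illustration only, not the code in logger.py: the name drain_tty_input, the explicit fd parameter, the use of select, and clearing ICANON are all assumptions made to keep the example self-contained.

```python
import os
import select

try:
    import termios
except ImportError:  # e.g. Windows, where termios is not available
    termios = None


def copy_termios_attrs(attrs: list) -> list:
    """Copy the attribute list, including the nested cc list at index 6."""
    return [item[:] if isinstance(item, list) else item for item in attrs]


def drain_tty_input(fd: int, chunk_size: int = 4096) -> int:
    """Discard pending input on a TTY; return bytes discarded (0 if not a TTY)."""
    if termios is None or not os.isatty(fd):
        return 0
    try:
        old = termios.tcgetattr(fd)
    except termios.error:
        return 0
    new = copy_termios_attrs(old)       # old stays untouched for the restore
    new[3] &= ~termios.ICANON           # assumption: non-canonical so VMIN/VTIME apply
    new[6][termios.VMIN] = 0            # read() returns immediately...
    new[6][termios.VTIME] = 0           # ...with no inter-byte timeout
    drained = 0
    try:
        termios.tcsetattr(fd, termios.TCSANOW, new)
        while select.select([fd], [], [], 0)[0]:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            drained += len(chunk)
    except (OSError, termios.error):
        pass
    finally:
        try:
            termios.tcsetattr(fd, termios.TCSANOW, old)  # always restore
        except termios.error:
            pass
    return drained
```

The try/finally matters as much as the copy: if a read fails partway through, the terminal still gets its original settings back.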


This addresses the "termios shallow-copy/restore" concern from the review. The typeahead loss question is still an open design discussion that needs product input.

jpshackelford and others added 2 commits February 28, 2026 15:25
Add two new tests using pseudo-terminals to verify real terminal behavior:

1. test_flush_stdin_restores_termios_settings: Verifies that VMIN/VTIME
   settings in the cc array are properly restored after flush_stdin.
   This test would have caught the shallow-copy bug.

2. test_flush_stdin_drains_pending_data: Verifies that flush_stdin
   actually reads and discards pending escape code data from stdin.

These tests are skipped on Windows where the pty module is unavailable.

Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Added PTY-based tests (commit a8b4af9)

Added two new tests using pseudo-terminals to verify real terminal behavior:

1. test_flush_stdin_restores_termios_settings

Verifies that VMIN/VTIME settings in the cc array are properly restored after flush_stdin(). This test would have caught the shallow-copy bug - if we revert the deep-copy fix, this test fails.

2. test_flush_stdin_drains_pending_data

Verifies that flush_stdin() actually reads and discards pending escape code data from stdin, returning the correct byte count.

Both tests:

  • Use pty.openpty() to create a real terminal environment
  • Are skipped on Windows where the pty module is unavailable
  • Test actual behavior, not mocks
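
A sketch of how a PTY can expose the shallow-copy bug is shown below. The helper names (shallow_copy_toggle, deep_copy_toggle, restores_termios) are hypothetical and the two toggles are simplified stand-ins for flush_stdin(); the actual tests in the PR may be structured differently. Unix only, since it relies on the pty and termios modules.

```python
import os
import pty
import termios


def shallow_copy_toggle(fd: int) -> None:
    """Reproduces the bug: list(old) shares the nested cc list with old."""
    old = termios.tcgetattr(fd)
    new = list(old)
    new[6][termios.VMIN] = 0
    new[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, new)
    termios.tcsetattr(fd, termios.TCSANOW, old)  # "restore" writes the mutated cc


def deep_copy_toggle(fd: int) -> None:
    """Same steps, but the nested cc list is copied before it is modified."""
    old = termios.tcgetattr(fd)
    new = [item[:] if isinstance(item, list) else item for item in old]
    new[6][termios.VMIN] = 0
    new[6][termios.VTIME] = 0
    termios.tcsetattr(fd, termios.TCSANOW, new)
    termios.tcsetattr(fd, termios.TCSANOW, old)


def restores_termios(toggle) -> bool:
    """Run toggle against a fresh PTY slave and compare attrs before and after."""
    master, slave = pty.openpty()
    try:
        before = termios.tcgetattr(slave)
        toggle(slave)
        return termios.tcgetattr(slave) == before
    finally:
        os.close(master)
        os.close(slave)
```

The comparison uses a separate tcgetattr() snapshot taken before the call, so it does not share any list with the code under test.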

Remaining open item: The typeahead loss concern still needs a product decision.

Contributor Author

Typeahead Concern - Problem & Proposed Direction

Problem: flush_stdin() drains ALL pending stdin data, which would discard legitimate user input typed during agent execution.

Constraints:

  • ❌ Cannot accept discarding user typeahead - SDK supports messages sent anytime during agent run
  • ❌ Cannot limit fix to only atexit - escape codes leak during execution, not just at exit

Rejected approaches:

  • Flush everything (current impl) - loses user input
  • Only flush at exit - doesn't fix mid-execution leaks
  • TIOCSTI re-injection - disabled on modern Linux kernels for security

Proposed direction: Selective flushing

  • Parse pending stdin data
  • Discard only recognized escape sequences (CSI \x1b[..., OSC \x1b]...)
  • Buffer any other data (likely user input)
  • Export get_buffered_input() for SDK to prepend to next input read

Will prototype this approach after addressing CI failures.

- Fix line too long in test assertion message
- Fix 'possibly unbound' variable warnings by initializing slave_file
- Fix pyright type error by extracting lflag with type assertion

Co-authored-by: openhands <openhands@all-hands.dev>
The previous implementation drained ALL pending stdin data, which could
discard legitimate user typeahead. This update implements selective flushing:

- Parse stdin data byte-by-byte, identifying escape sequences
- Discard only recognized CSI (\x1b[...) and OSC (\x1b]...) sequences
- Preserve all other data in a module-level buffer
- Add get_buffered_input() for SDK to retrieve preserved input
- Add clear_buffered_input() to explicitly clear the buffer

New helper functions:
- _is_csi_final_byte(): Check if byte is CSI sequence terminator
- _find_csi_end(): Find end of CSI sequence (or detect incomplete)
- _find_osc_end(): Find end of OSC sequence with BEL or ST terminator
- _parse_stdin_data(): Separate escape sequences from user input

Tests added for all parsing functions and the new buffer API.

Addresses typeahead concern from PR review:
#2245 (comment)

Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Selective Flushing Implementation (commit 5c34edb)

Based on the typeahead concern discussion, I've implemented selective flushing that preserves user input while discarding terminal escape sequences.

How It Works

The new implementation parses stdin data byte-by-byte:

  1. CSI sequences (\x1b[...X) - Identified by looking for the final byte (0x40-0x7E per ECMA-48). Complete sequences are discarded; incomplete ones are preserved.

  2. OSC sequences (\x1b]...\x07 or \x1b]...\x1b\\) - Terminated by BEL or ST. Complete sequences are discarded; incomplete ones are preserved.

  3. Other data - All other bytes (including partial sequences) are preserved in a module-level buffer.
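As a rough illustration of the three rules above, the sketch below separates complete CSI/OSC sequences from everything else. It is not the code from commit 5c34edb: the function names loosely mirror the helpers listed in the commit message, but their signatures and the convention of returning -1 for an incomplete sequence are assumptions made for this example.

```python
ESC = 0x1B


def is_csi_final_byte(byte: int) -> bool:
    """A CSI sequence ends with a byte in 0x40-0x7E (ECMA-48)."""
    return 0x40 <= byte <= 0x7E


def find_csi_end(data: bytes, start: int) -> int:
    """Return the index just past the CSI sequence at `start`, or -1 if incomplete."""
    i = start + 2  # skip ESC and '[' ('[' itself falls in the final-byte range)
    while i < len(data):
        if is_csi_final_byte(data[i]):
            return i + 1
        i += 1
    return -1


def find_osc_end(data: bytes, start: int) -> int:
    """Return the index just past the OSC sequence at `start`, or -1 if incomplete."""
    i = start + 2  # skip ESC and ']'
    while i < len(data):
        if data[i] == 0x07:  # BEL terminator
            return i + 1
        if data[i] == ESC and i + 1 < len(data) and data[i + 1] == 0x5C:
            return i + 2  # ST terminator (ESC followed by backslash)
        i += 1
    return -1


def parse_stdin_data(data: bytes) -> bytes:
    """Drop complete CSI/OSC sequences and return every other byte unchanged."""
    preserved = bytearray()
    i = 0
    while i < len(data):
        starts_sequence = (
            data[i] == ESC and i + 1 < len(data) and data[i + 1] in (0x5B, 0x5D)
        )
        if starts_sequence:
            if data[i + 1] == 0x5B:  # '[' introduces CSI
                end = find_csi_end(data, i)
            else:  # ']' introduces OSC
                end = find_osc_end(data, i)
            if end == -1:
                # Incomplete sequence: keep it and everything after it.
                preserved += data[i:]
                break
            i = end
            continue
        preserved.append(data[i])
        i += 1
    return bytes(preserved)
```

Fed the garbage from the PR description mixed with typed text, for example `b"ls\x1b[38;1R -la\x1b]11;rgb:30fb/3708/41af\x07\n"`, this sketch keeps only the typed bytes, while a trailing partial sequence such as `b"hi\x1b[12"` is returned untouched.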

New Public API

from openhands.sdk.logger import get_buffered_input, clear_buffered_input

# Retrieve preserved user input (clears buffer)
buffered = get_buffered_input()

# Explicitly clear the buffer without retrieving
clear_buffered_input()

SDK Integration Point

Where the SDK reads user input (e.g., before input() calls), it should prepend any buffered data:

buffered = get_buffered_input()
user_input = buffered.decode('utf-8', errors='replace') + input()
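To make the "retrieve clears the buffer" semantics concrete, here is a small self-contained sketch. `TypeaheadBuffer`, `stash`, `take`, and `read_with_typeahead` are hypothetical stand-ins invented for this example; they only mimic the behaviour described for `get_buffered_input()` and `clear_buffered_input()` and are not the SDK's implementation.

```python
from typing import Callable


class TypeaheadBuffer:
    """Hypothetical stand-in for the module-level buffer described above."""

    def __init__(self) -> None:
        self._data = bytearray()

    def stash(self, data: bytes) -> None:
        """Append preserved (non-escape) bytes."""
        self._data.extend(data)

    def take(self) -> bytes:
        """Return the preserved bytes and clear the buffer."""
        data = bytes(self._data)
        self._data.clear()
        return data

    def clear(self) -> None:
        """Discard preserved bytes without returning them."""
        self._data.clear()


def read_with_typeahead(
    buffer: TypeaheadBuffer,
    read_line: Callable[[], str] = input,
) -> str:
    """Prepend preserved typeahead to the next line read from the user."""
    prefix = buffer.take().decode("utf-8", errors="replace")
    return prefix + read_line()
```

Passing `read_line` as a parameter is only a convenience so the sketch can be exercised without a real terminal; a second `take()` right after a read returns empty bytes, which is the property that keeps typeahead from being delivered twice.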

Test Coverage

Added comprehensive tests for:

  • CSI final byte detection
  • CSI/OSC sequence end finding (complete and incomplete)
  • Mixed content parsing (escape sequences + user input)
  • Buffer retrieval and clearing
  • PTY-based integration tests verifying selective behavior

Design Notes

  • Incomplete sequences: If an escape sequence is incomplete at the end of a buffer read, it's preserved rather than discarded. This handles the edge case where a user presses Escape and we catch the data before they finish typing.

  • Arrow keys: Complete CSI sequences like arrow keys (\x1b[A) are flushed. This is acceptable because arrow keys typed during agent execution aren't meaningful input. If this becomes an issue, we could add a whitelist.

  • Buffer size: No explicit limit on preserved input buffer. In practice, user typeahead during agent execution should be minimal.

@jpshackelford
Contributor Author

Next step: another round of manual testing.

@all-hands-bot
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @enyst @juanmichelini, could you please take a look when you have a chance?

jpshackelford and others added 2 commits March 4, 2026 07:34
CLI tools like `gh` send terminal query sequences (DSR, OSC 11, etc.)
as part of their progress/spinner UI. When running inside a PTY, these
queries get captured as output. When displayed, the terminal processes
them and responds, causing visible escape code garbage.

Root cause analysis:
- Diagnostic script confirmed queries are IN the captured PTY output,
  not generated by terminal responses
- The `gh` command writes cursor position and background color queries

Solution:
- Add filter_terminal_queries() to strip query sequences that trigger
  terminal responses while preserving legitimate formatting codes
- Apply filter in _get_command_output() before returning to visualizer
- Queries filtered: DSR, OSC 10/11/4, DA, DA2, DECRQSS
- Preserved: ANSI colors, cursor movement, bold/formatting

Fixes: #2244

Co-authored-by: openhands <openhands@all-hands.dev>
These scripts helped identify the root cause of #2244:
- diagnose_source.py: Proves escape codes are IN captured PTY output
- diagnose_leak.py: Tests stdin flushing (first attempted fix)
- diagnose_echo.py: Tests echo suppression (second attempted fix)
- test_real_world.py: End-to-end test with agent
- manual-testing.md: Testing notes and findings

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Mar 4, 2026

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

@jpshackelford changed the title from "Fix terminal escape code leak from stdin" to "fix(terminal): filter terminal query sequences from captured output" on Mar 4, 2026
@jpshackelford
Contributor Author

Manual Testing Report

Test Environment

  • macOS 15.6.1
  • Local SDK development environment with uv run
  • Terminal: iTerm2

Root Cause Discovery

The diagnostic script .pr/diagnose_source.py was key to identifying the root cause. Running it captured the raw PTY output:

Raw captured output (repr):
'\x1b]11;?\x1b\\\x1b[6n\x1b]11;?\x1b\\\x1b[6n\nShowing 3 of 183 open pull requests...'

Finding: The gh command writes terminal query sequences directly to its stdout as part of its spinner UI:

  • \x1b[6n - DSR (cursor position query)
  • \x1b]11;? - OSC 11 (background color query)

These get captured by the PTY and when displayed, the terminal responds to them, causing visible garbage.
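For reviewers who want to see the idea concretely, here is a minimal regex-based sketch of this kind of filter, run against the raw output captured above. It is not the contents of escape_filter.py: the name `strip_terminal_queries` and the exact patterns are assumptions chosen to cover the query types listed in the PR (DSR, OSC 10/11/4, DA, DA2, DECRQSS).

```python
import re

# Each alternative matches a sequence that asks the terminal to reply.
# Formatting sequences (SGR colors, cursor movement) match none of them.
_QUERY_RE = re.compile(
    "|".join(
        [
            r"\x1b\[\??[56]n",  # DSR: device status / cursor position query
            r"\x1b\](?:1[01];\?|4;\d+;\?)(?:\x07|\x1b\\)",  # OSC 10/11/4 color query
            r"\x1b\[>?0?c",  # DA (ESC [ c) and DA2 (ESC [ > c)
            r"\x1bP\$q[^\x1b]*\x1b\\",  # DECRQSS: DCS $ q ... ST
        ]
    )
)


def strip_terminal_queries(text: str) -> str:
    """Remove terminal query sequences while leaving other escape codes intact."""
    return _QUERY_RE.sub("", text)


raw = "\x1b]11;?\x1b\\\x1b[6n\x1b]11;?\x1b\\\x1b[6n\nShowing 3 of 183 open pull requests..."
cleaned = strip_terminal_queries(raw)
```

The OSC pattern requires the literal `?` payload, so a sequence that sets a color (rather than asking for one) is left alone, as are SGR color codes such as `\x1b[32m`.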

Fix Verification

After implementing the filter, ran uv run python .pr/test_real_world.py:

  • No visible escape codes in output during conversation
  • Colors preserved - gh output still has colored PR IDs, branches
  • Clean shell prompt after script exits - no leftover garbage

Previous Attempts (documented in .pr/)

  1. flush_stdin() - Flushed pending input but didn't help (queries are in stdout, not stdin responses)
  2. Echo suppression - Disabled terminal echo but didn't help (queries already captured before display)
  3. Filter queries ✅ - Strips query sequences from captured output before display

The diagnostic scripts in .pr/ document this investigation for future reference.

@jpshackelford
Contributor Author

Next step: determine whether we can reduce the scope of the code committed in this PR and still resolve the issue. If we can simplify, we may close this PR and open an alternative. Stay tuned.

@jpshackelford
Contributor Author

Next step: test and refine #2334 as a more targeted solution. If that PR fixes the problem, we will close this one without merging.


Development

Successfully merging this pull request may close these issues.

Terminal escape code responses leak to stdin, corrupting subsequent input

4 participants