fix(terminal): filter terminal query sequences from captured output by jpshackelford · Pull Request #2245 · OpenHands/software-agent-sdk

jpshackelford · 2026-02-28T17:27:08Z

Summary

Fixes #2244 - Filter terminal query sequences from captured PTY output to prevent visible escape code garbage.

Problem

When CLI tools like gh run inside the SDK's PTY, they send terminal query sequences (DSR, OSC 11, etc.) as part of their progress/spinner UI. These queries get captured as output and when displayed, the terminal processes them and responds, causing visible garbage like:

^[[38;1R^[]11;rgb:30fb/3708/41af^G

Root Cause Analysis

The diagnostic script .pr/diagnose_source.py confirmed the queries are IN the captured PTY output, not generated by terminal responses to Rich rendering. The gh command specifically writes:

\x1b[6n (DSR - cursor position query)
\x1b]11;? (OSC 11 - background color query)

Solution

Added filter_terminal_queries() in the terminal tool to strip query sequences that trigger terminal responses while preserving legitimate formatting codes (colors, bold, etc.):

Filtered: DSR, OSC 10/11/4, DA, DA2, DECRQSS
Preserved: ANSI colors, cursor movement, bold/formatting

Applied in _get_command_output() before returning to the visualizer.
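A minimal sketch of what such a filter could look like, assuming regex-based stripping on the decoded output string. The name `strip_terminal_queries` and the exact patterns are illustrative assumptions, not the contents of escape_filter.py:

```python
import re

# Terminator for OSC sequences: BEL or ST (ESC \).
_ST = r"(?:\x07|\x1b\\)"

# Illustrative patterns only; the real filter may cover more (or fewer) forms.
_QUERY_PATTERNS = [
    r"\x1b\[\??[56]n",                  # DSR: device status / cursor position query
    r"\x1b\](?:10|11|4;\d+);\?" + _ST,  # OSC 10/11/4 colour queries
    r"\x1b\[>?0?c",                     # DA (primary) and DA2 (secondary)
    r"\x1bP\$q[^\x1b]*\x1b\\",          # DECRQSS: request status string
]
_QUERY_RE = re.compile("|".join(_QUERY_PATTERNS))


def strip_terminal_queries(text: str) -> str:
    """Remove query sequences; leave colours and other formatting untouched."""
    return _QUERY_RE.sub("", text)


# Colour codes survive, the DSR / OSC 11 / DA queries are dropped.
sample = "\x1b[1;32mok\x1b[0m\x1b[6n\x1b]11;?\x07\x1b[c done"
cleaned = strip_terminal_queries(sample)
```

The patterns only match sequences that ask the terminal to reply, so SGR colour codes such as `\x1b[1;32m` pass through unchanged.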

Manual Testing

Verified with uv run python .pr/test_real_world.py:

✅ No visible escape codes in output
✅ Colors from gh preserved
✅ Clean shell prompt after exit

Files Changed

openhands-tools/.../utils/escape_filter.py (NEW) - Filter implementation
openhands-tools/.../terminal_session.py - Apply filter in output processing
tests/tools/terminal/test_escape_filter.py (NEW) - 16 tests
.pr/ - Diagnostic scripts for reviewers

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:704b4a2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-704b4a2-python \
  ghcr.io/openhands/agent-server:704b4a2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:704b4a2-golang-amd64
ghcr.io/openhands/agent-server:704b4a2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:704b4a2-golang-arm64
ghcr.io/openhands/agent-server:704b4a2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:704b4a2-java-amd64
ghcr.io/openhands/agent-server:704b4a2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:704b4a2-java-arm64
ghcr.io/openhands/agent-server:704b4a2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:704b4a2-python-amd64
ghcr.io/openhands/agent-server:704b4a2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:704b4a2-python-arm64
ghcr.io/openhands/agent-server:704b4a2-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:704b4a2-golang
ghcr.io/openhands/agent-server:704b4a2-java
ghcr.io/openhands/agent-server:704b4a2-python

About Multi-Architecture Support

Each variant tag (e.g., 704b4a2-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 704b4a2-python-amd64) are also available if needed

github-actions · 2026-02-28T17:27:32Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions · 2026-02-28T17:27:48Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

all-hands-bot

🟡 Taste Rating: Acceptable - Core solution is sound and pragmatic, but has test quality issues and scope creep.

Key Insight: The flush_stdin() implementation correctly handles terminal I/O cleanup, but the tests need work and there's an unrelated change mixed in.

tests/sdk/logger/test_flush_stdin.py

openhands-workspace/openhands/workspace/docker/dev_workspace.py

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

tests/sdk/logger/test_flush_stdin.py

openhands-sdk/openhands/sdk/logger/logger.py

github-actions · 2026-02-28T17:29:22Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/conversation/impl
local_conversation.py	342	21	93%	282, 287, 315, 358, 376, 392, 454, 603–604, 607, 765, 773, 775, 786, 788–790, 815, 977, 984–985
openhands-sdk/openhands/sdk/conversation/visualizer
default.py	147	16	89%	81–82, 84–89, 91, 120, 140, 151, 162, 272, 309, 346
openhands-sdk/openhands/sdk/logger
logger.py	177	35	80%	35, 61, 66–69, 71–73, 136, 141–143, 146–147, 153–155, 162, 167–168, 250, 317–318, 324, 327, 329–330, 333–334, 374–375, 397, 409–410
openhands-tools/openhands/tools/terminal/terminal
terminal_session.py	189	66	65%	93, 99, 103–105, 132–133, 165, 180–181, 220–222, 227, 230–231, 235, 241, 244, 259–261, 266, 269–270, 274, 280, 283, 303, 305, 308, 310, 326, 341, 347, 356, 359, 393, 397, 400, 403–404, 410–411, 417, 420, 427–428, 434–435, 494–496, 500, 505, 510–511, 515–516, 519–522, 528–529, 532
TOTAL	19748	6079	69%

Add flush_stdin() function to prevent ANSI escape code responses from terminal queries (like DSR cursor position requests) from leaking to stdin. This prevents garbage characters appearing in the shell prompt or corrupting subsequent input() calls in CLI applications.

The fix has four parts:

1. Add flush_stdin() function to the logger module that drains pending stdin data using non-blocking reads with termios
2. Call flush_stdin() after each agent step in LocalConversation.run()
3. Call flush_stdin() before rendering in DefaultConversationVisualizer
4. Register flush_stdin() with atexit for final cleanup

The function gracefully handles:

- Non-TTY environments (CI, piped commands)
- Windows (where termios is not available)
- Various error conditions (OSError, termios.error)

Fixes #2244

Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>
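The draining approach described in the commit message above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the name `drain_tty_input` is hypothetical, and it takes a file descriptor instead of reading `sys.stdin` directly so it can be tried against any descriptor.

```python
import os

try:
    import termios  # Unix only
except ImportError:  # e.g. Windows
    termios = None


def drain_tty_input(fd: int) -> int:
    """Discard bytes pending on a TTY descriptor; return how many were read."""
    if termios is None or not os.isatty(fd):
        return 0  # non-TTY (CI, pipes) or no termios: nothing to do
    try:
        old = termios.tcgetattr(fd)
    except termios.error:
        return 0
    # Copy nested lists too, so the saved attributes are never mutated.
    new = [item[:] if isinstance(item, list) else item for item in old]
    new[3] &= ~termios.ICANON   # lflag: leave canonical (line-buffered) mode
    new[6][termios.VMIN] = 0    # read() returns immediately...
    new[6][termios.VTIME] = 0   # ...with no timeout, even if nothing is pending
    drained = 0
    try:
        termios.tcsetattr(fd, termios.TCSANOW, new)
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            drained += len(chunk)
    except (OSError, termios.error):
        pass
    finally:
        try:
            termios.tcsetattr(fd, termios.TCSANOW, old)  # restore original settings
        except termios.error:
            pass
    return drained
```

On a pipe or redirected stdin the function returns 0 and leaves the data in place, which matches the "gracefully handles non-TTY environments" behavior listed above.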

jpshackelford · 2026-02-28T18:50:08Z

Note on API Breakage Check Comment

The API breakage comment above shows "Failed" but this is misleading and not blocking.

What's happening:

The Griffe check compares this PR against the last released version (v1.11.4), not main
PR #2243 (fix: override server_image default to None in DockerDevWorkspace) merged a server_image default change to main, but that change is not yet in a release
So Griffe sees a "breaking change" when comparing against v1.11.4, even though this PR doesn't introduce that change

Why it's not blocking:

The actual CI check shows ✅ passing (see PR checks above)
The workflow has continue-on-error: true for non-release PRs
This PR was rebased onto main and no longer contains the dev_workspace.py change

Resolution:

The comment will auto-correct once a new version is released that includes the server_image change from PR #2243.

- Replace useless Windows test that mocked the function being tested with a proper skip that documents Windows CI covers this path
- Replace weak import-only integration tests with actual behavioral tests:
  - Test that flush_stdin is called in visualizer.on_event()
  - Test that atexit handler is registered
- Consolidate importlib.util import at module level

Addresses review feedback on PR #2245

Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford · 2026-02-28T19:16:52Z

@juanmichelini This PR doesn't change prompts but it does impact the way OpenHands manipulates shell commands and tools and I wouldn't want it to have any kind of negative impacts on our benchmarks. Not sure if you have a process for watching higher risk PRs or doing light weight evals on them. You'd definitely have a better eye than me for what could create issues...

all-hands-bot

🟡 Taste Rating: Acceptable - Pragmatic fix for a real terminal I/O issue.

Key Insight: Simple, defensive implementation that solves the escape code leak problem using standard Unix patterns. Clean error handling and proper resource cleanup.

Verdict: ✅ LGTM - The implementation is solid, well-documented, and handles edge cases correctly. All previous review concerns have been addressed.

enyst · 2026-02-28T19:24:03Z

@OpenHands Understand this PR and the linked issue it aims to solve, assess the issue yourself. Do a /codereview-roasted on this PR.

openhands-ai · 2026-02-28T19:24:23Z

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai · 2026-02-28T19:28:06Z

Taste rating: RED (needs improvement)

This is a real bug (terminal DSR/OSC query responses landing in the TTY input buffer), and the general “drain pending stdin bytes” approach is pragmatic. But the current implementation has one outright correctness bug in the termios handling, plus a couple of design choices that risk eating legitimate user input and introducing surprising import-time side effects.

[CRITICAL ISSUES] (must fix)

[openhands-sdk/openhands/sdk/logger/logger.py, Lines 241–259] Termios restore is broken (shallow copy mutates old)
- You do:
  - old = termios.tcgetattr(_sys.stdin) (a list where index 6 is a nested list of control chars)
  - new = list(old) (shallow copy)
  - then mutate new[6][VMIN] and new[6][VTIME]
- Because new[6] is old[6], you’re also mutating old. That means your “restore” call tcsetattr(..., old) does not actually restore the original cc settings.
- Concrete fix: deep copy the cc array (new = old.copy(); new[6] = old[6].copy()), or build the new attrs without sharing nested structures.
[openhands-sdk/openhands/sdk/conversation/visualizer/default.py, Lines 252–257] flush_stdin() before every render can drop real user input
- This flush drains all pending stdin bytes. That includes not only terminal query responses, but also typeahead (user starts typing while the agent is running) and any escape sequences from user keypresses (arrow keys, alt combos).
- In an interactive CLI, silently deleting user keystrokes is a nasty UX regression.
- Safer design: flush only right before you call input() / read from stdin for a prompt, and keep the atexit flush for the “garbage after exit” case. Doing it on every event render is overkill and increases the chance of eating input.

[IMPROVEMENT OPPORTUNITIES] (should fix)

[openhands-sdk/openhands/sdk/logger/logger.py, Lines 263–266] Import-time global side effect (registering atexit)
- Importing openhands.sdk.logger now always registers an exit handler. That’s not catastrophic, but it is a surprising side effect for a library import (and now DefaultConversationVisualizer imports the logger module too).
- Consider registering the atexit handler in a more explicit “interactive/CLI setup” path, or at least ensuring it doesn’t pull in heavy deps unnecessarily.
[openhands-sdk/openhands/sdk/conversation/visualizer/default.py, Line 23] Pulling in the logger module just to flush stdin
- openhands.sdk.logger imports the full logger stack (and litellm in logger.py). That’s a big hammer to swing from a visualizer module that otherwise just prints stuff.
- Better separation: put flush_stdin() in a lightweight terminal/tty utility module with minimal imports, and have both logger + visualizer depend on that.
[tests/sdk/logger/test_flush_stdin.py, overall] Tests don’t actually prove the core behavior
- Current tests mostly prove “returns 0 in a couple of early-exit/error cases” and that something calls the function.
- Missing: a test that flush_stdin() actually drains bytes and returns the correct count.
  - On Linux/macOS you can do this with a pty pair: make the slave the “stdin” (isatty=True), write bytes to the master, and assert flush_stdin() consumes them.
- Also missing: a test that would catch the shallow-copy bug above (verify the “restore” attrs equal the original tcgetattr result).
[tests/sdk/logger/test_flush_stdin.py, Lines 24–40] “termios unavailable” test is basically dead code on Unix
- The docstring claims “simulate absence by patching the import mechanism”, but the test just skips on Unix.
- If you want this covered on Unix too, patch builtins.__import__ (or similar) to raise ImportError for termios during the call.
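A rough sketch of the `builtins.__import__` patching idea from the last point, using `unittest.mock`. `flush_stdin_stub` is a hypothetical stand-in for the real function, which is assumed here to import termios lazily inside the call:

```python
import builtins
from unittest import mock

_real_import = builtins.__import__


def _no_termios(name, *args, **kwargs):
    """Import hook that pretends termios is not installed."""
    if name == "termios":
        raise ImportError("simulated: termios unavailable")
    return _real_import(name, *args, **kwargs)


def flush_stdin_stub() -> int:
    """Stand-in with the same early-exit shape as the function under test."""
    try:
        import termios  # noqa: F401
    except ImportError:
        return 0
    return 1  # placeholder for "would go on to drain stdin"


# While the patch is active, the lazy import fails even on Unix.
with mock.patch.object(builtins, "__import__", side_effect=_no_termios):
    result = flush_stdin_stub()
```

This only works if the function imports termios at call time; a module-level import would already have succeeded before the patch is applied.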

[BREAKING CHANGE / BEHAVIOR RISK] (what might you break?)

[local_conversation.py, Line 628 + visualizer/default.py, Line 256] New flushing points can change interactive behavior
- Even if the intent is correct, the “flush frequently” strategy risks deleting legitimate input, which is effectively a behavior-breaking change for interactive clients.

[SECURITY / CORRECTNESS] (only the real stuff)

No obvious “security exploit” here, but dropping user input is a correctness problem: it can cause mis-confirmations / failed prompts / confusing behavior in CLI tools.

VERDICT

Needs rework before I’d be comfortable calling this “done.” The overall approach is fine, but the termios restore bug is a hard correctness failure, and the flush-call placement (especially in the visualizer) is too aggressive for interactive UX.

Key insight: the hard part isn’t “drain stdin” — it’s drain only when it’s safe, and don’t accidentally leave the terminal in a modified state.

enyst · 2026-02-28T19:33:13Z

> Not sure if you have a process for watching higher risk PRs or doing light weight evals on them.

We typically figure it out at review time, and yes, we used to run some ~50 instances evals on them. This is a good point... the reviewer agent doesn't know.

@OpenHands Read this #2245 (comment) and look up the workflow running the pr-review agent. Understand what prompt file the reviewer uses. Investigate the agent loop in the SDK, and add to the reviewer prompt to not approve PRs that change behavior in a way that risks evaluation performance, leave those to human maintainers.

jpshackelford · 2026-02-28T19:33:15Z

@enyst You must be using a different model for your roasted review : )

openhands-ai · 2026-02-28T19:33:38Z

I'm on it! enyst can track my progress at all-hands.dev

enyst · 2026-02-28T19:34:46Z

> @enyst You must be using a different model for your roasted review : )

Yes! That's why I do it! GPT-5.2, as safety net for over-happy Claudes. Granted, lately it probably exaggerates in the other direction, but that's why we haz a use for humans. 😅

jpshackelford · 2026-02-28T19:34:50Z

@enyst Give me a signal when you are done looking at it and I'll address the roasted issues and any others.

openhands-ai · 2026-02-28T19:36:57Z

OpenHands encountered an error: **Failed to send message to agent server: HTTP 503 error: no available server**.

See the conversation for more information.

enyst · 2026-02-28T19:42:17Z

> @enyst Give me a signal when you are done looking at it and I'll address the roasted issues and any others.

All good, the agent took the proposed reviewer agent changes to another PR.

Re: the roasted review. If you want, we could also reuse that conversation to ask it for a re-assessment of its own roasted review, sometimes it figures out where it exaggerated (or not).

Actually let me try it

enyst · 2026-02-28T19:48:51Z

(OpenHands-GPT-5.2; note that in general, it's so prompt-sensitive that once I used words like "re-assessment" it will strongly push it to give a different answer/framing; still the core stuff should(tm) be apparent)

Re-evaluation of my review (section by section)

Taste rating

After thinking harder: still “needs improvement,” but closer to “yellow/acceptable with a couple fixes” than “red dumpster fire.” The core idea is fine; the main question is whether the implementation is safe and provably correct.

[CRITICAL ISSUES] — are they really critical?

1) Termios restore bug due to shallow copy

Claim: [logger.py L241–259] new = list(old) shares old[6], so mutating new[6][VMIN/VTIME] mutates old too, so the restore doesn’t restore.

Is it really the case? Yes. termios.tcgetattr() returns a list where cc is a nested list/array at index 6. A shallow copy (list(old) / old.copy()) will still share that nested cc. So new[6][...] = ... mutates old[6].

Is it as major as I made it sound? Slightly nuanced:

In the common case (terminal in canonical mode, ICANON on), VMIN/VTIME aren’t used, so leaving them at 0 typically won’t break basic input() behavior.
But in non-canonical situations (prompt_toolkit, curses-ish apps, anything that already turned off ICANON), you’ve now silently altered the terminal’s read behavior by “restoring” mutated settings. That’s the kind of thing that creates “random” terminal weirdness.

Conclusion: Still a real correctness bug and still worth treating as must-fix, mostly because it’s trivial to fix and avoids terminal-state corruption in edge-but-real environments.

2) “flush before render can drop real user input”

Claim: [visualizer default.py L252–257] flushing stdin before rendering can discard typeahead / real keystrokes.

Is it really the case? Yes. The function drains whatever bytes are pending on stdin. If the user types while output is happening (typeahead), those bytes are indistinguishable from terminal query responses at this level and will be eaten.

Is it as major as I made it sound? This one depends on product expectations:

If the SDK’s intended UX is “agent runs, user waits, then user types when prompted,” then losing typeahead is less likely, and some projects accept that tradeoff.
If users do type ahead (or you embed this in an app that does concurrent reading, or you’re in a rich TUI-ish environment), silently discarding input is a real UX regression.

Also: calling flush_stdin() before every visualizer render increases the odds of catching typeahead compared to calling it at a single “safe boundary” (like immediately before prompting).

Conclusion: The risk is real; I’d downgrade it from “definitely critical everywhere” to high-risk behavior change that needs justification or a narrower call site. The “right” fix depends on intended interaction model.

[IMPROVEMENT OPPORTUNITIES] — did I overstate them?

3) Import-time `atexit` side effect

Is it really bad? It’s a side effect, but lots of libraries do import-time setup. Here, the SDK already does import-time things (e.g., logging auto-config). So this is not catastrophic.

Severity after reflection: Minor-to-moderate. Worth noting, but not a blocker by itself.

4) Visualizer importing `openhands.sdk.logger` pulls in `litellm` etc.

Is it really the case? Yes: openhands.sdk.logger.__init__ imports from .logger, which imports litellm.

Is it major? Probably minor in this repo because litellm is already a core dependency for the SDK runtime. The cleanliness argument (“flush_stdin should live in a lightweight tty util”) still stands, but it’s not a correctness issue.

[TESTING GAPS] — did I demand too much?

5) Tests don’t prove actual draining / don’t catch the shallow-copy restore bug

Is it really the case? Yes:

The tests validate early returns and that calls happen.
They don’t validate “bytes were present and got drained,” nor do they validate that termios settings are restored correctly (which would catch the shallow-copy issue).

Is this blocking? I’d say not strictly blocking if maintainers are OK relying on manual reproduction + the small surface area of the function. But for a terminal/tty fix (notoriously environment-sensitive), a single robust PTY-based test would materially increase confidence.

Severity after reflection: Moderate (nice to have, but I wouldn’t hold up merging solely on this if the core code is fixed).

Bottom line (after deeper thought)

The termios shallow-copy/restore issue is still the one I’m most confident is objectively wrong and worth fixing.
The “flush on every render” concern is real, but its severity depends on whether the SDK promises to preserve typeahead / interactive keystrokes during agent output. I still think it’s the highest “behavior change” risk in the PR.
The other points (atexit registration, heavy import) are real but secondary.
The tests are thin for the main behavior, but that’s more about confidence than correctness.

If you want, I can also sanity-check where and when user input is actually read in the SDK flow to better judge whether “typeahead loss” is a real practical regression for typical clients, or mostly theoretical.

enyst · 2026-02-28T19:51:17Z

(HUMAN) Last note:

> If users do type ahead (or you embed this in an app that does concurrent reading, or you’re in a rich TUI-ish environment), silently discarding input is a real UX regression.

We support user messages sent at any time, including during agent's run, for sure.

All yours!

jpshackelford · 2026-02-28T20:06:52Z

Manual Testing Results on macOS

Testing Approach

Tested this PR branch on macOS using the reproduction scripts from the linked gist.

Installation command:

uv run --with "openhands-sdk @ git+https://github.com/OpenHands/software-agent-sdk.git@fix/stdin-escape-code-leak#subdirectory=openhands-sdk" \
       --with openhands-tools python repro_real.py

Results

Test Criteria	Result
Garbage in shell prompt after exit	✅ None observed
`!!! LEAK DETECTED` message	✅ Not triggered
Escape codes visible inline during execution	⚠️ Still present

Observed inline escape codes:

^[]11;rgb:30fb/3708/41af^G^[[57;1R

These are:

OSC 11 responses (terminal background color)
DSR cursor position responses

Assessment

The fix successfully addresses the main bug:

✅ No stdin corruption affecting subsequent input() calls
✅ No garbage characters leaking to shell prompt after script exits
✅ flush_stdin() is consuming escape code responses before they can leak

Minor cosmetic issue remaining: Escape codes still appear inline in the terminal output between the Observation and Agent Action rendering. This appears to be a timing issue where terminal responses are echoed to stdout before flush_stdin() consumes them.

Next Steps

This PR can be merged - it fixes the critical stdin leak bug as designed
Consider follow-up enhancement: Add flush_stdin() calls at additional points to prevent inline escape code visibility
The inline escape codes are cosmetic and do not affect functionality

jpshackelford · 2026-02-28T20:08:26Z

Manual Testing Results on macOS

Testing Approach

Tested this PR branch on macOS using the reproduction scripts from the linked gist.

Installation command:

uv run --with "openhands-sdk @ git+https://github.com/OpenHands/software-agent-sdk.git@fix/stdin-escape-code-leak#subdirectory=openhands-sdk" \
       --with openhands-tools python repro_real.py

Results

Test Criteria	Result
Garbage in shell prompt after exit	✅ None observed
`!!! LEAK DETECTED` message	✅ Not triggered
Escape codes visible inline during execution	⚠️ Still present

Observed inline escape codes:

^[]11;rgb:30fb/3708/41af^G^[[57;1R

These are:

OSC 11 responses (terminal background color)
DSR cursor position responses

Assessment

The fix successfully addresses the main stdin leak bug:

✅ No stdin corruption affecting subsequent input() calls
✅ No garbage characters leaking to shell prompt after script exits

However, escape codes still appear inline in the terminal output between the Observation and Agent Action rendering.

Suggested Next Steps

The inline escape codes appear to be a timing issue - terminal responses are being echoed to stdout before flush_stdin() can consume them. To address this:

Identify the source of terminal queries - The OSC 11 (background color) and DSR queries are likely coming from Rich's terminal capability detection. Adding logging or breakpoints around flush_stdin() calls could help pinpoint when queries are sent vs when responses arrive.
Consider adding flush_stdin() before Rich renders - Currently the PR calls flush_stdin() after agent steps and before rendering in the visualizer. It may help to also call it immediately before Rich prints output, to catch any responses that arrived during processing.
Alternative: suppress terminal queries - Rich can be configured to skip terminal capability detection. Setting TERM=dumb or using Rich's force_terminal=False might prevent the queries entirely, though this could affect output formatting.

The previous implementation used list(old) which creates a shallow copy. Since old[6] (the cc array) is itself a list, both old[6] and new[6] pointed to the same object. Modifying new[6][VMIN] and new[6][VTIME] also corrupted old[6], making the restore at the end ineffective. This fix uses a comprehension that copies nested lists, ensuring the original termios settings are preserved for proper restoration. Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford · 2026-02-28T20:23:45Z

Fix for termios shallow-copy bug

Based on the detailed analysis, I've pushed a fix for the termios shallow-copy issue identified in the review.

The Problem

The original code used list(old) which creates a shallow copy:

old = termios.tcgetattr(_sys.stdin)
new = list(old)  # ← Shallow copy
new[6][termios.VMIN] = 0  # ← Also modifies old[6]!

Since old[6] (the cc array) is itself a list, both old[6] and new[6] pointed to the same object. Modifying new[6][VMIN] also corrupted old[6], making the restore at the end ineffective.

Verified with a quick test:

>>> old[6] is new[6]
True  # Same object!
>>> new[6][termios.VMIN] = 99
>>> old[6][termios.VMIN]
99  # Oops, old is corrupted too

The Fix (commit `ae64c17`)

Changed to a proper deep copy using a comprehension that copies nested lists:

new = [item[:] if isinstance(item, list) else item for item in old]

Now old[6] and new[6] are separate objects, and the termios restore works correctly.
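The difference between the two copies can be reproduced without a real terminal. The sketch below uses a plain list as a stand-in for the `termios.tcgetattr()` result; the field values and the `VMIN`/`VTIME` indices are made up for the demo:

```python
# Stand-in for tcgetattr(): six scalar fields, then the nested cc list at index 6.
VMIN, VTIME = 2, 3  # demo-only indices; real code uses termios.VMIN / termios.VTIME
old = [0x100, 0x5, 0xBF, 0x8A3B, 15, 15, [b"\x03", b"\x1c", 1, 0]]

shallow = list(old)
deep = [item[:] if isinstance(item, list) else item for item in old]

assert shallow[6] is old[6]    # shallow copy shares the cc list
assert deep[6] is not old[6]   # the comprehension gives it its own cc list

deep[6][VMIN] = 0
assert old[6][VMIN] == 1       # original untouched, so a later restore is meaningful

shallow[6][VMIN] = 0
assert old[6][VMIN] == 0       # original clobbered: "restoring" old restores nothing
```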

This addresses the "termios shallow-copy/restore" concern from the review. The typeahead loss question is still an open design discussion that needs product input.

Add two new tests using pseudo-terminals to verify real terminal behavior:

1. test_flush_stdin_restores_termios_settings: Verifies that VMIN/VTIME settings in the cc array are properly restored after flush_stdin. This test would have caught the shallow-copy bug.
2. test_flush_stdin_drains_pending_data: Verifies that flush_stdin actually reads and discards pending escape code data from stdin.

These tests are skipped on Windows where the pty module is unavailable.

Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford · 2026-02-28T20:26:30Z

Added PTY-based tests (commit `a8b4af9`)

Added two new tests using pseudo-terminals to verify real terminal behavior:

1. `test_flush_stdin_restores_termios_settings`

Verifies that VMIN/VTIME settings in the cc array are properly restored after flush_stdin(). This test would have caught the shallow-copy bug - if we revert the deep-copy fix, this test fails.

2. `test_flush_stdin_drains_pending_data`

Verifies that flush_stdin() actually reads and discards pending escape code data from stdin, returning the correct byte count.

Both tests:

Use pty.openpty() to create a real terminal environment
Are skipped on Windows where the pty module is unavailable
Test actual behavior, not mocks

Remaining open item: The typeahead loss concern still needs a product decision.

jpshackelford · 2026-02-28T20:33:12Z

Typeahead Concern - Problem & Proposed Direction

Problem: flush_stdin() drains ALL pending stdin data, which would discard legitimate user input typed during agent execution.

Constraints:

❌ Cannot accept discarding user typeahead - SDK supports messages sent anytime during agent run
❌ Cannot limit fix to only atexit - escape codes leak during execution, not just at exit

Rejected approaches:

Flush everything (current impl) - loses user input
Only flush at exit - doesn't fix mid-execution leaks
TIOCSTI re-injection - disabled on modern Linux kernels for security

Proposed direction: Selective flushing

Parse pending stdin data
Discard only recognized escape sequences (CSI \x1b[..., OSC \x1b]...)
Buffer any other data (likely user input)
Export get_buffered_input() for SDK to prepend to next input read

Will prototype this approach after addressing CI failures.

- Fix line too long in test assertion message
- Fix 'possibly unbound' variable warnings by initializing slave_file
- Fix pyright type error by extracting lflag with type assertion

Co-authored-by: openhands <openhands@all-hands.dev>

The previous implementation drained ALL pending stdin data, which could discard legitimate user typeahead. This update implements selective flushing:

- Parse stdin data byte-by-byte, identifying escape sequences
- Discard only recognized CSI (\x1b[...) and OSC (\x1b]...) sequences
- Preserve all other data in a module-level buffer
- Add get_buffered_input() for SDK to retrieve preserved input
- Add clear_buffered_input() to explicitly clear the buffer

New helper functions:

- _is_csi_final_byte(): Check if byte is CSI sequence terminator
- _find_csi_end(): Find end of CSI sequence (or detect incomplete)
- _find_osc_end(): Find end of OSC sequence with BEL or ST terminator
- _parse_stdin_data(): Separate escape sequences from user input

Tests added for all parsing functions and the new buffer API.

Addresses typeahead concern from PR review: #2245 (comment)

Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford · 2026-02-28T20:54:22Z

Selective Flushing Implementation (commit `5c34edb`)

Based on the typeahead concern discussion, I've implemented selective flushing that preserves user input while discarding terminal escape sequences.

How It Works

The new implementation parses stdin data byte-by-byte:

CSI sequences (\x1b[...X) - Identified by looking for the final byte (0x40-0x7E per ECMA-48). Complete sequences are discarded; incomplete ones are preserved.
OSC sequences (\x1b]...\x07 or \x1b]...\x1b\\) - Terminated by BEL or ST. Complete sequences are discarded; incomplete ones are preserved.
Other data - All other bytes (including partial sequences) are preserved in a module-level buffer.

New Public API

from openhands.sdk.logger import get_buffered_input, clear_buffered_input

# Retrieve preserved user input (clears buffer)
buffered = get_buffered_input()

# Explicitly clear the buffer without retrieving
clear_buffered_input()

SDK Integration Point

Where the SDK reads user input (e.g., before input() calls), it should prepend any buffered data:

buffered = get_buffered_input()
user_input = buffered.decode('utf-8', errors='replace') + input()

Test Coverage

Added comprehensive tests for:

CSI final byte detection
CSI/OSC sequence end finding (complete and incomplete)
Mixed content parsing (escape sequences + user input)
Buffer retrieval and clearing
PTY-based integration tests verifying selective behavior

Design Notes

Incomplete sequences: If an escape sequence is incomplete at the end of a buffer read, it's preserved rather than discarded. This handles the edge case where a user presses Escape and we catch the data before they finish typing.
Arrow keys: Complete CSI sequences like arrow keys (\x1b[A) are flushed. This is acceptable because arrow keys typed during agent execution aren't meaningful input. If this becomes an issue, we could add a whitelist.
Buffer size: No explicit limit on preserved input buffer. In practice, user typeahead during agent execution should be minimal.
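
If either of the last two notes turns into a real problem, the follow-up could look something like this. Neither function exists in this PR; the names, the 4096-byte cap, and the drop-oldest policy are all assumptions made for the sake of the sketch.

```python
# Hypothetical whitelist: CSI sequences sent by the four arrow keys in
# normal cursor mode. These would be preserved instead of flushed.
ARROW_KEYS = frozenset({b"\x1b[A", b"\x1b[B", b"\x1b[C", b"\x1b[D"})

# Arbitrary cap chosen for illustration only.
MAX_BUFFERED_BYTES = 4096


def should_preserve(sequence: bytes) -> bool:
    # True if a complete escape sequence is whitelisted user input.
    return sequence in ARROW_KEYS


def append_bounded(
    buffer: bytearray, data: bytes, limit: int = MAX_BUFFERED_BYTES
) -> None:
    # Append data, then drop the oldest bytes if the buffer exceeds limit.
    buffer += data
    overflow = len(buffer) - limit
    if overflow > 0:
        del buffer[:overflow]
```

Dropping the oldest bytes keeps the most recent typeahead; dropping the newest instead would be an equally defensible choice.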

jpshackelford · 2026-02-28T21:23:03Z

Next step: another round of manual testing.

all-hands-bot · 2026-03-04T12:22:57Z

[Automatic Post]: This PR seems to be currently waiting for review. @enyst @juanmichelini, could you please take a look when you have a chance?

CLI tools like `gh` send terminal query sequences (DSR, OSC 11, etc.) as part of their progress/spinner UI. When running inside a PTY, these queries get captured as output. When displayed, the terminal processes them and responds, causing visible escape code garbage.

Root cause analysis:

- Diagnostic script confirmed queries are IN the captured PTY output, not generated by terminal responses
- The `gh` command writes cursor position and background color queries

Solution:

- Add filter_terminal_queries() to strip query sequences that trigger terminal responses while preserving legitimate formatting codes
- Apply filter in _get_command_output() before returning to visualizer
- Queries filtered: DSR, OSC 10/11/4, DA, DA2, DECRQSS
- Preserved: ANSI colors, cursor movement, bold/formatting

Fixes: #2244

Co-authored-by: openhands <openhands@all-hands.dev>

These scripts helped identify the root cause of #2244:

- diagnose_source.py: Proves escape codes are IN captured PTY output
- diagnose_leak.py: Tests stdin flushing (first attempted fix)
- diagnose_echo.py: Tests echo suppression (second attempted fix)
- test_real_world.py: End-to-end test with agent
- manual-testing.md: Testing notes and findings

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-04T13:35:48Z

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

jpshackelford · 2026-03-04T13:41:52Z

Manual Testing Report

Test Environment

macOS 15.6.1
Local SDK development environment with uv run
Terminal: iTerm2

Root Cause Discovery

The diagnostic script .pr/diagnose_source.py was key to identifying the root cause. Running it captured the raw PTY output:

Raw captured output (repr):
'\x1b]11;?\x1b\\\x1b[6n\x1b]11;?\x1b\\\x1b[6n\nShowing 3 of 183 open pull requests...'

Finding: The gh command writes terminal query sequences directly to its stdout as part of its spinner UI:

\x1b[6n - DSR (cursor position query)
\x1b]11;? - OSC 11 (background color query)

These get captured by the PTY and when displayed, the terminal responds to them, causing visible garbage.
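
To make the finding concrete, here is a minimal sketch of stripping query sequences from the captured string above. It is not the PR's filter_terminal_queries(): the function name `strip_terminal_queries` and the exact patterns are assumptions, written from the sequence types listed in the PR summary (DSR, OSC 10/11/4, DA, DA2, DECRQSS).

```python
import re

# Illustrative patterns only; the real filter may match differently.
_QUERY_RE = re.compile(
    r"\x1b\[6n"                                    # DSR: cursor position query
    r"|\x1b\](?:10|11|4;\d+);\?(?:\x07|\x1b\\)?"   # OSC 10/11/4 color queries
    r"|\x1b\[>?0?c"                                # DA and DA2 queries
    r"|\x1bP\$q[^\x1b]*\x1b\\"                     # DECRQSS query
)


def strip_terminal_queries(text: str) -> str:
    # Remove sequences that make a terminal reply; leave SGR colors alone.
    return _QUERY_RE.sub("", text)


# Raw capture from the diagnostic run quoted above.
raw = (
    "\x1b]11;?\x1b\\\x1b[6n\x1b]11;?\x1b\\\x1b[6n"
    "\nShowing 3 of 183 open pull requests..."
)
cleaned = strip_terminal_queries(raw)
```

With this sketch, `cleaned` contains only the newline and the "Showing 3 of 183..." text, and a color sequence such as `\x1b[31m` passes through unchanged because none of the patterns match it.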

Fix Verification

After implementing the filter, ran uv run python .pr/test_real_world.py:

✅ No visible escape codes in output during conversation
✅ Colors preserved - gh output still has colored PR IDs, branches
✅ Clean shell prompt after script exits - no leftover garbage

Previous Attempts (documented in .pr/)

flush_stdin() - Flushed pending input but didn't help (queries are in stdout, not stdin responses)
Echo suppression - Disabled terminal echo but didn't help (queries already captured before display)
Filter queries ✅ - Strips query sequences from captured output before display

The diagnostic scripts in .pr/ document this investigation for future reference.

jpshackelford · 2026-03-04T13:43:28Z

Next step: determine whether we can reduce the scope of the code committed in this PR and still resolve the issue. We may end up closing this PR and opening an alternative if we can simplify. Stay tuned.

jpshackelford · 2026-03-06T01:11:00Z

Next step: test and refine #2334 as a more targeted solution. If that PR fixes the problem, we will close this one without merging.

all-hands-bot reviewed Feb 28, 2026

jpshackelford force-pushed the fix/stdin-escape-code-leak branch from a05c3c8 to 8730c48 February 28, 2026 17:43

chore: trigger CI re-run for updated API breakage comment

ef4f328

Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford marked this pull request as draft February 28, 2026 19:14

jpshackelford marked this pull request as ready for review February 28, 2026 19:14

all-hands-bot approved these changes Feb 28, 2026

jpshackelford requested review from enyst and juanmichelini February 28, 2026 19:17

jpshackelford mentioned this pull request Feb 28, 2026

Terminal escape code responses leak to stdin, corrupting subsequent input #2244

Open

jpshackelford and others added 2 commits February 28, 2026 15:25

Merge branch 'main' into fix/stdin-escape-code-leak

a757db9

openhands-agent added 2 commits February 28, 2026 20:34

fix: address CI lint and type errors

18c0468

- Fix line too long in test assertion message
- Fix 'possibly unbound' variable warnings by initializing slave_file
- Fix pyright type error by extracting lflag with type assertion

Co-authored-by: openhands <openhands@all-hands.dev>

jpshackelford and others added 2 commits March 4, 2026 07:34

jpshackelford changed the title ~~Fix terminal escape code leak from stdin~~ fix(terminal): filter terminal query sequences from captured output Mar 4, 2026

jpshackelford mentioned this pull request Mar 6, 2026

fix(terminal): filter terminal query sequences from captured output #2334

Open

juanmichelini requested a review from malhotra5 March 9, 2026 23:58

jpshackelford commented Feb 28, 2026

Note on API Breakage Check Comment

What's happening:

Why it's not blocking:

Resolution:

[CRITICAL ISSUES] (must fix)

[IMPROVEMENT OPPORTUNITIES] (should fix)

[BREAKING CHANGE / BEHAVIOR RISK] (what might you break?)

[SECURITY / CORRECTNESS] (only the real stuff)

VERDICT

enyst commented Feb 28, 2026

Re-evaluation of my review (section by section)

Taste rating

[CRITICAL ISSUES] — are they really critical?

1) Termios restore bug due to shallow copy

2) “flush before render can drop real user input”

[IMPROVEMENT OPPORTUNITIES] — did I overstate them?

3) Import-time atexit side effect

4) Visualizer importing openhands.sdk.logger pulls in litellm etc.

[TESTING GAPS] — did I demand too much?

5) Tests don’t prove actual draining / don’t catch the shallow-copy restore bug

Bottom line (after deeper thought)

The Fix (commit `ae64c17`)

Added PTY-based tests (commit `a8b4af9`)

1. `test_flush_stdin_restores_termios_settings`

2. `test_flush_stdin_drains_pending_data`
