tui: replay historical events on resume/switch#562
tui: replay historical events on resume/switch#562ironbelly wants to merge 12 commits intoOpenHands:mainfrom
Conversation
When resuming a conversation via --resume <id> or switching via /history, historical events are now replayed through the ConversationVisualizer so the user sees the full conversation history in the TUI. Changes: - Add ConversationVisualizer.replay_events() for side-effect-free replay (skips critic handling, telemetry, plan panel refreshes; renders user messages inline) - Add ConversationRunner.replay_historical_events() with idempotence guard - Trigger replay in RunnerRegistry.get_or_create() for newly created runners - Add 7 unit tests covering replay order, empty/idempotent cases, and registry integration Verified: make lint, make test (1268 passed), make test-snapshots (59 passed) Co-authored-by: Ironbelly Ryan <ryan@ironbelly.com> Co-authored-by: openhands <openhands@all-hands.dev>
BUG-001 (High): --resume shows blank screen instead of conversation history
- Add ensure_runner() to ConversationManager that eagerly calls
RunnerRegistry.get_or_create(), triggering replay_historical_events()
- Store _is_resume flag in OpenHandsApp.__init__
- Call ensure_runner() in _initialize_main_ui() after set_loaded_resources()
and before _process_queued_inputs() (ordering constraint)
BUG-002 (Medium): Rapid clicking in /history panel crashes the app
- Wrap self.screen.get_selected_text() in on_mouse_up with try/except
- KeyError from Toast widgets in transient lifecycle states is now caught
- References upstream Textual issue #5646
BUG-003 (Medium): --resume last throws ValueError instead of resuming
- Add 3-line normalization at top of handle_resume_logic(): detect
args.resume.lower() == last, set args.last = True, args.resume =
- Case-insensitive: last, LAST, Last all work
BUG-004 (Low): UnicodeDecodeError in containerized terminals
- Documented as upstream Textual issue in .dev/KNOWN_ISSUES.md
- No code change; includes reproduction steps and workaround
BUG-005 (Low): RequestsDependencyWarning on stderr at launch
- Add targeted warnings.filterwarnings() before third-party imports
- Add E402/I001 per-file-ignore for entrypoint.py in pyproject.toml
Test coverage:
- tests/test_entrypoint.py: 5 new tests (--resume last normalization)
- tests/tui/test_auto_copy.py: 1 new test (exception-path for BUG-002)
- tests/tui/core/test_conversation_manager.py: 2 new tests (ensure_runner)
Files changed: 3 source, 1 config, 1 doc, 3 test
Lines: +86 -3
All 1276 tests pass. ruff/pyright clean.
…docs for agent and human context
|
@OpenHands /codereview-roasted please consider separation of concerns, which components should be orchestrating the replay at all. check whether we are following existing patterns. post back the review as a github comment |
|
I'm on it! malhotra5 can track my progress at all-hands.dev |
🔴 Taste Rating: Needs ImprovementKEY INSIGHT: Replay responsibility is scattered across 3 layers (App, Registry, Controller) when it should be concentrated in one controller following the existing message-based pattern. [CRITICAL ISSUES][openhands_cli/tui/core/runner_registry.py, Lines 50-53] 🔧 Separation of Concerns: # Replay historical events for newly created runners (resume / switch).
runner.replay_historical_events()`RunnerRegistry` is a dumb cache — its job is to store and retrieve runners. Adding side-effect logic (replay) to `get_or_create()` violates single responsibility. The existing pattern has controllers orchestrate behavior, registries just store things. Suggestion: Remove the replay call from `RunnerRegistry`. Have the caller (controller) trigger replay explicitly. [openhands_cli/tui/core/conversation_manager.py, Lines 297-306] 🔧 Pattern Violation: def ensure_runner(self, conversation_id: uuid.UUID) -> None:
"""Eagerly create a runner for a conversation, triggering event replay."""
self._runners.get_or_create(conversation_id)The module docstring says ConversationManager is "intentionally small: it listens to... messages bubbling up... and delegates... to focused controllers." But `ensure_runner()` is an imperative escape hatch that bypasses the message architecture. `OpenHandsApp` now calls `conversation_manager.ensure_runner()` directly instead of posting a message. This creates tight coupling and makes the flow invisible to the message-based debugging tools. Suggestion: Either:
[openhands_cli/tui/textual_app.py, Lines 460-470] 🔧 Layer Violation: # BUG-001: For --resume sessions, eagerly create the runner so that
# replay_historical_events() fires before the user sends their first message.
# ORDERING CONSTRAINT: must be called after set_loaded_resources()...
if self._is_resume and self.conversation_state.conversation_id is not None:
self.conversation_manager.ensure_runner(...)The App layer now has knowledge of:
This is exactly backwards — the lower layers should handle their own initialization. The App should just say "I'm resuming conversation X" and let the core layer figure out the details. [IMPROVEMENT OPPORTUNITIES][openhands_cli/tui/core/conversation_runner.py, Line 278] ⚡ Simplification: events = list(self.conversation.state.events)Why copy to a list? If `conversation.state.events` is already iterable, passing it directly to `replay_events()` avoids memory allocation. Only materialize if you need `len()` — but you return `len(events)` after the fact, which could be `sum(1 for _ in ...)` or tracked during iteration. [Architectural] 🏗️ Replay Ownership Fragmentation:
This is the "shotgun surgery" anti-pattern. If you need to change replay behavior, you touch 3+ files. Suggestion: Concentrate replay in ONE place:
[TESTING GAPS][tests/tui/core/test_conversation_runner_replay.py] 🧪 Mocks Aren't Tests:
Suggestion: Add at least one integration test with real event types (can use fixtures from `tests/tui/widgets/` if they exist). VERDICT:❌ Needs rework — The core functionality is sound but the architectural decomposition violates the existing message-based patterns. Concentrating replay ownership in a controller (probably `ConversationSwitchController` since it already handles the similar "make conversation active" flow) would make this much cleaner. The BUG-002 (KeyError catch), BUG-003 (`--resume last` normalization), and BUG-005 (warnings filter) fixes are fine — minimal, targeted changes. |
SummaryI completed the code review request for PR #562 ("tui: replay historical events on resume/switch"). No code changes were made - this was a review-only task. What I Did:
Review Posted:Verdict: 🔴 Needs rework - The bug fixes (BUG-002, BUG-003, BUG-005) are fine, but the core replay feature violates the existing message-based patterns and should be refactored to concentrate ownership in one controller. |
…doc fixes
Implements v0.03 tasklist (17 tasks, 3 phases, 5 milestones) derived from
the v0.02 post-release adversarial review.
Phase 0 — CI Green Baseline:
- Fix ruff-format and pycodestyle E402 violations in entrypoint.py by
collapsing the BUG-005 warnings.filterwarnings() call and adding
# noqa: E402 to 7 third-party imports that follow it. The interleaved
placement is intentional — the filter must be active before any
transitive import of `requests`.
- Snapshot mismatch (test_phase5_landing_screen) investigated: passes
locally, CI failure is environment-specific (Python 3.12.12 vs 3.12.3
rendering delta). No SVG change required on this branch.
Phase 1, M1 — Registry Replay Invariant:
- Reorder get_or_create() so the runner is cached in self._runners ONLY
after replay_historical_events() completes. If replay raises, no
partially-initialized runner pollutes the registry.
- Add docstring documenting main-thread constraint and cache invariant.
- Add 2 new tests: replay failure leaves cache empty + exception
propagates to caller.
- Grep verification confirms replay_historical_events call sites match
expectations (runner_registry.py call, conversation_runner.py def).
Phase 1, M2 — Visualizer Replay Test Coverage:
- Create tests/tui/widgets/test_richlog_visualizer_replay.py with 8 tests:
T-1: single user message → Static widget with user-message CSS class
T-2: observation event routes through _handle_observation_event
T-3: multiple pairs render in correct order
T-4: scroll_end(animate=False) called exactly once after all events
T-5: empty event list → no widgets, no scroll, no exception
T-6: critic handling intentionally omitted (regression guard)
T-7: telemetry intentionally omitted (regression guard)
T-8: plan panel refresh intentionally omitted (regression guard)
Phase 1, M3 — Documentation Corrections:
- Update ConversationManager class docstring to name ensure_runner() and
reload_visualizer_configuration() as synchronous direct-call paths that
bypass message-based dispatch for startup ordering reasons.
- No exclusive-dispatch language ("only", "exclusively", "always") found
in message-dispatch context across codebase or docs.
Phase 2, M4 — Integration Validation:
- Full suite: 1286 passed, 0 failures.
- replay_events() branch coverage ~80% (key paths covered).
- Grep constraint holds — no drift from Phase 1 baseline.
Context — PR feedback addressed vs. pushed back:
ADDRESSED (from v0.02 adversarial review):
1. Registry cache-before-replay ordering bug: The reviewer correctly
identified that caching a runner before replay completes could leave a
partially-initialized runner in the registry if replay throws. Fixed
by reordering (T01.01) and adding regression tests (T01.02).
2. Missing replay test coverage: Zero tests existed for replay_events().
The reviewer's 8-scenario test matrix (T-1 through T-8) was adopted
verbatim. These now guard rendering order, scroll behavior, and
intentional side-effect omissions.
3. Misleading ConversationManager docstring: The reviewer flagged that
the docstring implied message-based dispatch was the exclusive
coordination mechanism, when ensure_runner() and
reload_visualizer_configuration() are synchronous direct calls. Fixed
with qualified language.
4. E402 lint violations: Pre-existing CI failure in entrypoint.py.
Added noqa: E402 since the interleaving is intentional (BUG-005).
PUSHED BACK (deferred or declined):
1. Runtime thread-safety assertion in get_or_create(): The reviewer
suggested adding `assert threading.current_thread() is
threading.main_thread()`. We documented the constraint in the
docstring (T01.03) but deferred the runtime check to a future release.
Rationale: adding a hard assertion risks breaking test harnesses that
run from non-main threads, and the docstring + test coverage provides
sufficient protection for v0.03.
2. Moving replay_historical_events into the factory: The adversarial
panel considered moving replay into RunnerFactory.create() to make the
invariant structural. The 4-expert consensus chose the registry guard
approach because it preserves recovery flexibility — the caller can
catch the exception and retry or fall back without the factory being
opinionated about lifecycle. See Appendix B in tasklist.
3. Snapshot environment parity: The CI snapshot failure cannot be
reproduced locally. Rather than blindly regenerating snapshots in CI
(which masks real regressions), we documented the environment delta
and recommend a dedicated CI snapshot update workflow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…giene Feature/v0.03 architecture hygiene
Phase 7: Post-Validation Remediation — All 6 tasks
COMPLETE
Task: T07.01
Description: Fix banner text — removed "or scroll
to top"
Status: ✅
Files Modified: richlog_visualizer.py (2
locations)
────────────────────────────────────────
Task: T07.02
Description: Add forgotten_event_ids filtering
test
Status: ✅
Files Modified: test_conversation_runner_replay.py
(+2 tests)
────────────────────────────────────────
Task: T07.03
Description: Test cross-boundary observation
widget
content
Status: ✅
Files Modified: test_richlog_visualizer_replay.py
(+1 test)
────────────────────────────────────────
Task: T07.04
Description: Add logger.warning on config fallback
Status: ✅
Files Modified: cli_settings.py (logger + warning)
────────────────────────────────────────
Task: T07.05
Description: Add _flush_live_event_buffer drain
test
Status: ✅
Files Modified: test_richlog_visualizer_replay.py
(+1 test)
────────────────────────────────────────
Task: T07.06
Description: Fix LOG-1 to include event count and
window_size
Status: ✅
Files Modified: conversation_runner.py (LOG-1
rewrite)
Test Results
- 81 directly-affected tests: All pass
- Full suite: 1395 passed, 1 pre-existing snapshot
failure (unrelated)
- New tests added: 4 (2 in runner replay, 2 in
visualizer replay)
Deliverables
- D-0029: Banner text corrected (T07.01)
- D-0030: Forgotten-event exclusion locked by test
(T07.02)
- D-0031: Cross-boundary widget content locked by
test (T07.03)
- D-0032: AC-11 warning compliance (T07.04)
- D-0033: Buffer drain path locked by test
(T07.05)
- D-0034: LOG-1 spec-compliant with event count +
window_size (T07.06)
|
(Totally on a side note, sorry: Rohit, you are using GPT-5.2 ? 👯 ) |
|
ADDRESSED (from v0.02 adversarial review):
PUSHED BACK (deferred or declined):
|
Completion Report
|
@OpenHands Do a /codereview-roasted on this PR. Publish your review with the appropriate event on gh. |
|
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
There was a problem hiding this comment.
🔴 Taste Rating: Needs Improvement
[CRITICAL ISSUES]
- [openhands_cli/tui/widgets/richlog_visualizer.py, Lines 963–987] State Tracking:
load_older_events()decrements_loaded_start_indexbymounted(widget count). If any event doesn't render a widget (filtered MessageEvent, CondensationRequest, empty user message),_loaded_start_indexstalls and PageUp keeps reprocessing the same slice forever. Decrement by the number of events consumed (e.g.,len(older_events)), and keepmountedonly for scroll offset, or track a separateloaded_event_count. - [openhands_cli/tui/core/conversation_runner.py, Lines 301 & 353] Idempotence: replay guard only checks
_replayed_event_offset > 0and sets it tolen(plan.tail_events). When the tail is empty (e.g., latest event isCondensation), offset stays 0 so replay can run repeatedly, duplicating banners/widgets. Use a booleanreplay_doneor set offset tototal_countto mark replay even when tail is empty.
[TESTING GAPS]
- Add regression tests for (a) replay with empty tail after condensation being idempotent and (b)
load_older_events()advancing_loaded_start_indexeven when older events produce no widgets.
VERDICT: ❌ Needs rework
KEY INSIGHT: You're using widget count as the history index, which breaks pagination and replay idempotence in edge cases.
|
Final summary: I fetched PR #562’s diff/metadata, performed a /codereview-roasted, and posted a CHANGES_REQUESTED review on GitHub highlighting two critical issues (pagination index stalling when events render no widgets, and replay idempotence breaking with empty tail after condensation) plus requested regression tests. No code changes were made, so no git diff or pushes were required. |
enyst
left a comment
There was a problem hiding this comment.
This is a cool PR, thank you for this! Interesting workflow too ❤️
I say let’s let the agents battle it out 🫠
- Removed index and architectural files as they do not reduce token usage as assumed
Two bugs validated by adversarial debate against PR feedback:
1. Replay guard bypass (conversation_runner.py):
- Guard `_replayed_event_offset > 0` fails when tail is empty
(e.g., condensation is the last event — offset stays 0)
- Added `_replay_complete` boolean flag as the idempotence guard
- Preserves _replayed_event_offset contract as event delivery count
2. PageUp index stall (richlog_visualizer.py):
- `_loaded_start_index -= mounted` under-decrements when
_create_replay_widget returns None for filtered events
- Changed to `consumed` counter tracking all events processed,
not just those producing widgets
- Cap-break correctly limits consumed (unlike len(older_events))
Tests: 3 regression tests added (633/633 TUI suite passing)
---
Changed files (our work only):
┌─────────────────────────────────────────────────────┬───────┐
│ File │ Delta │
├─────────────────────────────────────────────────────┼───────┤
│ openhands_cli/tui/core/conversation_runner.py │ +9/-2 │
├─────────────────────────────────────────────────────┼───────┤
│ openhands_cli/tui/widgets/richlog_visualizer.py │ +4/-2 │
├─────────────────────────────────────────────────────┼───────┤
│ tests/tui/core/test_conversation_runner_replay.py │ +79 │
├─────────────────────────────────────────────────────┼───────┤
│ tests/tui/widgets/test_richlog_visualizer_replay.py │ +88 │
└─────────────────────────────────────────────────────┴───────┘
fix(v0.04): replay guard bypass and PageUp index stall removed indexes
Posted another update with those issues addressed.. Also lead to refactoring some of my skills and agents to better catch this kind of thing in the future, so thanks for that |
This has all been addressed in the recent update |
|
@OpenHands Read all top level comments here, and understand this PR in depth. Another AI agent has previously posted a REQUEST_CHANGES review; and the PR author and their agents have fixed the PR since then. Let's see if the fixed PR is fine. Do a /codereview-roasted on it and publish your review. You can use the suitable Think deeply and publish a review (you are not allowed to post a comment, but publish a review, you are allowed to publish a review with the gh api and with an event) |
|
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
There was a problem hiding this comment.
🔴 Taste Rating: Needs Improvement
[CRITICAL ISSUES]
- [openhands_cli/tui/widgets/richlog_visualizer.py, Lines 1040–1064] Condensation banner disappears after full load:
_update_banner_for_loaded_state()removes_banner_widgetwhen_loaded_start_indexreaches 0, even if_has_condensationis true. After the user pages up to load all post‑condensation events, the summary banner is removed and there is no indication of condensed prior context. This breaks the goal of preserving condensed history. Keep the banner whenhas_condensationis true, and update its text based oncondensed_counteven when_loaded_start_indexis 0.
[TESTING GAPS]
- Add a regression test that loads older events until
_loaded_start_index == 0in a condensation scenario and asserts the summary banner remains mounted (or updated) with the condensed‑count text. The current tests only cover non‑condensation load paths.
VERDICT: ❌ Needs rework
KEY INSIGHT: The summary banner is treated as a “load‑more” indicator, but for condensation it’s the only representation of hidden history and must persist.
This comment was marked as duplicate.
This comment was marked as duplicate.
|
I see I have some cleaning up to do. Feeling guilty for the time and resources I'm taking to babysit this PR with me. Give me a couple days to review my process and implement the same types of checks on my side as you have on your side |
Objective:
Task:
Context:
|
I actually looked a bit into this with my agent the other day - sorry. I'm not sure that keeping condensation banner is "critical" at all :) In any case, we can discuss things, I just want to dismiss this because my agent "requested changes" and I don't want to block this PR !
|
Thank you for this PR, and I'm really sorry if my agent was too obnoxious. Just for clarity, IMHO it's perfectly possible to limit events rendered at restore in more than one way:
My only concern here was that we cannot, IMHO, render ALL history events. The amount can be large (unbounded practically, really) and have the bad habit to slow down the machine maybe as soon as 1-2k |
tui: replay historical events on resume/switch (#v0.01)
When resuming a conversation via --resume or switching via /history,
historical events are now replayed through the ConversationVisualizer so
the user sees the full conversation history in the TUI.
Changes:
(skips critic handling, telemetry, plan panel refreshes; renders user
messages inline)
registry integration
PR #403 Comparison and Approach debate: : https://github.com/IronbellyOrg/Ironhands-CLI_dev/blob/main/Releases/Complete/v0.01-MessageHistory/adversarial-analysis.md
Verified: make lint, make test (1268 passed), make test-snapshots (59 passed)
fix: resolve 5 message-history bugs for v0.02 patch release (#v0.02)
BUG-001 (High): --resume shows blank screen instead of conversation history
RunnerRegistry.get_or_create(), triggering replay_historical_events()
and before _process_queued_inputs() (ordering constraint)
BUG-002 (Medium): Rapid clicking in /history panel crashes the app
BUG-003 (Medium): --resume last throws ValueError instead of resuming
args.resume.lower() == last, set args.last = True, args.resume =
BUG-004 (Low): UnicodeDecodeError in containerized terminals
BUG-005 (Low): RequestsDependencyWarning on stderr at launch
Test coverage:
Files changed: 3 source, 1 config, 1 doc, 3 test
Lines: +86 -3
All 1276 tests pass. ruff/pyright clean.
Development history: https://github.com/IronbellyOrg/Ironhands-CLI_dev/commit/e84a9c362d480adc629602e7607f3e17a9c63ba5
Co-authored-by: Ironbelly Ryan ryan@ironbelly.com
Co-authored-by: openhands openhands@all-hands.dev