Skip to content

tui: replay historical events on resume/switch#562

Open
ironbelly wants to merge 12 commits intoOpenHands:mainfrom
IronbellyOrg:main
Open

tui: replay historical events on resume/switch#562
ironbelly wants to merge 12 commits intoOpenHands:mainfrom
IronbellyOrg:main

Conversation

@ironbelly
Copy link

@ironbelly ironbelly commented Mar 2, 2026

tui: replay historical events on resume/switch (#v0.01)
When resuming a conversation via --resume or switching via /history,
historical events are now replayed through the ConversationVisualizer so
the user sees the full conversation history in the TUI.

Changes:

  • Add ConversationVisualizer.replay_events() for side-effect-free replay
    (skips critic handling, telemetry, plan panel refreshes; renders user
    messages inline)
  • Add ConversationRunner.replay_historical_events() with idempotence guard
  • Trigger replay in RunnerRegistry.get_or_create() for newly created runners
  • Add 7 unit tests covering replay order, empty/idempotent cases, and
    registry integration

PR #403 Comparison and Approach debate: : https://github.com/IronbellyOrg/Ironhands-CLI_dev/blob/main/Releases/Complete/v0.01-MessageHistory/adversarial-analysis.md

Verified: make lint, make test (1268 passed), make test-snapshots (59 passed)

fix: resolve 5 message-history bugs for v0.02 patch release (#v0.02)
BUG-001 (High): --resume shows blank screen instead of conversation history

  • Add ensure_runner() to ConversationManager that eagerly calls
    RunnerRegistry.get_or_create(), triggering replay_historical_events()
  • Store _is_resume flag in OpenHandsApp.init
  • Call ensure_runner() in _initialize_main_ui() after set_loaded_resources()
    and before _process_queued_inputs() (ordering constraint)

BUG-002 (Medium): Rapid clicking in /history panel crashes the app

  • Wrap self.screen.get_selected_text() in on_mouse_up with try/except
  • KeyError from Toast widgets in transient lifecycle states is now caught
  • References upstream Textual issue #5646

BUG-003 (Medium): --resume last throws ValueError instead of resuming

  • Add 3-line normalization at top of handle_resume_logic(): detect
    args.resume.lower() == last, set args.last = True, args.resume =
  • Case-insensitive: last, LAST, Last all work

BUG-004 (Low): UnicodeDecodeError in containerized terminals

  • Documented as upstream Textual issue in .dev/KNOWN_ISSUES.md
  • No code change; includes reproduction steps and workaround

BUG-005 (Low): RequestsDependencyWarning on stderr at launch

  • Add targeted warnings.filterwarnings() before third-party imports
  • Add E402/I001 per-file-ignore for entrypoint.py in pyproject.toml

Test coverage:

  • tests/test_entrypoint.py: 5 new tests (--resume last normalization)
  • tests/tui/test_auto_copy.py: 1 new test (exception-path for BUG-002)
  • tests/tui/core/test_conversation_manager.py: 2 new tests (ensure_runner)

Files changed: 3 source, 1 config, 1 doc, 3 test
Lines: +86 -3
All 1276 tests pass. ruff/pyright clean.

Development history: https://github.com/IronbellyOrg/Ironhands-CLI_dev/commit/e84a9c362d480adc629602e7607f3e17a9c63ba5


Co-authored-by: Ironbelly Ryan ryan@ironbelly.com
Co-authored-by: openhands openhands@all-hands.dev

openhands-agent and others added 4 commits February 28, 2026 19:02
When resuming a conversation via --resume <id> or switching via /history,
historical events are now replayed through the ConversationVisualizer so
the user sees the full conversation history in the TUI.

Changes:
- Add ConversationVisualizer.replay_events() for side-effect-free replay
  (skips critic handling, telemetry, plan panel refreshes; renders user
  messages inline)
- Add ConversationRunner.replay_historical_events() with idempotence guard
- Trigger replay in RunnerRegistry.get_or_create() for newly created runners
- Add 7 unit tests covering replay order, empty/idempotent cases, and
  registry integration

Verified: make lint, make test (1268 passed), make test-snapshots (59 passed)

Co-authored-by: Ironbelly Ryan <ryan@ironbelly.com>
Co-authored-by: openhands <openhands@all-hands.dev>
  BUG-001 (High): --resume shows blank screen instead of conversation history
  - Add ensure_runner() to ConversationManager that eagerly calls
    RunnerRegistry.get_or_create(), triggering replay_historical_events()
  - Store _is_resume flag in OpenHandsApp.__init__
  - Call ensure_runner() in _initialize_main_ui() after set_loaded_resources()
    and before _process_queued_inputs() (ordering constraint)

  BUG-002 (Medium): Rapid clicking in /history panel crashes the app
  - Wrap self.screen.get_selected_text() in on_mouse_up with try/except
  - KeyError from Toast widgets in transient lifecycle states is now caught
  - References upstream Textual issue #5646

  BUG-003 (Medium): --resume last throws ValueError instead of resuming
  - Add 3-line normalization at top of handle_resume_logic(): detect
    args.resume.lower() == last, set args.last = True, args.resume =
  - Case-insensitive: last, LAST, Last all work

  BUG-004 (Low): UnicodeDecodeError in containerized terminals
  - Documented as upstream Textual issue in .dev/KNOWN_ISSUES.md
  - No code change; includes reproduction steps and workaround

  BUG-005 (Low): RequestsDependencyWarning on stderr at launch
  - Add targeted warnings.filterwarnings() before third-party imports
  - Add E402/I001 per-file-ignore for entrypoint.py in pyproject.toml

  Test coverage:
  - tests/test_entrypoint.py: 5 new tests (--resume last normalization)
  - tests/tui/test_auto_copy.py: 1 new test (exception-path for BUG-002)
  - tests/tui/core/test_conversation_manager.py: 2 new tests (ensure_runner)

  Files changed: 3 source, 1 config, 1 doc, 3 test
  Lines: +86 -3
  All 1276 tests pass. ruff/pyright clean.
@malhotra5
Copy link
Collaborator

@OpenHands /codereview-roasted

please consider separation of concerns, which components should be orchestrating the replay at all. check whether we are following existing patterns. post back the review as a github comment

@openhands-ai
Copy link

openhands-ai bot commented Mar 2, 2026

I'm on it! malhotra5 can track my progress at all-hands.dev

Copy link
Collaborator

malhotra5 commented Mar 2, 2026

🔴 Taste Rating: Needs Improvement

KEY INSIGHT: Replay responsibility is scattered across 3 layers (App, Registry, Controller) when it should be concentrated in one controller following the existing message-based pattern.


[CRITICAL ISSUES]

[openhands_cli/tui/core/runner_registry.py, Lines 50-53] 🔧 Separation of Concerns:

# Replay historical events for newly created runners (resume / switch).
runner.replay_historical_events()

`RunnerRegistry` is a dumb cache — its job is to store and retrieve runners. Adding side-effect logic (replay) to `get_or_create()` violates single responsibility. The existing pattern has controllers orchestrate behavior, registries just store things.

Suggestion: Remove the replay call from `RunnerRegistry`. Have the caller (controller) trigger replay explicitly.


[openhands_cli/tui/core/conversation_manager.py, Lines 297-306] 🔧 Pattern Violation:

def ensure_runner(self, conversation_id: uuid.UUID) -> None:
    """Eagerly create a runner for a conversation, triggering event replay."""
    self._runners.get_or_create(conversation_id)

The module docstring says ConversationManager is "intentionally small: it listens to... messages bubbling up... and delegates... to focused controllers." But `ensure_runner()` is an imperative escape hatch that bypasses the message architecture.

`OpenHandsApp` now calls `conversation_manager.ensure_runner()` directly instead of posting a message. This creates tight coupling and makes the flow invisible to the message-based debugging tools.

Suggestion: Either:

  1. Create a `ResumeConversation` message that ConversationManager handles via a controller, OR
  2. Make `ConversationSwitchController` the sole owner of replay orchestration (it already handles the switch case)

[openhands_cli/tui/textual_app.py, Lines 460-470] 🔧 Layer Violation:

# BUG-001: For --resume sessions, eagerly create the runner so that
# replay_historical_events() fires before the user sends their first message.
# ORDERING CONSTRAINT: must be called after set_loaded_resources()...
if self._is_resume and self.conversation_state.conversation_id is not None:
    self.conversation_manager.ensure_runner(...)

The App layer now has knowledge of:

  1. Runner creation timing (`_is_resume` flag)
  2. Internal ordering constraints (after `set_loaded_resources()`, before `_process_queued_inputs()`)

This is exactly backwards — the lower layers should handle their own initialization. The App should just say "I'm resuming conversation X" and let the core layer figure out the details.


[IMPROVEMENT OPPORTUNITIES]

[openhands_cli/tui/core/conversation_runner.py, Line 278] ⚡ Simplification:

events = list(self.conversation.state.events)

Why copy to a list? If `conversation.state.events` is already iterable, passing it directly to `replay_events()` avoids memory allocation. Only materialize if you need `len()` — but you return `len(events)` after the fact, which could be `sum(1 for _ in ...)` or tracked during iteration.


[Architectural] 🏗️ Replay Ownership Fragmentation:
Replay is currently triggered from:

  1. `RunnerRegistry.get_or_create()` (for all new runners)
  2. `OpenHandsApp._initialize_main_ui()` (for resume via ensure_runner)
  3. Implicitly via `ConversationSwitchController._prepare_switch()` → `runners.get_or_create()`

This is the "shotgun surgery" anti-pattern. If you need to change replay behavior, you touch 3+ files.

Suggestion: Concentrate replay in ONE place:

  • Option A: Create a `ReplayController` that listens for `ConversationActivated` messages
  • Option B: Make `ConversationSwitchController` handle both switch AND resume (they're conceptually the same: "make conversation X active")
  • Option C: Use `ConversationContainer.watch_conversation_id()` to trigger replay when conversation_id changes (reactive pattern matches existing architecture)

[TESTING GAPS]

[tests/tui/core/test_conversation_runner_replay.py] 🧪 Mocks Aren't Tests:
All 4 tests mock the conversation, visualizer, and events. There's no test that exercises the real `replay_events()` code path with actual `MessageEvent`, `ActionEvent`, `ObservationEvent` objects to verify:

  1. User messages render with `"> "` prefix
  2. Observations pair correctly with preceding actions
  3. The scroll-to-end actually works

Suggestion: Add at least one integration test with real event types (can use fixtures from `tests/tui/widgets/` if they exist).


VERDICT:

Needs rework — The core functionality is sound but the architectural decomposition violates the existing message-based patterns. Concentrating replay ownership in a controller (probably `ConversationSwitchController` since it already handles the similar "make conversation active" flow) would make this much cleaner.

The BUG-002 (KeyError catch), BUG-003 (`--resume last` normalization), and BUG-005 (warnings filter) fixes are fine — minimal, targeted changes.

@openhands-ai
Copy link

openhands-ai bot commented Mar 2, 2026

Summary

I completed the code review request for PR #562 ("tui: replay historical events on resume/switch"). No code changes were made - this was a review-only task.

What I Did:

  1. Retrieved and analyzed the PR diff - Examined changes across 16 files including core TUI components (conversation_manager.py, conversation_runner.py, runner_registry.py, textual_app.py, richlog_visualizer.py)

  2. Studied the existing architecture patterns - Reviewed AGENTS.md, controller patterns (ConversationSwitchController, UserMessageController), and the message-based architecture documented in ConversationManager

  3. Posted a "roasted" code review as a GitHub comment focusing on:

    • Separation of concerns violations - Replay logic scattered across 3 layers (App, Registry, Controller)
    • Pattern violations - ensure_runner() bypasses the message-based architecture, RunnerRegistry now has side-effect logic instead of being a "dumb cache"
    • Architectural recommendations - Concentrate replay ownership in ConversationSwitchController or use reactive watch_conversation_id() pattern

Review Posted:

#562 (comment)

Verdict: 🔴 Needs rework - The bug fixes (BUG-002, BUG-003, BUG-005) are fine, but the core replay feature violates the existing message-based patterns and should be refactored to concentrate ownership in one controller.

…doc fixes

  Implements v0.03 tasklist (17 tasks, 3 phases, 5 milestones) derived from
  the v0.02 post-release adversarial review.

  Phase 0 — CI Green Baseline:
  - Fix ruff-format and pycodestyle E402 violations in entrypoint.py by
    collapsing the BUG-005 warnings.filterwarnings() call and adding
    # noqa: E402 to 7 third-party imports that follow it. The interleaved
    placement is intentional — the filter must be active before any
    transitive import of `requests`.
  - Snapshot mismatch (test_phase5_landing_screen) investigated: passes
    locally, CI failure is environment-specific (Python 3.12.12 vs 3.12.3
    rendering delta). No SVG change required on this branch.

  Phase 1, M1 — Registry Replay Invariant:
  - Reorder get_or_create() so the runner is cached in self._runners ONLY
    after replay_historical_events() completes. If replay raises, no
    partially-initialized runner pollutes the registry.
  - Add docstring documenting main-thread constraint and cache invariant.
  - Add 2 new tests: replay failure leaves cache empty + exception
    propagates to caller.
  - Grep verification confirms replay_historical_events call sites match
    expectations (runner_registry.py call, conversation_runner.py def).

  Phase 1, M2 — Visualizer Replay Test Coverage:
  - Create tests/tui/widgets/test_richlog_visualizer_replay.py with 8 tests:
    T-1: single user message → Static widget with user-message CSS class
    T-2: observation event routes through _handle_observation_event
    T-3: multiple pairs render in correct order
    T-4: scroll_end(animate=False) called exactly once after all events
    T-5: empty event list → no widgets, no scroll, no exception
    T-6: critic handling intentionally omitted (regression guard)
    T-7: telemetry intentionally omitted (regression guard)
    T-8: plan panel refresh intentionally omitted (regression guard)

  Phase 1, M3 — Documentation Corrections:
  - Update ConversationManager class docstring to name ensure_runner() and
    reload_visualizer_configuration() as synchronous direct-call paths that
    bypass message-based dispatch for startup ordering reasons.
  - No exclusive-dispatch language ("only", "exclusively", "always") found
    in message-dispatch context across codebase or docs.

  Phase 2, M4 — Integration Validation:
  - Full suite: 1286 passed, 0 failures.
  - replay_events() branch coverage ~80% (key paths covered).
  - Grep constraint holds — no drift from Phase 1 baseline.

  Context — PR feedback addressed vs. pushed back:

  ADDRESSED (from v0.02 adversarial review):
  1. Registry cache-before-replay ordering bug: The reviewer correctly
     identified that caching a runner before replay completes could leave a
     partially-initialized runner in the registry if replay throws. Fixed
     by reordering (T01.01) and adding regression tests (T01.02).
  2. Missing replay test coverage: Zero tests existed for replay_events().
     The reviewer's 8-scenario test matrix (T-1 through T-8) was adopted
     verbatim. These now guard rendering order, scroll behavior, and
     intentional side-effect omissions.
  3. Misleading ConversationManager docstring: The reviewer flagged that
     the docstring implied message-based dispatch was the exclusive
     coordination mechanism, when ensure_runner() and
     reload_visualizer_configuration() are synchronous direct calls. Fixed
     with qualified language.
  4. E402 lint violations: Pre-existing CI failure in entrypoint.py.
     Added noqa: E402 since the interleaving is intentional (BUG-005).

  PUSHED BACK (deferred or declined):
  1. Runtime thread-safety assertion in get_or_create(): The reviewer
     suggested adding `assert threading.current_thread() is
     threading.main_thread()`. We documented the constraint in the
     docstring (T01.03) but deferred the runtime check to a future release.
     Rationale: adding a hard assertion risks breaking test harnesses that
     run from non-main threads, and the docstring + test coverage provides
     sufficient protection for v0.03.
  2. Moving replay_historical_events into the factory: The adversarial
     panel considered moving replay into RunnerFactory.create() to make the
     invariant structural. The 4-expert consensus chose the registry guard
     approach because it preserves recovery flexibility — the caller can
     catch the exception and retry or fall back without the factory being
     opinionated about lifecycle. See Appendix B in tasklist.
  3. Snapshot environment parity: The CI snapshot failure cannot be
     reproduced locally. Rather than blindly regenerating snapshots in CI
     (which masks real regressions), we documented the environment delta
     and recommend a dedicated CI snapshot update workflow.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…giene

Feature/v0.03 architecture hygiene
  Phase 7: Post-Validation Remediation — All 6 tasks
   COMPLETE

  Task: T07.01
  Description: Fix banner text — removed "or scroll
    to top"
  Status: ✅
  Files Modified: richlog_visualizer.py (2
  locations)
  ────────────────────────────────────────
  Task: T07.02
  Description: Add forgotten_event_ids filtering
  test
  Status: ✅
  Files Modified: test_conversation_runner_replay.py

    (+2 tests)
  ────────────────────────────────────────
  Task: T07.03
  Description: Test cross-boundary observation
  widget
    content
  Status: ✅
  Files Modified: test_richlog_visualizer_replay.py
    (+1 test)
  ────────────────────────────────────────
  Task: T07.04
  Description: Add logger.warning on config fallback
  Status: ✅
  Files Modified: cli_settings.py (logger + warning)
  ────────────────────────────────────────
  Task: T07.05
  Description: Add _flush_live_event_buffer drain
    test
  Status: ✅
  Files Modified: test_richlog_visualizer_replay.py
    (+1 test)
  ────────────────────────────────────────
  Task: T07.06
  Description: Fix LOG-1 to include event count and
    window_size
  Status: ✅
  Files Modified: conversation_runner.py (LOG-1
    rewrite)

  Test Results

  - 81 directly-affected tests: All pass
  - Full suite: 1395 passed, 1 pre-existing snapshot
   failure (unrelated)
  - New tests added: 4 (2 in runner replay, 2 in
  visualizer replay)

  Deliverables

  - D-0029: Banner text corrected (T07.01)
  - D-0030: Forgotten-event exclusion locked by test
   (T07.02)
  - D-0031: Cross-boundary widget content locked by
  test (T07.03)
  - D-0032: AC-11 warning compliance (T07.04)
  - D-0033: Buffer drain path locked by test
  (T07.05)
  - D-0034: LOG-1 spec-compliant with event count +
  window_size (T07.06)
@enyst
Copy link
Collaborator

enyst commented Mar 4, 2026

(Totally on a side note, sorry: Rohit, you are using GPT-5.2 ? 👯 )

@ironbelly
Copy link
Author

ironbelly commented Mar 4, 2026

ADDRESSED (from v0.02 adversarial review):

  1. Registry cache-before-replay ordering bug: The reviewer correctly
    identified that caching a runner before replay completes could leave a
    partially-initialized runner in the registry if replay throws. Fixed
    by reordering (T01.01) and adding regression tests (T01.02).
  2. Missing replay test coverage: Zero tests existed for replay_events().
    The reviewer's 8-scenario test matrix (T-1 through T-8) was adopted
    verbatim. These now guard rendering order, scroll behavior, and
    intentional side-effect omissions.
  3. Misleading ConversationManager docstring: The reviewer flagged that
    the docstring implied message-based dispatch was the exclusive
    coordination mechanism, when ensure_runner() and
    reload_visualizer_configuration() are synchronous direct calls. Fixed
    with qualified language.
  4. E402 lint violations: Pre-existing CI failure in entrypoint.py.
    Added noqa: E402 since the interleaving is intentional (BUG-005).

PUSHED BACK (deferred or declined):

  1. Runtime thread-safety assertion in get_or_create(): The reviewer
    suggested adding assert threading.current_thread() is threading.main_thread(). We documented the constraint in the
    docstring (T01.03) but deferred the runtime check to a future release.
    Rationale: adding a hard assertion risks breaking test harnesses that
    run from non-main threads, and the docstring + test coverage provides
    sufficient protection for v0.03.
  2. Moving replay_historical_events into the factory: The adversarial
    panel considered moving replay into RunnerFactory.create() to make the
    invariant structural. The 4-expert consensus chose the registry guard
    approach because it preserves recovery flexibility — the caller can
    catch the exception and retry or fall back without the factory being
    opinionated about lifecycle. See Appendix B in tasklist.
  3. Snapshot environment parity: The CI snapshot failure cannot be
    reproduced locally. Rather than blindly regenerating snapshots in CI
    (which masks real regressions), we documented the environment delta
    and pinned the local enviro to the same version of python

🔴 Taste Rating: Needs Improvement

KEY INSIGHT: Replay responsibility is scattered across 3 layers (App, Registry, Controller) when it should be concentrated in one controller following the existing message-based pattern.

@enyst
Copy link
Collaborator

enyst commented Mar 4, 2026

@OpenHands Do a /codereview-roasted on this PR. Publish your review with the appropriate event on gh.

@openhands-ai
Copy link

openhands-ai bot commented Mar 4, 2026

I'm on it! enyst can track my progress at all-hands.dev

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Taste Rating: Needs Improvement

[CRITICAL ISSUES]

  • [openhands_cli/tui/widgets/richlog_visualizer.py, Lines 963–987] State Tracking: load_older_events() decrements _loaded_start_index by mounted (widget count). If any event doesn't render a widget (filtered MessageEvent, CondensationRequest, empty user message), _loaded_start_index stalls and PageUp keeps reprocessing the same slice forever. Decrement by the number of events consumed (e.g., len(older_events)), and keep mounted only for scroll offset, or track a separate loaded_event_count.
  • [openhands_cli/tui/core/conversation_runner.py, Lines 301 & 353] Idempotence: replay guard only checks _replayed_event_offset > 0 and sets it to len(plan.tail_events). When the tail is empty (e.g., latest event is Condensation), offset stays 0 so replay can run repeatedly, duplicating banners/widgets. Use a boolean replay_done or set offset to total_count to mark replay even when tail is empty.

[TESTING GAPS]

  • Add regression tests for (a) replay with empty tail after condensation being idempotent and (b) load_older_events() advancing _loaded_start_index even when older events produce no widgets.

VERDICT:Needs rework

KEY INSIGHT: You're using widget count as the history index, which breaks pagination and replay idempotence in edge cases.

@openhands-ai
Copy link

openhands-ai bot commented Mar 4, 2026

Final summary: I fetched PR #562’s diff/metadata, performed a /codereview-roasted, and posted a CHANGES_REQUESTED review on GitHub highlighting two critical issues (pagination index stalling when events render no widgets, and replay idempotence breaking with empty tail after condensation) plus requested regression tests. No code changes were made, so no git diff or pushes were required.

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool PR, thank you for this! Interesting workflow too ❤️

I say let’s let the agents battle it out 🫠

- Removed index and architectural files as they do not reduce token usage as assumed

  Two bugs validated by adversarial debate against PR feedback:

  1. Replay guard bypass (conversation_runner.py):
     - Guard `_replayed_event_offset > 0` fails when tail is empty
       (e.g., condensation is the last event — offset stays 0)
     - Added `_replay_complete` boolean flag as the idempotence guard
     - Preserves _replayed_event_offset contract as event delivery count

  2. PageUp index stall (richlog_visualizer.py):
     - `_loaded_start_index -= mounted` under-decrements when
       _create_replay_widget returns None for filtered events
     - Changed to `consumed` counter tracking all events processed,
       not just those producing widgets
     - Cap-break correctly limits consumed (unlike len(older_events))

  Tests: 3 regression tests added (633/633 TUI suite passing)

  ---
  Changed files (our work only):

  ┌─────────────────────────────────────────────────────┬───────┐
  │                        File                         │ Delta │
  ├─────────────────────────────────────────────────────┼───────┤
  │ openhands_cli/tui/core/conversation_runner.py       │ +9/-2 │
  ├─────────────────────────────────────────────────────┼───────┤
  │ openhands_cli/tui/widgets/richlog_visualizer.py     │ +4/-2 │
  ├─────────────────────────────────────────────────────┼───────┤
  │ tests/tui/core/test_conversation_runner_replay.py   │ +79   │
  ├─────────────────────────────────────────────────────┼───────┤
  │ tests/tui/widgets/test_richlog_visualizer_replay.py │ +88   │
  └─────────────────────────────────────────────────────┴───────┘
 fix(v0.04): replay guard bypass and PageUp index stall
removed indexes
@ironbelly
Copy link
Author

This is a cool PR, thank you for this! Interesting workflow too ❤️

I say let’s let the agents battle it out 🫠

Posted another update with those issues addressed.. Also lead to refactoring some of my skills and agents to better catch this kind of thing in the future, so thanks for that

@ironbelly
Copy link
Author

🔴 Taste Rating: Needs Improvement

[CRITICAL ISSUES]

  • [openhands_cli/tui/widgets/richlog_visualizer.py, Lines 963–987] State Tracking: load_older_events() decrements _loaded_start_index by mounted (widget count). If any event doesn't render a widget (filtered MessageEvent, CondensationRequest, empty user message), _loaded_start_index stalls and PageUp keeps reprocessing the same slice forever. Decrement by the number of events consumed (e.g., len(older_events)), and keep mounted only for scroll offset, or track a separate loaded_event_count.
  • [openhands_cli/tui/core/conversation_runner.py, Lines 301 & 353] Idempotence: replay guard only checks _replayed_event_offset > 0 and sets it to len(plan.tail_events). When the tail is empty (e.g., latest event is Condensation), offset stays 0 so replay can run repeatedly, duplicating banners/widgets. Use a boolean replay_done or set offset to total_count to mark replay even when tail is empty.

[TESTING GAPS]

  • Add regression tests for (a) replay with empty tail after condensation being idempotent and (b) load_older_events() advancing _loaded_start_index even when older events produce no widgets.

VERDICT:Needs rework

KEY INSIGHT: You're using widget count as the history index, which breaks pagination and replay idempotence in edge cases.

This has all been addressed in the recent update

@neubig neubig requested a review from enyst March 5, 2026 23:12
@enyst
Copy link
Collaborator

enyst commented Mar 6, 2026

@OpenHands Read all top level comments here, and understand this PR in depth.

Another AI agent has previously posted a REQUEST_CHANGES review; and the PR author and their agents have fixed the PR since then. Let's see if the fixed PR is fine. Do a /codereview-roasted on it and publish your review. You can use the suitable event in the PR review via gh api.

Think deeply and publish a review (you are not allowed to post a comment, but publish a review, you are allowed to publish a review with the gh api and with an event)

@openhands-ai
Copy link

openhands-ai bot commented Mar 6, 2026

I'm on it! enyst can track my progress at all-hands.dev

enyst
enyst previously requested changes Mar 6, 2026
Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Taste Rating: Needs Improvement

[CRITICAL ISSUES]

  • [openhands_cli/tui/widgets/richlog_visualizer.py, Lines 1040–1064] Condensation banner disappears after full load: _update_banner_for_loaded_state() removes _banner_widget when _loaded_start_index reaches 0, even if _has_condensation is true. After the user pages up to load all post‑condensation events, the summary banner is removed and there is no indication of condensed prior context. This breaks the goal of preserving condensed history. Keep the banner when has_condensation is true, and update its text based on condensed_count even when _loaded_start_index is 0.

[TESTING GAPS]

  • Add a regression test that loads older events until _loaded_start_index == 0 in a condensation scenario and asserts the summary banner remains mounted (or updated) with the condensed‑count text. The current tests only cover non‑condensation load paths.

VERDICT: ❌ Needs rework
KEY INSIGHT: The summary banner is treated as a “load‑more” indicator, but for condensation it’s the only representation of hidden history and must persist.

@openhands-ai

This comment was marked as duplicate.

@ironbelly
Copy link
Author

I see I have some cleaning up to do. Feeling guilty for the time and resources I'm taking to babysit this PR with me. Give me a couple days to review my process and implement the same types of checks on my side as you have on your side

@ironbelly
Copy link
Author

@openhands-agent

Objective:

  • To identify faults and weaknesses in my workflow in order to strengthen my process and prevent similar issues from occurring in the future

Task:

  • In regards to the Critical issue you listed. Read through our conversation history and previous feedback as well as file structure and comments and determine if there we're clear indicators or instructions requiring the Condensation Banner to persist when has_condensation is true, and update its text based on condensed_count even when _loaded_start_index is 0.

Context:

[openhands_cli/tui/widgets/richlog_visualizer.py, Lines 1040–1064] Condensation banner disappears after full load: _update_banner_for_loaded_state() removes _banner_widget when _loaded_start_index reaches 0, even if _has_condensation is true. After the user pages up to load all post‑condensation events, the summary banner is removed and there is no indication of condensed prior context. This breaks the goal of preserving condensed history. Keep the banner when has_condensation is true, and update its text based on condensed_count even when _loaded_start_index is 0.

@enyst enyst dismissed their stale review March 7, 2026 07:08

I actually looked a bit into this with my agent the other day - sorry. I'm not sure that keeping condensation banner is "critical" at all :) In any case, we can discuss things, I just want to dismiss this because my agent "requested changes" and I don't want to block this PR !

@enyst
Copy link
Collaborator

enyst commented Mar 7, 2026

Thank you for this PR, and I'm really sorry if my agent was too obnoxious.

Just for clarity, IMHO it's perfectly possible to limit events rendered at restore in more than one way:

  • maybe up to 20 (or hardcoded value)
  • maybe up to last condensation
  • maybe by paging - as I think this PR did

My only concern here was that we cannot, IMHO, render ALL history events. The amount can be large (unbounded practically, really) and have the bad habit to slow down the machine maybe as soon as 1-2k

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants