[3604] Surface mutating tool evidence status in the TUI during build/create turns#3605
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 34b34bbce1
    if entry.status != ToolStatus::Success {
        continue;
    }
    has_successful_tool = true;
    if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
Ignore non-tool operator events in evidence calculation
`tool_evidence_state` treats every successful entry in `app.tools` as tool evidence, but `apply_operator_state` records `turn` and `artifact` states there too. As soon as a streamed response emits an artifact/turn completion (with no actual tool call), this logic sets `has_successful_tool = true` and the UI reports `read-only so far` instead of `no mutating evidence yet`, which is a false safety signal for build/create prompts.
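A minimal sketch of one possible fix, assuming operator-state records can be recognised by name. The types below are simplified stand-ins for the PR's `ToolEntry`, `ToolStatus`, and `BuildEvidenceState`; the contents of `MUTATING_TOOL_NAMES` and the `OPERATOR_STATE_NAMES` list are assumptions made for illustration, not the PR's actual values.

```rust
/// Simplified stand-in for the PR's tool status type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolStatus {
    Success,
    Failed,
}

/// Simplified stand-in for an entry stored in `app.tools`.
#[derive(Debug, Clone)]
pub struct ToolEntry {
    pub name: String,
    pub status: ToolStatus,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildEvidenceState {
    NoMutatingEvidenceYet,
    ReadOnlySoFar,
    MutatingEvidenceConfirmed,
}

// Assumed values: the real lists are not shown in the review excerpt.
const MUTATING_TOOL_NAMES: &[&str] = &["write", "edit"];
// Hypothetical deny-list of operator-state entities recorded alongside tools.
const OPERATOR_STATE_NAMES: &[&str] = &["turn", "artifact"];

/// Derives the evidence state while ignoring operator-state records.
pub fn tool_evidence_state(entries: &[ToolEntry]) -> BuildEvidenceState {
    let mut has_successful_tool = false;
    for entry in entries {
        let name = entry.name.as_str();
        // Turn/artifact completions are not tool calls, so they never count.
        if OPERATOR_STATE_NAMES.contains(&name) {
            continue;
        }
        if entry.status != ToolStatus::Success {
            continue;
        }
        if MUTATING_TOOL_NAMES.contains(&name) {
            return BuildEvidenceState::MutatingEvidenceConfirmed;
        }
        has_successful_tool = true;
    }
    if has_successful_tool {
        BuildEvidenceState::ReadOnlySoFar
    } else {
        BuildEvidenceState::NoMutatingEvidenceYet
    }
}
```

With this shape, a log containing only completed `turn` and `artifact` records stays at `NoMutatingEvidenceYet`. A deny-list is the smallest change; an allow-list of known tool names would fail safer if new operator entities are added later.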
    for entry in app.tools.entries() {
        if entry.status != ToolStatus::Success {
Scope mutating-evidence status to the active turn
This loop scans the entire session history (`app.tools.entries()`) when deriving build evidence, so evidence from earlier turns leaks into later ones. After any previous successful write/edit, a new build/create turn will immediately show `write/edit confirmed` before executing any mutating action in that turn, which misrepresents the current turn’s evidence state.
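One way to scope the scan, sketched under the assumption that the tool log is append-only and can remember where the active turn begins. `ToolLog`, `begin_turn`, and `active_turn_entries` are hypothetical names introduced for illustration; the other types are simplified stand-ins for the PR's own, and the contents of `MUTATING_TOOL_NAMES` are assumed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolStatus {
    Success,
    Failed,
}

#[derive(Debug, Clone)]
pub struct ToolEntry {
    pub name: String,
    pub status: ToolStatus,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildEvidenceState {
    NoMutatingEvidenceYet,
    ReadOnlySoFar,
    MutatingEvidenceConfirmed,
}

// Assumed values: the real list is not shown in the review excerpt.
const MUTATING_TOOL_NAMES: &[&str] = &["write", "edit"];

/// Hypothetical append-only log that marks where the active turn starts.
#[derive(Default)]
pub struct ToolLog {
    entries: Vec<ToolEntry>,
    turn_start: usize,
}

impl ToolLog {
    /// Call when a new prompt is submitted; later entries belong to that turn.
    pub fn begin_turn(&mut self) {
        self.turn_start = self.entries.len();
    }

    pub fn push(&mut self, name: &str, status: ToolStatus) {
        self.entries.push(ToolEntry { name: name.to_string(), status });
    }

    /// Entries recorded since the last `begin_turn` call.
    pub fn active_turn_entries(&self) -> &[ToolEntry] {
        // Clamp defensively in case the log is ever cleared or truncated.
        let start = self.turn_start.min(self.entries.len());
        &self.entries[start..]
    }

    /// Evidence state derived from the active turn only.
    pub fn evidence_state(&self) -> BuildEvidenceState {
        let mut has_successful_tool = false;
        for entry in self.active_turn_entries() {
            if entry.status != ToolStatus::Success {
                continue;
            }
            if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
                return BuildEvidenceState::MutatingEvidenceConfirmed;
            }
            has_successful_tool = true;
        }
        if has_successful_tool {
            BuildEvidenceState::ReadOnlySoFar
        } else {
            BuildEvidenceState::NoMutatingEvidenceYet
        }
    }
}
```

Storing a start index keeps the full history available for the detail drawer while the status chips only consider the current turn. Tagging each entry with a turn id would work equally well if entries can be reordered or pruned.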
Pull request overview
This PR aims to improve operator confidence during long build/create turns by surfacing “mutating tool evidence” state in the interactive TUI (e.g., no mutating evidence yet vs read-only vs confirmed mutating). It also introduces broader interactive TUI architecture changes (transcript-first layout, gateway-backed streaming) and related runtime/provider safeguards.
Changes:
- Add build/create evidence-state derivation and render it in both Live activity and the run-state card.
- Introduce a transcript-first interactive TUI shell (status bar + activity strip + run-state card + transcript + composer), plus detail drawer/overlays and command palette tests.
- Add gateway-backed interactive streaming support and related defaults (model selection), plus timeout alignment and runtime safety guards elsewhere.
Reviewed changes
Copilot reviewed 60 out of 61 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| specs/3604-tui-mutating-tool-evidence-status.md | Adds #3604 spec describing evidence-state UI behavior and tests. |
| specs/3603-require-mutating-tool-evidence-for-build-completion.md | Adds runtime safety spec for requiring mutating evidence for completion claims. |
| specs/3602-fail-closed-on-unverified-build-progress.md | Adds runtime safety spec for failing closed on unverified progress. |
| specs/3601-cli-backend-timeout-aligns-with-request-timeout-budget.md | Adds spec for aligning CLI backend timeouts with request timeout budget. |
| specs/3600-fresh-session-just-commands-for-local-tui-dev-loop.md | Adds spec for just recipes to reset local sessions for TUI dev. |
| specs/3585-codex-auth-model-compatibility-and-tui-startup.md | Adds spec for codex-auth model compatibility and TUI startup defaults. |
| specs/3582-tui-transcript-first-operator-terminal.md | Adds research-updated spec for transcript-first TUI direction. |
| scripts/dev/test-just-fresh-session.sh | Adds regression script validating new just recipes and session reset behavior. |
| justfile | Adds session-reset, stack-up-fresh, tui-fresh recipes and local runtime/TUI workflow. |
| docs/research/cli-interface-patterns-2026-03-16.md | Adds research notes informing transcript-first TUI patterns. |
| crates/tau-tui/tests/tui_demo_smoke.rs | Adds integration test asserting interactive mode fails loudly without a TTY. |
| crates/tau-tui/src/main.rs | Updates CLI help/default model and wires interactive/agent modes to gateway config. |
| crates/tau-tui/src/interactive/ui_transcript.rs | Implements transcript-first transcript rendering + scrolling behavior. |
| crates/tau-tui/src/interactive/ui_tests/transcript.rs | Adds render-path tests for transcript-first shell, activity, and state surfacing. |
| crates/tau-tui/src/interactive/ui_tests/palette.rs | Adds tests for command palette discovery, filtering, and execution. |
| crates/tau-tui/src/interactive/ui_tests/helpers.rs | Adds ratatui TestBackend helpers for key input and rendering assertions. |
| crates/tau-tui/src/interactive/ui_tests/evidence.rs | Adds tests covering mutating-evidence status rendering rules. |
| crates/tau-tui/src/interactive/ui_tests/detail_overlay.rs | Adds tests for narrow-layout detail overlay behavior and navigation. |
| crates/tau-tui/src/interactive/ui_tests/detail.rs | Adds tests for detail drawer sections and command routing. |
| crates/tau-tui/src/interactive/ui_tests/composer.rs | Adds tests for composer height, footer chips, and slash command paths. |
| crates/tau-tui/src/interactive/ui_tests/approval.rs | Adds tests for approval flows and attention strip affordances. |
| crates/tau-tui/src/interactive/ui_tests.rs | Registers the new ui test modules. |
| crates/tau-tui/src/interactive/ui_status.rs | Refactors status bar to show session/cwd/approval/transport/health/state context. |
| crates/tau-tui/src/interactive/ui_shared.rs | Adds shared UI helpers (badges/actions + latest running tool). |
| crates/tau-tui/src/interactive/ui_run_state_model.rs | Adds run-state card model, including evidence summary and streaming preview. |
| crates/tau-tui/src/interactive/ui_run_state.rs | Renders the run-state card and computes its dynamic height. |
| crates/tau-tui/src/interactive/ui_palette.rs | Implements command palette popover rendering. |
| crates/tau-tui/src/interactive/ui_overlay.rs | Implements help/detail/thinking overlays for narrow layouts and context. |
| crates/tau-tui/src/interactive/ui_drawer_sections.rs | Implements detail drawer section contents (tools/memory/cortex/sessions). |
| crates/tau-tui/src/interactive/ui_drawer.rs | Implements wide-layout right-side detail drawer with tab navigation. |
| crates/tau-tui/src/interactive/ui_composer.rs | Implements transcript-first composer rendering, footer chips, and cursor placement. |
| crates/tau-tui/src/interactive/ui_build_evidence.rs | Adds build/create evidence-state derivation from prompt + tool entries. |
| crates/tau-tui/src/interactive/ui_activity.rs | Updates live activity strip to include evidence-state and other context chips. |
| crates/tau-tui/src/interactive/ui.rs | Replaces legacy multi-panel UI with transcript-first shell and overlays/drawer. |
| crates/tau-tui/src/interactive/mod.rs | Reorganizes interactive module surface/export and gateway integration modules. |
| crates/tau-tui/src/interactive/gateway_tests.rs | Adds tests for SSE parsing and applying gateway events to app state. |
| crates/tau-tui/src/interactive/gateway_runtime_tests.rs | Adds integration-style tests for gateway streaming runtime + rendering. |
| crates/tau-tui/src/interactive/gateway_runtime.rs | Adds blocking reqwest-based SSE streaming runtime worker for gateway mode. |
| crates/tau-tui/src/interactive/gateway.rs | Adds SSE frame parsing + operator-state extraction and error normalization. |
| crates/tau-tui/src/interactive/command_catalog.rs | Adds command catalog, parsing, and matching for palette and bare commands. |
| crates/tau-tui/src/interactive/chat.rs | Adds helpers for replacing last assistant content and role-based queries. |
| crates/tau-tui/src/interactive/app_submit.rs | Adds unified submit path (slash commands vs prompts) and gateway submission. |
| crates/tau-tui/src/interactive/app_runtime.rs | Updates event loop to pump gateway events and simplifies input handling. |
| crates/tau-tui/src/interactive/app_nav.rs | Adds navigation helpers for insert/normal mode and transcript scrolling. |
| crates/tau-tui/src/interactive/app_gateway.rs | Adds app-side application of gateway events into chat/tools/operator state. |
| crates/tau-tui/src/interactive/app_focus.rs | Adds focus-cycling logic for normal vs insert mode. |
| crates/tau-tui/src/interactive/app_detail.rs | Adds detail section selection and cycling behavior. |
| crates/tau-tui/src/interactive/app_commands.rs | Refactors key handling, command execution, and global shortcuts. |
| crates/tau-tui/src/interactive/app.rs | Refactors core App state/config, gateway runtime wiring, and exported defaults. |
| crates/tau-tui/Cargo.toml | Adds reqwest dependency for gateway runtime. |
| crates/tau-provider/src/model_catalog.rs | Adds openai/gpt-5.3-codex to built-in model catalog + test assertion. |
| crates/tau-provider/src/client.rs | Aligns CLI backend timeout selection with request timeout budget + unit tests. |
| crates/tau-coding-agent/src/tests/auth_provider/runtime_and_startup.rs | Adds integration tests for progress/completion guards and oauth model rejection. |
| crates/tau-coding-agent/src/tests/auth_provider/auth_and_provider/provider_client_and_store.rs | Adds regression test for request-timeout budget vs codex backend timeout. |
| crates/tau-coding-agent/src/startup_local_runtime.rs | Adds oauth-token/session-token local model compatibility validation. |
| crates/tau-agent-core/src/tests/structured_output_and_parallel.rs | Adds unit/regression tests for new progress/completion evidence guard helpers. |
| crates/tau-agent-core/src/runtime_safety_progress.rs | Adds prompt/text classifiers and tool-evidence helpers for runtime guards. |
| crates/tau-agent-core/src/lib.rs | Wires new runtime safety module and adds new AgentError variants + replan prompts. |
| Cargo.lock | Locks reqwest addition for tau-tui. |
    //! UI rendering with ratatui for the transcript-first interactive shell.

    use ratatui::Frame;
    #[path = "ui_activity.rs"]
    mod activity;
    #[path = "ui_build_evidence.rs"]
    mod build_evidence;
    #[path = "ui_composer.rs"]
    mod composer;
    #[path = "ui_drawer.rs"]
    mod drawer;
    #[path = "ui_drawer_sections.rs"]
    mod drawer_sections;
    #[path = "ui_overlay.rs"]
    mod overlay;
    #[path = "ui_palette.rs"]
    mod palette;
    #[path = "ui_run_state.rs"]
    mod run_state;
    #[path = "ui_shared.rs"]
    mod shared;
    #[path = "ui_status.rs"]
    mod status_bar;
    #[cfg(test)]
    #[path = "ui_tests.rs"]
    mod tests;
    #[path = "ui_transcript.rs"]
    mod transcript;
The PR description/spec reference is scoped to #3604 (mutating-evidence status in the TUI), but this diff also introduces a large transcript-first UI refactor, gateway-backed interactive runtime plumbing (reqwest + SSE parsing), justfile/session-reset workflow, provider CLI-timeout changes, and new agent runtime safety guards. This significantly expands scope/risk relative to the stated goal; consider splitting into smaller PRs or updating the PR description to explicitly cover these additional concerns and their rollout/testing expectations.
    ## Objective

    Surface mutating-tool evidence state in the interactive TUI so build/create turns no longer look idle or deceptively complete while Tau has only read from the repo or has not written anything yet.

    ## Inputs/Outputs
This spec file won’t be picked up by the repo’s spec tooling/contract: `scripts/dev/spec-archive-index.sh` only scans `specs/*/spec.md`, and `AGENTS.md` defines the binding per-issue spec path as `specs/<issue-id>/spec.md` with a `Status:` line. As written (`specs/3604-tui-mutating-tool-evidence-status.md`) it will be ignored and has no `Status:` field, so the archive/index will show it as missing/Unknown. Consider moving it to `specs/3604/spec.md` and adding a `Status: Draft|Reviewed|Accepted|Implemented` line near the top (and doing the same for the other new spec files added in this PR).
    fn tool_evidence_state(app: &App) -> BuildEvidenceState {
        let mut has_successful_tool = false;
        for entry in app.tools.entries() {
            if entry.status != ToolStatus::Success {
                continue;
            }
            has_successful_tool = true;
            if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
                return BuildEvidenceState::MutatingEvidenceConfirmed;
            }
        }
        if has_successful_tool {
            BuildEvidenceState::ReadOnlySoFar
        } else {
            BuildEvidenceState::NoMutatingEvidenceYet
`tool_evidence_state` treats any `ToolStatus::Success` entry in `app.tools` as “successful tool evidence”. But gateway operator-state updates are also recorded via `push_tool_event` with names like `turn` / `artifact`, and they map `status == "completed"` to `ToolStatus::Success`. That means a build/create turn can incorrectly flip from “no mutating evidence yet” to “read-only so far” even when no successful non-mutating tool (e.g. `read`) has run. Consider filtering evidence to only count real tool entries (e.g. known tool names / exclude operator-state entities) or storing operator-state updates separately from tool evidence.
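The second option, keeping operator-state updates distinguishable from tool evidence, could be sketched by tagging each record at the point it is written. Everything here is illustrative: `EntryKind`, `record_operator_state`, `record_tool_result`, and `successful_tool_names` are hypothetical names, and only the `"completed"` to `Success` mapping comes from the comment above; the other status strings are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolStatus {
    Running,
    Success,
    Failed,
}

/// Hypothetical discriminator set once, when the record is created.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryKind {
    Tool,
    OperatorState,
}

#[derive(Debug, Clone)]
pub struct ToolEntry {
    pub name: String,
    pub status: ToolStatus,
    pub kind: EntryKind,
}

/// Maps a gateway status string to a status value.
/// Only "completed" is taken from the review; the rest is assumed.
pub fn status_from_gateway(status: &str) -> ToolStatus {
    match status {
        "completed" => ToolStatus::Success,
        "failed" => ToolStatus::Failed,
        _ => ToolStatus::Running,
    }
}

/// Records a turn/artifact update without marking it as tool evidence.
pub fn record_operator_state(entries: &mut Vec<ToolEntry>, entity: &str, status: &str) {
    entries.push(ToolEntry {
        name: entity.to_string(),
        status: status_from_gateway(status),
        kind: EntryKind::OperatorState,
    });
}

/// Records the outcome of an actual tool call.
pub fn record_tool_result(entries: &mut Vec<ToolEntry>, name: &str, status: ToolStatus) {
    entries.push(ToolEntry {
        name: name.to_string(),
        status,
        kind: EntryKind::Tool,
    });
}

/// Names of successful real tool calls, which is all the evidence logic needs.
pub fn successful_tool_names(entries: &[ToolEntry]) -> Vec<&str> {
    entries
        .iter()
        .filter(|e| e.kind == EntryKind::Tool && e.status == ToolStatus::Success)
        .map(|e| e.name.as_str())
        .collect()
}
```

Tagging at write time avoids maintaining a name list in the evidence code, at the cost of touching every call site that currently pushes into the shared log.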
    fn truncate(input: &str, max: usize) -> String {
        if input.len() <= max {
            return input.to_string();
        }
        if max <= 3 {
            return input[..max].to_string();
        }
        format!("{}...", &input[..max - 3])
    }
`truncate` slices strings by byte index (`input[..max]` / `input[..max - 3]`), which will panic at runtime if the string contains non-ASCII UTF-8 and `max` falls on a non-char boundary (e.g., prompts with emojis or non-Latin text). Truncation should be implemented on `char_indices()`/`chars()` (or a grapheme-aware approach) to guarantee valid boundaries.
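A minimal sketch of a boundary-safe variant using only the standard library. Note the change in meaning, which is an assumption about what the caller wants: `max` here counts Unicode scalar values (`char`s) rather than bytes.

```rust
/// Truncates `input` to at most `max` chars, appending "..." when there is room.
/// Iterating over `chars()` means a cut can never land inside a code point.
fn truncate(input: &str, max: usize) -> String {
    if input.chars().count() <= max {
        return input.to_string();
    }
    if max <= 3 {
        // Too narrow for an ellipsis: keep the first `max` chars only.
        return input.chars().take(max).collect();
    }
    let mut out: String = input.chars().take(max - 3).collect();
    out.push_str("...");
    out
}
```

For example, `truncate("héllo wörld", 8)` yields `"héllo..."`, where byte slicing at the same width could panic. Counting chars still does not match terminal column width for wide or combining characters; a grapheme- or width-aware crate would be needed for exact layout.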
    fn truncate(input: &str, max: usize) -> String {
        if input.len() <= max {
            return input.to_string();
        }
        if max > 3 {
            return format!("{}...", &input[..max - 3]);
        }
        input[..max].to_string()
    }
`truncate` slices strings by byte index (`input[..max]` / `input[..max - 3]`), which can panic at runtime for non-ASCII UTF-8 content when `max` lands mid-codepoint. Since this drawer renders user/assistant text, it should truncate on `chars()`/`char_indices()` (or graphemes) to ensure valid boundaries.
Closing this PR because it was based on the redesign branch stack rather than current
Closes #3604
Spec
`specs/3604-tui-mutating-tool-evidence-status.md`
What
- `no mutating evidence yet`, `read-only so far`, and `mutating evidence confirmed` in `Live activity`
- `still read-only` / `write/edit confirmed` in the run-state card
Why
Long build/create turns felt idle or deceptive because the shell did not distinguish between no write/edit evidence, read-only progress, and confirmed mutating progress.
Test evidence
- `cargo test -p tau-tui 3604 -- --nocapture`
- `cargo test -p tau-tui`
- `cargo run -p tau-tui -- interactive --profile ops-interactive`