[3604] Surface mutating tool evidence status in the TUI during build/create turns#3605

Closed
njfio wants to merge 109 commits into master from 3604-tui-mutating-evidence-status

Conversation

@njfio (Owner) commented Mar 20, 2026

Closes #3604

Spec

  • specs/3604-tui-mutating-tool-evidence-status.md

What

  • adds a shared build/create evidence helper for the interactive TUI
  • surfaces "no mutating evidence yet", "read-only so far", and "mutating evidence confirmed" in Live activity
  • surfaces "still read-only" / "write/edit confirmed" in the run-state card
  • omits the evidence status for non-build prompts and completed idle turns
  • keeps the status visible when the wide details drawer is open

Why

Long build/create turns felt idle or deceptive because the shell did not distinguish between no write/edit evidence, read-only progress, and confirmed mutating progress.

Test evidence

  • cargo test -p tau-tui 3604 -- --nocapture
  • cargo test -p tau-tui
  • smoke-launched cargo run -p tau-tui -- interactive --profile ops-interactive

njfio added 30 commits March 16, 2026 11:03
Copilot AI review requested due to automatic review settings March 20, 2026 02:19
@greptile-apps greptile-apps bot left a comment

njfio has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34b34bbce1

Comment on lines +92 to +96
        if entry.status != ToolStatus::Success {
            continue;
        }
        has_successful_tool = true;
        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {

P1: Ignore non-tool operator events in evidence calculation

tool_evidence_state treats every successful entry in app.tools as tool evidence, but apply_operator_state records turn and artifact states there too. As soon as a streamed response emits an artifact/turn completion (with no actual tool call), this logic sets has_successful_tool = true and the UI reports "read-only so far" instead of "no mutating evidence yet", which is a false safety signal for build/create prompts.
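
One possible shape for the fix, sketched under assumptions: the types below are simplified stand-ins for the PR's `ToolEntry`, `ToolStatus`, and `BuildEvidenceState`; the contents of `MUTATING_TOOL_NAMES` are a guess; and `OPERATOR_STATE_NAMES` is a hypothetical constant holding the `turn` and `artifact` entry names mentioned in this review. The function takes a slice instead of `&App` so the sketch stands alone.

```rust
// Simplified stand-ins for the PR's types (assumptions, not the real definitions).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolStatus {
    Running,
    Success,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildEvidenceState {
    NoMutatingEvidenceYet,
    ReadOnlySoFar,
    MutatingEvidenceConfirmed,
}

struct ToolEntry {
    name: String,
    status: ToolStatus,
}

// Assumed contents; the real list lives in the PR and is not shown here.
const MUTATING_TOOL_NAMES: &[&str] = &["write", "edit"];
// Hypothetical: operator-state entries that are recorded alongside tools.
const OPERATOR_STATE_NAMES: &[&str] = &["turn", "artifact"];

fn tool_evidence_state(entries: &[ToolEntry]) -> BuildEvidenceState {
    let mut has_successful_tool = false;
    for entry in entries {
        if entry.status != ToolStatus::Success {
            continue;
        }
        // Skip operator-state updates so they never count as tool evidence.
        if OPERATOR_STATE_NAMES.contains(&entry.name.as_str()) {
            continue;
        }
        has_successful_tool = true;
        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
            return BuildEvidenceState::MutatingEvidenceConfirmed;
        }
    }
    if has_successful_tool {
        BuildEvidenceState::ReadOnlySoFar
    } else {
        BuildEvidenceState::NoMutatingEvidenceYet
    }
}
```

With this filter, a session whose only successful entries are `turn` or `artifact` stays at `NoMutatingEvidenceYet`. A deny-list is the smallest change; storing operator-state updates outside `app.tools` would likely be more robust if new operator entry names are added later.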

Comment on lines +91 to +92
    for entry in app.tools.entries() {
        if entry.status != ToolStatus::Success {

P1: Scope mutating-evidence status to the active turn

This loop scans the entire session history (app.tools.entries()) when deriving build evidence, so evidence from earlier turns leaks into later ones. After any previous successful write/edit, a new build/create turn will immediately show "write/edit confirmed" before executing any mutating action in that turn, which misrepresents the current turn’s evidence state.
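
A sketch of one way to scope the scan, assuming the app can record the index of the first tool entry that belongs to the active turn. `turn_start`, `active_turn_entries`, and `turn_evidence_state` are hypothetical names, the types are simplified stand-ins for the PR's, and the contents of `MUTATING_TOOL_NAMES` are a guess.

```rust
// Simplified stand-ins for the PR's types (assumptions, not the real definitions).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolStatus {
    Running,
    Success,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildEvidenceState {
    NoMutatingEvidenceYet,
    ReadOnlySoFar,
    MutatingEvidenceConfirmed,
}

struct ToolEntry {
    name: String,
    status: ToolStatus,
}

// Assumed contents; the real list lives in the PR and is not shown here.
const MUTATING_TOOL_NAMES: &[&str] = &["write", "edit"];

/// Returns only the entries recorded since the active turn began.
/// `turn_start` is a hypothetical index captured when the prompt is submitted.
fn active_turn_entries(entries: &[ToolEntry], turn_start: usize) -> &[ToolEntry] {
    // An out-of-range start yields an empty slice instead of panicking.
    entries.get(turn_start..).unwrap_or(&[])
}

fn turn_evidence_state(entries: &[ToolEntry], turn_start: usize) -> BuildEvidenceState {
    let mut has_successful_tool = false;
    for entry in active_turn_entries(entries, turn_start) {
        if entry.status != ToolStatus::Success {
            continue;
        }
        has_successful_tool = true;
        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
            return BuildEvidenceState::MutatingEvidenceConfirmed;
        }
    }
    if has_successful_tool {
        BuildEvidenceState::ReadOnlySoFar
    } else {
        BuildEvidenceState::NoMutatingEvidenceYet
    }
}
```

An index works only if entries are append-only within a session; if the tool list can be trimmed or reordered, tagging each entry with a turn identifier would be the safer variant.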

Copilot AI left a comment

Pull request overview

This PR aims to improve operator confidence during long build/create turns by surfacing “mutating tool evidence” state in the interactive TUI (e.g., no mutating evidence yet vs read-only vs confirmed mutating), while also introducing broader interactive TUI architecture changes (transcript-first layout, gateway-backed streaming) and related runtime/provider safeguards.

Changes:

  • Add build/create evidence-state derivation and render it in both Live activity and the run-state card.
  • Introduce a transcript-first interactive TUI shell (status bar + activity strip + run-state card + transcript + composer), plus detail drawer/overlays and command palette tests.
  • Add gateway-backed interactive streaming support and related defaults (model selection), plus timeout alignment and runtime safety guards elsewhere.

Reviewed changes

Copilot reviewed 60 out of 61 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| specs/3604-tui-mutating-tool-evidence-status.md | Adds #3604 spec describing evidence-state UI behavior and tests. |
| specs/3603-require-mutating-tool-evidence-for-build-completion.md | Adds runtime safety spec for requiring mutating evidence for completion claims. |
| specs/3602-fail-closed-on-unverified-build-progress.md | Adds runtime safety spec for failing closed on unverified progress. |
| specs/3601-cli-backend-timeout-aligns-with-request-timeout-budget.md | Adds spec for aligning CLI backend timeouts with request timeout budget. |
| specs/3600-fresh-session-just-commands-for-local-tui-dev-loop.md | Adds spec for just recipes to reset local sessions for TUI dev. |
| specs/3585-codex-auth-model-compatibility-and-tui-startup.md | Adds spec for codex-auth model compatibility and TUI startup defaults. |
| specs/3582-tui-transcript-first-operator-terminal.md | Adds research-updated spec for transcript-first TUI direction. |
| scripts/dev/test-just-fresh-session.sh | Adds regression script validating new just recipes and session reset behavior. |
| justfile | Adds session-reset, stack-up-fresh, tui-fresh recipes and local runtime/TUI workflow. |
| docs/research/cli-interface-patterns-2026-03-16.md | Adds research notes informing transcript-first TUI patterns. |
| crates/tau-tui/tests/tui_demo_smoke.rs | Adds integration test asserting interactive mode fails loudly without a TTY. |
| crates/tau-tui/src/main.rs | Updates CLI help/default model and wires interactive/agent modes to gateway config. |
| crates/tau-tui/src/interactive/ui_transcript.rs | Implements transcript-first transcript rendering + scrolling behavior. |
| crates/tau-tui/src/interactive/ui_tests/transcript.rs | Adds render-path tests for transcript-first shell, activity, and state surfacing. |
| crates/tau-tui/src/interactive/ui_tests/palette.rs | Adds tests for command palette discovery, filtering, and execution. |
| crates/tau-tui/src/interactive/ui_tests/helpers.rs | Adds ratatui TestBackend helpers for key input and rendering assertions. |
| crates/tau-tui/src/interactive/ui_tests/evidence.rs | Adds tests covering mutating-evidence status rendering rules. |
| crates/tau-tui/src/interactive/ui_tests/detail_overlay.rs | Adds tests for narrow-layout detail overlay behavior and navigation. |
| crates/tau-tui/src/interactive/ui_tests/detail.rs | Adds tests for detail drawer sections and command routing. |
| crates/tau-tui/src/interactive/ui_tests/composer.rs | Adds tests for composer height, footer chips, and slash command paths. |
| crates/tau-tui/src/interactive/ui_tests/approval.rs | Adds tests for approval flows and attention strip affordances. |
| crates/tau-tui/src/interactive/ui_tests.rs | Registers the new ui test modules. |
| crates/tau-tui/src/interactive/ui_status.rs | Refactors status bar to show session/cwd/approval/transport/health/state context. |
| crates/tau-tui/src/interactive/ui_shared.rs | Adds shared UI helpers (badges/actions + latest running tool). |
| crates/tau-tui/src/interactive/ui_run_state_model.rs | Adds run-state card model, including evidence summary and streaming preview. |
| crates/tau-tui/src/interactive/ui_run_state.rs | Renders the run-state card and computes its dynamic height. |
| crates/tau-tui/src/interactive/ui_palette.rs | Implements command palette popover rendering. |
| crates/tau-tui/src/interactive/ui_overlay.rs | Implements help/detail/thinking overlays for narrow layouts and context. |
| crates/tau-tui/src/interactive/ui_drawer_sections.rs | Implements detail drawer section contents (tools/memory/cortex/sessions). |
| crates/tau-tui/src/interactive/ui_drawer.rs | Implements wide-layout right-side detail drawer with tab navigation. |
| crates/tau-tui/src/interactive/ui_composer.rs | Implements transcript-first composer rendering, footer chips, and cursor placement. |
| crates/tau-tui/src/interactive/ui_build_evidence.rs | Adds build/create evidence-state derivation from prompt + tool entries. |
| crates/tau-tui/src/interactive/ui_activity.rs | Updates live activity strip to include evidence-state and other context chips. |
| crates/tau-tui/src/interactive/ui.rs | Replaces legacy multi-panel UI with transcript-first shell and overlays/drawer. |
| crates/tau-tui/src/interactive/mod.rs | Reorganizes interactive module surface/export and gateway integration modules. |
| crates/tau-tui/src/interactive/gateway_tests.rs | Adds tests for SSE parsing and applying gateway events to app state. |
| crates/tau-tui/src/interactive/gateway_runtime_tests.rs | Adds integration-style tests for gateway streaming runtime + rendering. |
| crates/tau-tui/src/interactive/gateway_runtime.rs | Adds blocking reqwest-based SSE streaming runtime worker for gateway mode. |
| crates/tau-tui/src/interactive/gateway.rs | Adds SSE frame parsing + operator-state extraction and error normalization. |
| crates/tau-tui/src/interactive/command_catalog.rs | Adds command catalog, parsing, and matching for palette and bare commands. |
| crates/tau-tui/src/interactive/chat.rs | Adds helpers for replacing last assistant content and role-based queries. |
| crates/tau-tui/src/interactive/app_submit.rs | Adds unified submit path (slash commands vs prompts) and gateway submission. |
| crates/tau-tui/src/interactive/app_runtime.rs | Updates event loop to pump gateway events and simplifies input handling. |
| crates/tau-tui/src/interactive/app_nav.rs | Adds navigation helpers for insert/normal mode and transcript scrolling. |
| crates/tau-tui/src/interactive/app_gateway.rs | Adds app-side application of gateway events into chat/tools/operator state. |
| crates/tau-tui/src/interactive/app_focus.rs | Adds focus-cycling logic for normal vs insert mode. |
| crates/tau-tui/src/interactive/app_detail.rs | Adds detail section selection and cycling behavior. |
| crates/tau-tui/src/interactive/app_commands.rs | Refactors key handling, command execution, and global shortcuts. |
| crates/tau-tui/src/interactive/app.rs | Refactors core App state/config, gateway runtime wiring, and exported defaults. |
| crates/tau-tui/Cargo.toml | Adds reqwest dependency for gateway runtime. |
| crates/tau-provider/src/model_catalog.rs | Adds openai/gpt-5.3-codex to built-in model catalog + test assertion. |
| crates/tau-provider/src/client.rs | Aligns CLI backend timeout selection with request timeout budget + unit tests. |
| crates/tau-coding-agent/src/tests/auth_provider/runtime_and_startup.rs | Adds integration tests for progress/completion guards and oauth model rejection. |
| crates/tau-coding-agent/src/tests/auth_provider/auth_and_provider/provider_client_and_store.rs | Adds regression test for request-timeout budget vs codex backend timeout. |
| crates/tau-coding-agent/src/startup_local_runtime.rs | Adds oauth-token/session-token local model compatibility validation. |
| crates/tau-agent-core/src/tests/structured_output_and_parallel.rs | Adds unit/regression tests for new progress/completion evidence guard helpers. |
| crates/tau-agent-core/src/runtime_safety_progress.rs | Adds prompt/text classifiers and tool-evidence helpers for runtime guards. |
| crates/tau-agent-core/src/lib.rs | Wires new runtime safety module and adds new AgentError variants + replan prompts. |
| Cargo.lock | Locks reqwest addition for tau-tui. |

Comment on lines +1 to +27
//! UI rendering with ratatui for the transcript-first interactive shell.

use ratatui::Frame;
#[path = "ui_activity.rs"]
mod activity;
#[path = "ui_build_evidence.rs"]
mod build_evidence;
#[path = "ui_composer.rs"]
mod composer;
#[path = "ui_drawer.rs"]
mod drawer;
#[path = "ui_drawer_sections.rs"]
mod drawer_sections;
#[path = "ui_overlay.rs"]
mod overlay;
#[path = "ui_palette.rs"]
mod palette;
#[path = "ui_run_state.rs"]
mod run_state;
#[path = "ui_shared.rs"]
mod shared;
#[path = "ui_status.rs"]
mod status_bar;
#[cfg(test)]
#[path = "ui_tests.rs"]
mod tests;
#[path = "ui_transcript.rs"]
mod transcript;

Copilot AI commented Mar 20, 2026

The PR description/spec reference is scoped to #3604 (mutating-evidence status in the TUI), but this diff also introduces a large transcript-first UI refactor, gateway-backed interactive runtime plumbing (reqwest + SSE parsing), justfile/session-reset workflow, provider CLI-timeout changes, and new agent runtime safety guards. This significantly expands scope/risk relative to the stated goal; consider splitting into smaller PRs or updating the PR description to explicitly cover these additional concerns and their rollout/testing expectations.

Comment on lines +1 to +6
## Objective

Surface mutating-tool evidence state in the interactive TUI so build/create turns no longer look idle or deceptively complete while Tau has only read from the repo or has not written anything yet.

## Inputs/Outputs

Copilot AI commented Mar 20, 2026

This spec file won’t be picked up by the repo’s spec tooling/contract: scripts/dev/spec-archive-index.sh only scans specs/*/spec.md, and AGENTS.md defines the binding per-issue spec path as specs/<issue-id>/spec.md with a Status: line. As written (specs/3604-tui-mutating-tool-evidence-status.md) it will be ignored and has no Status: field, so the archive/index will show it as missing/Unknown. Consider moving it to specs/3604/spec.md and adding a Status: Draft|Reviewed|Accepted|Implemented line near the top (and doing the same for the other new spec files added in this PR).

Comment on lines +89 to +103
fn tool_evidence_state(app: &App) -> BuildEvidenceState {
    let mut has_successful_tool = false;
    for entry in app.tools.entries() {
        if entry.status != ToolStatus::Success {
            continue;
        }
        has_successful_tool = true;
        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
            return BuildEvidenceState::MutatingEvidenceConfirmed;
        }
    }
    if has_successful_tool {
        BuildEvidenceState::ReadOnlySoFar
    } else {
        BuildEvidenceState::NoMutatingEvidenceYet

Copilot AI commented Mar 20, 2026

tool_evidence_state treats any ToolStatus::Success entry in app.tools as “successful tool evidence”. But gateway operator-state updates are also recorded via push_tool_event with names like turn / artifact and map status == "completed" to ToolStatus::Success. That means a build/create turn can incorrectly flip from “no mutating evidence yet” to “read-only so far” even when no successful non-mutating tool (e.g. read) has run. Consider filtering evidence to only count real tool entries (e.g. known tool names / exclude operator-state entities) or storing operator-state updates separately from tool evidence.

Comment on lines +166 to +174
fn truncate(input: &str, max: usize) -> String {
    if input.len() <= max {
        return input.to_string();
    }
    if max <= 3 {
        return input[..max].to_string();
    }
    format!("{}...", &input[..max - 3])
}

Copilot AI commented Mar 20, 2026

truncate slices strings by byte index (input[..max] / input[..max - 3]), which will panic at runtime if the string contains non-ASCII UTF-8 and max falls on a non-char boundary (e.g., prompts with emojis or non-Latin text). Truncate should be implemented on char_indices()/chars() (or a grapheme-aware approach) to guarantee valid boundaries.
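
A minimal sketch of a boundary-safe variant, assuming it is acceptable for `max` to count `char`s instead of bytes (a behavior change from the code under review). It uses only the standard library; a grapheme-aware or display-width-aware version would need an extra crate and is not shown.

```rust
/// Truncates `input` to at most `max` chars, appending "..." when there is room.
/// Note: `max` counts Unicode scalar values here, not bytes or terminal columns.
fn truncate(input: &str, max: usize) -> String {
    // Count chars, not bytes, so multi-byte text is measured consistently.
    if input.chars().count() <= max {
        return input.to_string();
    }
    // Too narrow for an ellipsis: keep the first `max` chars as-is.
    if max <= 3 {
        return input.chars().take(max).collect();
    }
    // `take` walks whole chars, so the cut can never land mid-codepoint.
    let head: String = input.chars().take(max - 3).collect();
    format!("{head}...")
}
```

For example, `truncate("héllo wörld", 8)` yields `"héllo..."`, while a byte-indexed slice of a string of emoji can panic. Wide characters such as CJK or emoji still occupy more than one terminal column, so a TUI that needs exact widths would have to measure display width separately.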

Comment on lines +165 to +173
fn truncate(input: &str, max: usize) -> String {
    if input.len() <= max {
        return input.to_string();
    }
    if max > 3 {
        return format!("{}...", &input[..max - 3]);
    }
    input[..max].to_string()
}

Copilot AI commented Mar 20, 2026

truncate slices strings by byte index (input[..max] / input[..max - 3]), which can panic at runtime for non-ASCII UTF-8 content when max lands mid-codepoint. Since this drawer renders user/assistant text, it should truncate on chars()/char_indices() (or graphemes) to ensure valid boundaries.

@njfio (Owner, Author) commented Mar 20, 2026

Closing this PR because it was based on the redesign branch stack rather than current master, so the diff includes unrelated TUI/runtime work and does not represent a clean #3604 port. The issue is reopened and the working implementation remains validated on the redesign-side worktree.

@njfio njfio closed this Mar 20, 2026
