[3604] Surface mutating tool evidence status in the TUI during build/create turns by njfio · Pull Request #3605 · njfio/Tau

njfio · 2026-03-20T02:19:11Z

Spec

specs/3604-tui-mutating-tool-evidence-status.md

What

adds a shared build/create evidence helper for the interactive TUI
surfaces `no mutating evidence yet`, `read-only so far`, and `mutating evidence confirmed` in Live activity
surfaces `still read-only` / `write/edit confirmed` in the run-state card
omits the evidence status for non-build prompts and completed idle turns
keeps the status visible when the wide details drawer is open
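
The list above can be read as a small state-to-label mapping plus a visibility gate. The following is a minimal sketch, not the PR's code: the variant names come from the `ui_build_evidence.rs` excerpt quoted in the reviews, the label strings come from this description, and `visible_evidence` with its two boolean parameters is a hypothetical helper introduced only for illustration.

```rust
/// Evidence states, named as in the reviewed `ui_build_evidence.rs` excerpt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildEvidenceState {
    NoMutatingEvidenceYet,
    ReadOnlySoFar,
    MutatingEvidenceConfirmed,
}

/// Label for the Live activity strip (wording taken from the PR description).
pub fn live_activity_label(state: BuildEvidenceState) -> &'static str {
    match state {
        BuildEvidenceState::NoMutatingEvidenceYet => "no mutating evidence yet",
        BuildEvidenceState::ReadOnlySoFar => "read-only so far",
        BuildEvidenceState::MutatingEvidenceConfirmed => "mutating evidence confirmed",
    }
}

/// Hypothetical gate: the status is shown only for build/create prompts
/// while the turn is still active, and omitted otherwise.
pub fn visible_evidence(
    is_build_prompt: bool,
    turn_active: bool,
    state: BuildEvidenceState,
) -> Option<BuildEvidenceState> {
    (is_build_prompt && turn_active).then_some(state)
}
```

Returning `Option` keeps the "omit" case explicit, so a renderer cannot accidentally print a label for a non-build prompt or a completed idle turn.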

Why

Long build/create turns felt idle or deceptive because the shell did not distinguish between no write/edit evidence, read-only progress, and confirmed mutating progress.

Test evidence

`cargo test -p tau-tui 3604 -- --nocapture`
`cargo test -p tau-tui`
smoke-launched `cargo run -p tau-tui -- interactive --profile ops-interactive`

greptile-apps

njfio has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34b34bbce1

chatgpt-codex-connector · 2026-03-20T02:23:28Z

crates/tau-tui/src/interactive/ui_build_evidence.rs

+        if entry.status != ToolStatus::Success {
+            continue;
+        }
+        has_successful_tool = true;
+        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {


Ignore non-tool operator events in evidence calculation

`tool_evidence_state` treats every successful entry in `app.tools` as tool evidence, but `apply_operator_state` records turn and artifact states there too. As soon as a streamed response emits an artifact/turn completion (with no actual tool call), this logic sets `has_successful_tool = true` and the UI reports `read-only so far` instead of `no mutating evidence yet`, which is a false safety signal for build/create prompts.

chatgpt-codex-connector · 2026-03-20T02:23:28Z

crates/tau-tui/src/interactive/ui_build_evidence.rs

+    for entry in app.tools.entries() {
+        if entry.status != ToolStatus::Success {


Scope mutating-evidence status to the active turn

This loop scans the entire session history (`app.tools.entries()`) when deriving build evidence, so evidence from earlier turns leaks into later ones. After any previous successful write/edit, a new build/create turn will immediately show `write/edit confirmed` before executing any mutating action in that turn, which misrepresents the current turn's evidence state.
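
One way to address this, sketched under assumptions: record the index of the first tool entry belonging to the active turn and derive evidence only from entries at or after it. The types below are simplified stand-ins for the crate's own, `turn_start` is a hypothetical field the app would have to track, and the contents of `MUTATING_TOOL_NAMES` are assumed from the "write/edit" wording since the real list is not shown in the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToolStatus { Running, Success, Failed }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildEvidenceState { NoMutatingEvidenceYet, ReadOnlySoFar, MutatingEvidenceConfirmed }

/// Simplified stand-in for the TUI's tool entry type.
struct ToolEntry { name: String, status: ToolStatus }

/// Assumed contents; the actual list is not visible in the quoted hunk.
const MUTATING_TOOL_NAMES: &[&str] = &["write", "edit"];

/// Derive evidence only from entries recorded since `turn_start`,
/// the (hypothetical) index of the active turn's first tool entry.
fn evidence_for_active_turn(entries: &[ToolEntry], turn_start: usize) -> BuildEvidenceState {
    // An out-of-range start yields an empty slice rather than a panic.
    let current = entries.get(turn_start..).unwrap_or(&[]);
    let mut has_successful_tool = false;
    for entry in current {
        if entry.status != ToolStatus::Success {
            continue;
        }
        has_successful_tool = true;
        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
            return BuildEvidenceState::MutatingEvidenceConfirmed;
        }
    }
    if has_successful_tool {
        BuildEvidenceState::ReadOnlySoFar
    } else {
        BuildEvidenceState::NoMutatingEvidenceYet
    }
}

/// Small constructor used only to keep examples short.
fn entry(name: &str, status: ToolStatus) -> ToolEntry {
    ToolEntry { name: name.to_string(), status }
}
```

With a successful `write` from an earlier turn followed by a successful `read` in the current one, a `turn_start` of 1 yields `ReadOnlySoFar`, whereas scanning from 0 would report `MutatingEvidenceConfirmed`. Tagging each entry with a turn identifier would be an alternative design with the same effect.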

Copilot

Pull request overview

This PR aims to improve operator confidence during long build/create turns by surfacing “mutating tool evidence” state in the interactive TUI (e.g., no mutating evidence yet vs read-only vs confirmed mutating), while also introducing broader interactive TUI architecture changes (transcript-first layout, gateway-backed streaming) and related runtime/provider safeguards.

Changes:

Add build/create evidence-state derivation and render it in both Live activity and the run-state card.
Introduce a transcript-first interactive TUI shell (status bar + activity strip + run-state card + transcript + composer), plus detail drawer/overlays and command palette tests.
Add gateway-backed interactive streaming support and related defaults (model selection), plus timeout alignment and runtime safety guards elsewhere.

Reviewed changes

Copilot reviewed 60 out of 61 changed files in this pull request and generated 5 comments.

File	Description
specs/3604-tui-mutating-tool-evidence-status.md	Adds #3604 spec describing evidence-state UI behavior and tests.
specs/3603-require-mutating-tool-evidence-for-build-completion.md	Adds runtime safety spec for requiring mutating evidence for completion claims.
specs/3602-fail-closed-on-unverified-build-progress.md	Adds runtime safety spec for failing closed on unverified progress.
specs/3601-cli-backend-timeout-aligns-with-request-timeout-budget.md	Adds spec for aligning CLI backend timeouts with request timeout budget.
specs/3600-fresh-session-just-commands-for-local-tui-dev-loop.md	Adds spec for `just` recipes to reset local sessions for TUI dev.
specs/3585-codex-auth-model-compatibility-and-tui-startup.md	Adds spec for codex-auth model compatibility and TUI startup defaults.
specs/3582-tui-transcript-first-operator-terminal.md	Adds research-updated spec for transcript-first TUI direction.
scripts/dev/test-just-fresh-session.sh	Adds regression script validating new `just` recipes and session reset behavior.
justfile	Adds `session-reset`, `stack-up-fresh`, `tui-fresh` recipes and local runtime/TUI workflow.
docs/research/cli-interface-patterns-2026-03-16.md	Adds research notes informing transcript-first TUI patterns.
crates/tau-tui/tests/tui_demo_smoke.rs	Adds integration test asserting interactive mode fails loudly without a TTY.
crates/tau-tui/src/main.rs	Updates CLI help/default model and wires interactive/agent modes to gateway config.
crates/tau-tui/src/interactive/ui_transcript.rs	Implements transcript-first transcript rendering + scrolling behavior.
crates/tau-tui/src/interactive/ui_tests/transcript.rs	Adds render-path tests for transcript-first shell, activity, and state surfacing.
crates/tau-tui/src/interactive/ui_tests/palette.rs	Adds tests for command palette discovery, filtering, and execution.
crates/tau-tui/src/interactive/ui_tests/helpers.rs	Adds ratatui TestBackend helpers for key input and rendering assertions.
crates/tau-tui/src/interactive/ui_tests/evidence.rs	Adds tests covering mutating-evidence status rendering rules.
crates/tau-tui/src/interactive/ui_tests/detail_overlay.rs	Adds tests for narrow-layout detail overlay behavior and navigation.
crates/tau-tui/src/interactive/ui_tests/detail.rs	Adds tests for detail drawer sections and command routing.
crates/tau-tui/src/interactive/ui_tests/composer.rs	Adds tests for composer height, footer chips, and slash command paths.
crates/tau-tui/src/interactive/ui_tests/approval.rs	Adds tests for approval flows and attention strip affordances.
crates/tau-tui/src/interactive/ui_tests.rs	Registers the new ui test modules.
crates/tau-tui/src/interactive/ui_status.rs	Refactors status bar to show session/cwd/approval/transport/health/state context.
crates/tau-tui/src/interactive/ui_shared.rs	Adds shared UI helpers (badges/actions + latest running tool).
crates/tau-tui/src/interactive/ui_run_state_model.rs	Adds run-state card model, including evidence summary and streaming preview.
crates/tau-tui/src/interactive/ui_run_state.rs	Renders the run-state card and computes its dynamic height.
crates/tau-tui/src/interactive/ui_palette.rs	Implements command palette popover rendering.
crates/tau-tui/src/interactive/ui_overlay.rs	Implements help/detail/thinking overlays for narrow layouts and context.
crates/tau-tui/src/interactive/ui_drawer_sections.rs	Implements detail drawer section contents (tools/memory/cortex/sessions).
crates/tau-tui/src/interactive/ui_drawer.rs	Implements wide-layout right-side detail drawer with tab navigation.
crates/tau-tui/src/interactive/ui_composer.rs	Implements transcript-first composer rendering, footer chips, and cursor placement.
crates/tau-tui/src/interactive/ui_build_evidence.rs	Adds build/create evidence-state derivation from prompt + tool entries.
crates/tau-tui/src/interactive/ui_activity.rs	Updates live activity strip to include evidence-state and other context chips.
crates/tau-tui/src/interactive/ui.rs	Replaces legacy multi-panel UI with transcript-first shell and overlays/drawer.
crates/tau-tui/src/interactive/mod.rs	Reorganizes interactive module surface/export and gateway integration modules.
crates/tau-tui/src/interactive/gateway_tests.rs	Adds tests for SSE parsing and applying gateway events to app state.
crates/tau-tui/src/interactive/gateway_runtime_tests.rs	Adds integration-style tests for gateway streaming runtime + rendering.
crates/tau-tui/src/interactive/gateway_runtime.rs	Adds blocking reqwest-based SSE streaming runtime worker for gateway mode.
crates/tau-tui/src/interactive/gateway.rs	Adds SSE frame parsing + operator-state extraction and error normalization.
crates/tau-tui/src/interactive/command_catalog.rs	Adds command catalog, parsing, and matching for palette and bare commands.
crates/tau-tui/src/interactive/chat.rs	Adds helpers for replacing last assistant content and role-based queries.
crates/tau-tui/src/interactive/app_submit.rs	Adds unified submit path (slash commands vs prompts) and gateway submission.
crates/tau-tui/src/interactive/app_runtime.rs	Updates event loop to pump gateway events and simplifies input handling.
crates/tau-tui/src/interactive/app_nav.rs	Adds navigation helpers for insert/normal mode and transcript scrolling.
crates/tau-tui/src/interactive/app_gateway.rs	Adds app-side application of gateway events into chat/tools/operator state.
crates/tau-tui/src/interactive/app_focus.rs	Adds focus-cycling logic for normal vs insert mode.
crates/tau-tui/src/interactive/app_detail.rs	Adds detail section selection and cycling behavior.
crates/tau-tui/src/interactive/app_commands.rs	Refactors key handling, command execution, and global shortcuts.
crates/tau-tui/src/interactive/app.rs	Refactors core App state/config, gateway runtime wiring, and exported defaults.
crates/tau-tui/Cargo.toml	Adds `reqwest` dependency for gateway runtime.
crates/tau-provider/src/model_catalog.rs	Adds `openai/gpt-5.3-codex` to built-in model catalog + test assertion.
crates/tau-provider/src/client.rs	Aligns CLI backend timeout selection with request timeout budget + unit tests.
crates/tau-coding-agent/src/tests/auth_provider/runtime_and_startup.rs	Adds integration tests for progress/completion guards and oauth model rejection.
crates/tau-coding-agent/src/tests/auth_provider/auth_and_provider/provider_client_and_store.rs	Adds regression test for request-timeout budget vs codex backend timeout.
crates/tau-coding-agent/src/startup_local_runtime.rs	Adds oauth-token/session-token local model compatibility validation.
crates/tau-agent-core/src/tests/structured_output_and_parallel.rs	Adds unit/regression tests for new progress/completion evidence guard helpers.
crates/tau-agent-core/src/runtime_safety_progress.rs	Adds prompt/text classifiers and tool-evidence helpers for runtime guards.
crates/tau-agent-core/src/lib.rs	Wires new runtime safety module and adds new AgentError variants + replan prompts.
Cargo.lock	Locks `reqwest` addition for `tau-tui`.

Copilot · 2026-03-20T02:27:15Z

crates/tau-tui/src/interactive/ui.rs

+//! UI rendering with ratatui for the transcript-first interactive shell.

-use ratatui::Frame;
+#[path = "ui_activity.rs"]
+mod activity;
+#[path = "ui_build_evidence.rs"]
+mod build_evidence;
+#[path = "ui_composer.rs"]
+mod composer;
+#[path = "ui_drawer.rs"]
+mod drawer;
+#[path = "ui_drawer_sections.rs"]
+mod drawer_sections;
+#[path = "ui_overlay.rs"]
+mod overlay;
+#[path = "ui_palette.rs"]
+mod palette;
+#[path = "ui_run_state.rs"]
+mod run_state;
+#[path = "ui_shared.rs"]
+mod shared;
+#[path = "ui_status.rs"]
+mod status_bar;
+#[cfg(test)]
+#[path = "ui_tests.rs"]
+mod tests;
+#[path = "ui_transcript.rs"]
+mod transcript;


The PR description/spec reference is scoped to #3604 (mutating-evidence status in the TUI), but this diff also introduces a large transcript-first UI refactor, gateway-backed interactive runtime plumbing (reqwest + SSE parsing), justfile/session-reset workflow, provider CLI-timeout changes, and new agent runtime safety guards. This significantly expands scope/risk relative to the stated goal; consider splitting into smaller PRs or updating the PR description to explicitly cover these additional concerns and their rollout/testing expectations.

Copilot · 2026-03-20T02:27:15Z

specs/3604-tui-mutating-tool-evidence-status.md

+## Objective
+
+Surface mutating-tool evidence state in the interactive TUI so build/create turns no longer look idle or deceptively complete while Tau has only read from the repo or has not written anything yet.
+
+## Inputs/Outputs
+


This spec file won’t be picked up by the repo’s spec tooling/contract: `scripts/dev/spec-archive-index.sh` only scans `specs/*/spec.md`, and `AGENTS.md` defines the binding per-issue spec path as `specs/<issue-id>/spec.md` with a `Status:` line. As written (`specs/3604-tui-mutating-tool-evidence-status.md`), it will be ignored and has no `Status:` field, so the archive/index will show it as missing/Unknown. Consider moving it to `specs/3604/spec.md` and adding a `Status: Draft|Reviewed|Accepted|Implemented` line near the top (and doing the same for the other new spec files added in this PR).
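
The path contract described here can be illustrated with a small check. This is a hedged sketch only: it is not the repo's `spec-archive-index.sh` logic, both function names are hypothetical, and the exact formatting of the `Status:` line is an assumption based on the wording of the comment.

```rust
/// Status values listed in the review comment.
const KNOWN_STATUSES: &[&str] = &["Draft", "Reviewed", "Accepted", "Implemented"];

/// Returns the issue id if `path` has the shape `specs/<issue-id>/spec.md`.
fn issue_id_from_spec_path(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("specs/")?;
    let id = rest.strip_suffix("/spec.md")?;
    // Exactly one directory level is allowed between `specs/` and `spec.md`.
    (!id.is_empty() && !id.contains('/')).then_some(id)
}

/// Returns the first recognised status from a `Status: <value>` line, if any.
/// The line format is assumed, not taken from the repo.
fn spec_status(contents: &str) -> Option<&str> {
    contents
        .lines()
        .filter_map(|line| line.trim().strip_prefix("Status:"))
        .map(str::trim)
        .find(|status| KNOWN_STATUSES.contains(status))
}
```

Under this check, `specs/3604/spec.md` yields the id `3604`, while the flat file name used in this PR yields `None`.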

Copilot · 2026-03-20T02:27:15Z

crates/tau-tui/src/interactive/ui_build_evidence.rs

+fn tool_evidence_state(app: &App) -> BuildEvidenceState {
+    let mut has_successful_tool = false;
+    for entry in app.tools.entries() {
+        if entry.status != ToolStatus::Success {
+            continue;
+        }
+        has_successful_tool = true;
+        if MUTATING_TOOL_NAMES.contains(&entry.name.as_str()) {
+            return BuildEvidenceState::MutatingEvidenceConfirmed;
+        }
+    }
+    if has_successful_tool {
+        BuildEvidenceState::ReadOnlySoFar
+    } else {
+        BuildEvidenceState::NoMutatingEvidenceYet


`tool_evidence_state` treats any `ToolStatus::Success` entry in `app.tools` as “successful tool evidence”. But gateway operator-state updates are also recorded via `push_tool_event` with names like `turn` / `artifact`, and they map `status == "completed"` to `ToolStatus::Success`. That means a build/create turn can incorrectly flip from “no mutating evidence yet” to “read-only so far” even when no successful non-mutating tool (e.g. `read`) has run. Consider filtering evidence to only count real tool entries (e.g. known tool names / exclude operator-state entities) or storing operator-state updates separately from tool evidence.
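
The first option (excluding operator-state entities by name) might look roughly like the following. It is a sketch under assumptions: the deny-list contains only the two names mentioned in this comment, and both helper names are hypothetical.

```rust
/// Operator-state entity names mentioned in the review; the real set may be larger.
const OPERATOR_STATE_NAMES: &[&str] = &["turn", "artifact"];

/// True when an entry name refers to an actual tool call rather than
/// an operator-state update recorded in the same list.
fn counts_as_tool_evidence(entry_name: &str) -> bool {
    !OPERATOR_STATE_NAMES.contains(&entry_name)
}

/// Given the names of entries that finished with success, report whether
/// any of them is real tool evidence.
fn has_successful_tool<'a>(successful_names: impl IntoIterator<Item = &'a str>) -> bool {
    successful_names.into_iter().any(counts_as_tool_evidence)
}
```

A deny-list is the smaller change, but it silently admits any new operator-state name. Storing operator-state updates in a separate collection, the second option in the comment, avoids that at the cost of a larger refactor.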

Copilot · 2026-03-20T02:27:15Z

crates/tau-tui/src/interactive/ui_run_state_model.rs

+fn truncate(input: &str, max: usize) -> String {
+    if input.len() <= max {
+        return input.to_string();
+    }
+    if max <= 3 {
+        return input[..max].to_string();
+    }
+    format!("{}...", &input[..max - 3])
+}


`truncate` slices strings by byte index (`input[..max]` / `input[..max - 3]`), which will panic at runtime if the string contains non-ASCII UTF-8 and `max` falls on a non-char boundary (e.g., prompts with emojis or non-Latin text). `truncate` should be implemented on `char_indices()`/`chars()` (or a grapheme-aware approach) to guarantee valid boundaries.
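
A minimal `chars()`-based sketch of the suggested fix follows. Note the assumption it makes: `max` is reinterpreted as a count of `char`s rather than bytes, which changes the limit for non-ASCII input, and it is not grapheme-aware.

```rust
/// Truncate to at most `max` characters, appending "..." when there is room.
/// Counts Unicode scalar values, so it never slices inside a code point.
fn truncate(input: &str, max: usize) -> String {
    if input.chars().count() <= max {
        return input.to_string();
    }
    if max <= 3 {
        // Too short for an ellipsis: keep the first `max` characters only.
        return input.chars().take(max).collect();
    }
    let mut out: String = input.chars().take(max - 3).collect();
    out.push_str("...");
    out
}
```

For display width in a terminal, a grapheme- or width-aware crate would be more accurate than counting `char`s, but this version already removes the panic.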

Copilot · 2026-03-20T02:27:16Z

crates/tau-tui/src/interactive/ui_drawer_sections.rs

+fn truncate(input: &str, max: usize) -> String {
+    if input.len() <= max {
+        return input.to_string();
+    }
+    if max > 3 {
+        return format!("{}...", &input[..max - 3]);
+    }
+    input[..max].to_string()
+}


`truncate` slices strings by byte index (`input[..max]` / `input[..max - 3]`), which can panic at runtime for non-ASCII UTF-8 content when `max` lands mid-codepoint. Since this drawer renders user/assistant text, it should truncate on `chars()`/`char_indices()` (or graphemes) to ensure valid boundaries.
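
If the byte-based budget should be preserved, an alternative sketch is to back off to the nearest earlier char boundary before slicing. The helper names `floor_boundary` and `truncate_bytes` are hypothetical; only `str::is_char_boundary` from the standard library is relied on.

```rust
/// Largest index `<= idx` that lies on a UTF-8 char boundary of `input`.
fn floor_boundary(input: &str, mut idx: usize) -> usize {
    if idx >= input.len() {
        return input.len();
    }
    // Index 0 is always a boundary, so this loop terminates.
    while !input.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}

/// Truncate to at most `max` bytes without splitting a code point.
fn truncate_bytes(input: &str, max: usize) -> String {
    if input.len() <= max {
        return input.to_string();
    }
    if max > 3 {
        let end = floor_boundary(input, max - 3);
        return format!("{}...", &input[..end]);
    }
    input[..floor_boundary(input, max)].to_string()
}
```

The result may be shorter than `max` bytes when the cut point falls inside a multi-byte character, which is usually acceptable for a preview string.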

njfio · 2026-03-20T02:32:31Z

Closing this PR because it was based on the redesign branch stack rather than current master, so the diff includes unrelated TUI/runtime work and does not represent a clean #3604 port. The issue is reopened and the working implementation remains validated on the redesign-side worktree.

njfio added 30 commits March 16, 2026 11:03

docs(3583): add tui redesign spec

d05a16a

docs(3582): align tui spec with opencode research

96ba3c5

test(3582): red tests for transcript first tui shell

5137911

feat(3582): add transcript first tui shell baseline

d1d61e0

refactor(3582): simplify transcript message updates

cd5c6b0

test(3582): red tests for transcript shell renderer

1026944

feat(3582): add transcript shell renderer

88b8225

refactor(3582): split interactive tui modules

b439c8e

integrate(3582): cover interactive tui entrypoint smoke path

31ae495

test(3582): red tests for attention strip and responsive drawer

bb82ef6

feat(3582): add responsive details drawer and attention strip

9fb2f1f

refactor(3582): split transcript activity and focus helpers

1a9e0f9

integrate(3582): exercise drawer context via interactive input path

dfe055a

test(3582): red tests for approvals cards and detail overlay

f03bfaa

feat(3582): add approval strip overlay and transcript cards

a529510

refactor(3582): split interactive navigation helpers

1dc5604

integrate(3582): cover approval flow through interactive input path

75d0d6a

test(3582): red tests for composer chips and detail summaries

432e3e6

feat(3582): improve tui composer approvals and detail summaries

385b902

refactor(3582): simplify attention and detail rendering

618015d

integrate(3582): wire interactive shell composer and detail affordances

bf48fe9

test(3582): red tests for transcript run-state cards

9aa2820

feat(3582): add transcript run-state cards

f1c2dce

refactor(3582): split transcript tui tests by surface

613ffa7

integrate(3582): validate transcript run-state cards in interactive shell

7797b8e

test(3582): red tests for live execution blocks

116491d

feat(3582): enrich live execution blocks in transcript shell

5157a95

refactor(3582): share transcript shell ui primitives

4b40d26

integrate(3582): validate live execution blocks in interactive shell

66e8572

test(3582): red tests for drawer state summaries

35ac8e6

njfio added 20 commits March 19, 2026 20:32

test(3601): red tests for cli backend timeout alignment

599945a

feat(3601): align cli backend timeouts with request budget

4a941cf

refactor(3601): document cli backend timeout selection

3687308

integrate(3601): verify live codex backend honors request timeout budget

b04fa55

docs(3602): add spec for unverified build-progress guard

e81d2d2

test(3602): red tests for unverified build-progress guard

61cf135

feat(3602): fail closed on unverified build progress

e566af7

refactor(3602): isolate implementation-progress safety classifier

f8004b3

integrate(3602): verify build-progress guard through codex runtime path

4bd7c45

docs(3603): add spec for mutating tool-evidence build guard

707fccd

test(3603): red tests for mutating tool-evidence build guard

bc35b9a

feat(3603): require mutating tool evidence for build completion

f9eda45

refactor(3603): centralize tool-evidence predicates

1b2fab9

integrate(3603): verify mutating evidence guard through codex runtime path

3c6d2b6

docs(3604): add spec for tui mutating-evidence status

2d47164

docs(3604): clarify positive mutating-evidence status

c5f427c

test(3604): red tests for tui mutating-evidence status

ca53405

feat(3604): surface mutating tool evidence in the tui

5cacc7d

refactor(3604): tighten live build-evidence rendering

2b503bb

integrate(3604): keep build-evidence status visible in the live shell

34b34bb

Copilot AI review requested due to automatic review settings March 20, 2026 02:19

greptile-apps bot reviewed Mar 20, 2026

njfio mentioned this pull request Mar 20, 2026

Surface mutating tool evidence status in the TUI during build/create turns #3604

Closed

Copilot started reviewing on behalf of njfio March 20, 2026 02:19 View session

docs(3604): refresh roadmap status sync blocks

9275380

chatgpt-codex-connector bot reviewed Mar 20, 2026

Copilot AI reviewed Mar 20, 2026

njfio closed this Mar 20, 2026

njfio wants to merge 109 commits into master from 3604-tui-mutating-evidence-status
