Commit 7d79cdc

lhy0718 and Copilot committed
refine autonomous mode: review gate, time limits, stopAfterApprovalBoundary
- Fix over-aggressive auto-approval: remove review/write_paper from autoApproveNodes; add WritePaperGateConfig with evidence bar checks (baseline, quantitative results, branch score, manuscript type)
- Add meetsWritePaperBar() with 3 gate checkpoints: top-of-loop pre-exec, recommendation path, no-recommendation path; backtrack to design_experiments on failure
- Add stopAfterApprovalBoundary option to runCurrentAgentWithOptions so autonomous controller gets per-node control between approval gates
- Overnight runtime: 8h → 24h (maxMinutes: 1440)
- Autonomous runtime: 24h → unbounded (maxMinutes: Infinity) with Number.isFinite() guard
- Progress reporter: add runtimePolicy, writePaperGateBlocked, writePaperGateBlockers to snapshot and markdown output
- TUI/CLI: overnight '24-hour limit', autonomous 'No runtime time limit' and 'write_paper gated by minimum evidence bar'
- 10 new tests (30 total), all passing; 881/882 full suite pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7aefc35 commit 7d79cdc

7 files changed: +479 -24 lines changed

ISSUES.md

Lines changed: 30 additions & 0 deletions
@@ -163,3 +163,33 @@
 - Paper-pressure consolidation jumps to `review` node with force mode; if review node encounters artifacts from a different cycle context, it may produce a review that doesn't match the latest experiment
 - Stagnation detection relies on node-level notes; if nodes don't write meaningful notes, novelty detection may under-count signals
 - `evaluateBestBranch` reads artifacts at fixed paths; if the run has branched into multiple experiment directions, only the latest artifacts are evaluated
+
+---
+
+### AM-002 — Autonomous Mode Refinement: Review Gate, Time Limits, stopAfterApprovalBoundary
+- Status: IMPLEMENTED
+- Category: feature refinement
+- Validation target: `AutonomousRunController.runAutonomous()`, `AgentOrchestrator.runCurrentAgentWithOptions()`, review/write_paper gating, time-limit policy
+- Summary: Corrected over-aggressive auto-approval in autonomous mode and adjusted time limits:
+  - **Review gate**: `review` and `write_paper` removed from `autoApproveNodes`. Review is a real structural gate; write_paper is only entered when `meetsWritePaperBar()` evidence bar is met.
+  - **WritePaperGateConfig**: New config with `requireBaselineOrComparator`, `requireQuantitativeResults`, `minBranchScore`, `blockedManuscriptTypes`. On failure, backtracks to `design_experiments`.
+  - **Three gate checkpoints**: (1) top-of-loop pre-execution check for `currentNode === "write_paper"`, (2) recommendation path check when advancing from review, (3) no-recommendation path check at review/write_paper.
+  - **stopAfterApprovalBoundary**: Added `stopAfterApprovalBoundary?: boolean` to `AgentOrchestrator.runCurrentAgentWithOptions()`. Autonomous mode uses `stopAfterApprovalBoundary: true` so the runtime returns after each approval gate, giving the controller a chance to check evidence gates between nodes.
+  - **Overnight runtime**: 8h → 24h (`maxMinutes: 1440`)
+  - **Autonomous runtime**: 24h → unbounded (`maxMinutes: Infinity`), with `Number.isFinite()` guard
+  - **Progress reporter**: Added `runtimePolicy`, `writePaperGateBlocked`, `writePaperGateBlockers` to snapshot; shown in markdown status output
+  - **TUI/CLI copy**: Updated overnight banner ("24-hour limit"), autonomous banner ("No runtime time limit", "write_paper gated by minimum evidence bar")
+- Tests: 10 new tests added (30 total), all passing:
+  - Policy limits (overnight 24h, autonomous Infinity)
+  - Gate config defaults
+  - autoApproveNodes exclusions (review, write_paper)
+  - meetsWritePaperBar: passes, blocks, no-branch
+  - Gate blocks at review node (integration)
+  - Gate blocks advance recommendation from review (integration)
+  - No time_limit stop with Infinity
+- Evidence: 881/882 tests pass (10 new). Only pre-existing `zzz_noProjectRootLeak` failure.
+- Architecture insight: Two-level approval system — runtime `resolveApprovalGate()` auto-approves nodes in "minimal" mode BEFORE the controller sees them. `stopAfterApprovalBoundary: true` is the key fix that gives the controller per-node control.
+- Risks:
+  - `stopAfterApprovalBoundary: true` means each node takes one controller iteration, making the loop slower (more iterations per cycle). Acceptable for autonomous long-running mode.
+  - If `evaluateBestBranch` misreads evidence artifacts, the gate may incorrectly block or pass write_paper
+  - `minBranchScore: 5` threshold may need tuning based on real-world evidence patterns
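The evidence gate described in AM-002 can be pictured roughly as follows. This is a minimal sketch, not the code from this commit: only the four `WritePaperGateConfig` field names, the `minBranchScore: 5` value, and the backtrack target `design_experiments` come from the entry above. The `BranchEvidence` shape, the `GateResult` type, the other default values, the blocker wording, the "score must be at least the minimum" comparison, and the function names `meetsWritePaperBarSketch` and `nextNodeAfterGate` are assumptions made for illustration.

```typescript
// Field names are taken from the ISSUES.md entry; everything else is hypothetical.
interface WritePaperGateConfig {
  requireBaselineOrComparator: boolean;
  requireQuantitativeResults: boolean;
  minBranchScore: number;
  blockedManuscriptTypes: string[];
}

// Assumed summary of the best experiment branch (not the real artifact format).
interface BranchEvidence {
  hasBaselineOrComparator: boolean;
  hasQuantitativeResults: boolean;
  score: number;
  manuscriptType: string;
}

interface GateResult {
  passed: boolean;
  blockers: string[];
}

const SKETCH_GATE_DEFAULTS: WritePaperGateConfig = {
  requireBaselineOrComparator: true, // assumed default
  requireQuantitativeResults: true,  // assumed default
  minBranchScore: 5,                 // value mentioned under Risks
  blockedManuscriptTypes: [],        // assumed default
};

// Collects every unmet condition so a reporter could list them as blockers.
function meetsWritePaperBarSketch(
  evidence: BranchEvidence | undefined,
  config: WritePaperGateConfig = SKETCH_GATE_DEFAULTS
): GateResult {
  if (!evidence) {
    // Corresponds to the "no-branch" test case: nothing to evaluate, so block.
    return { passed: false, blockers: ["no evaluated branch"] };
  }
  const blockers: string[] = [];
  if (config.requireBaselineOrComparator && !evidence.hasBaselineOrComparator) {
    blockers.push("missing baseline or comparator");
  }
  if (config.requireQuantitativeResults && !evidence.hasQuantitativeResults) {
    blockers.push("missing quantitative results");
  }
  if (evidence.score < config.minBranchScore) {
    blockers.push(`branch score ${evidence.score} below minimum ${config.minBranchScore}`);
  }
  if (config.blockedManuscriptTypes.indexOf(evidence.manuscriptType) !== -1) {
    blockers.push(`manuscript type "${evidence.manuscriptType}" is blocked`);
  }
  return { passed: blockers.length === 0, blockers };
}

// On failure the controller backtracks instead of drafting.
function nextNodeAfterGate(result: GateResult): string {
  return result.passed ? "write_paper" : "design_experiments";
}
```

Returning the full list of blockers rather than a bare boolean is a design choice of this sketch; it lines up with the `writePaperGateBlockers` field added to the progress snapshot.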

src/core/agents/agentOrchestrator.ts

Lines changed: 3 additions & 2 deletions
@@ -75,7 +75,7 @@ export class AgentOrchestrator {

   async runCurrentAgentWithOptions(
     runId: string,
-    opts?: { abortSignal?: AbortSignal }
+    opts?: { abortSignal?: AbortSignal; stopAfterApprovalBoundary?: boolean }
   ): Promise<AgentRunResponse> {
     await this.runtime.start(runId);
     const current = await this.runStore.getRun(runId);
@@ -84,7 +84,8 @@ export class AgentOrchestrator {
     }
     await this.runtime.runUntilPause(runId, {
       abortSignal: opts?.abortSignal,
-      floorNode: current.currentNode
+      floorNode: current.currentNode,
+      stopAfterApprovalBoundary: opts?.stopAfterApprovalBoundary
     });
     const run = await this.getPersistedRunOrThrow(runId);
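How a controller might use the new option together with the unbounded time policy can be sketched as below. This is an illustrative sketch under stated assumptions, not the `AutonomousRunController` from this commit: `OrchestratorLike`, `StepResult`, `timeLimitReached`, `runLoopSketch`, the `maxIterations` cap, and the stop-reason strings other than `time_limit` are hypothetical. Only the `stopAfterApprovalBoundary` option, the `maxMinutes` values (1440 and `Infinity`), and the `Number.isFinite()` guard are taken from the commit.

```typescript
// Hypothetical minimal view of the orchestrator; the real method returns AgentRunResponse.
interface StepResult {
  currentNode: string;
  done: boolean;
}

interface OrchestratorLike {
  runCurrentAgentWithOptions(
    runId: string,
    opts?: { stopAfterApprovalBoundary?: boolean }
  ): Promise<StepResult>;
}

// With maxMinutes = Infinity the guard short-circuits, so no time_limit stop can occur.
function timeLimitReached(startedAtMs: number, nowMs: number, maxMinutes: number): boolean {
  if (!Number.isFinite(maxMinutes)) {
    return false;
  }
  return nowMs - startedAtMs >= maxMinutes * 60000;
}

// One orchestrator call per iteration: the runtime returns after each approval
// boundary, so the controller can run its own checks between nodes.
async function runLoopSketch(
  orchestrator: OrchestratorLike,
  runId: string,
  maxMinutes: number,
  maxIterations: number,
  now: () => number = () => Date.now()
): Promise<{ stopReason: string; visited: string[] }> {
  const startedAt = now();
  const visited: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    if (timeLimitReached(startedAt, now(), maxMinutes)) {
      return { stopReason: "time_limit", visited };
    }
    const step = await orchestrator.runCurrentAgentWithOptions(runId, {
      stopAfterApprovalBoundary: true,
    });
    visited.push(step.currentNode);
    // An evidence gate check would go here, before the next node is allowed to run.
    if (step.done) {
      return { stopReason: "completed", visited };
    }
  }
  return { stopReason: "iteration_cap", visited };
}
```

The cost noted under Risks is visible here: every node consumes one loop iteration, which trades throughput for a controller checkpoint between nodes.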

src/core/agents/autonomousProgressReporter.ts

Lines changed: 30 additions & 0 deletions
@@ -27,6 +27,12 @@ export interface AutonomousCycleSnapshot {
   evidenceGaps?: string[];
   nextUpgradeAction?: string;
   whyContinued?: string;
+  /** Runtime policy description: "24h" or "unbounded" */
+  runtimePolicy?: string;
+  /** Whether write_paper is currently blocked by the evidence gate */
+  writePaperGateBlocked?: boolean;
+  /** Specific conditions blocking write_paper entry */
+  writePaperGateBlockers?: string[];
 }

 // ---------------------------------------------------------------------------
@@ -63,6 +69,7 @@ export class AutonomousProgressReporter {
         : `# Autonomous Run Status — ${run.id.slice(0, 8)}\n\n` +
           `**Topic:** ${run.topic}\n` +
           `**Mode:** ${snap.mode}\n` +
+          `**Runtime policy:** ${snap.runtimePolicy || (snap.mode === "autonomous" ? "unbounded" : "24h")}\n` +
           `**Started:** ${new Date().toISOString()}\n\n` +
           `---\n\n`;

@@ -89,6 +96,9 @@ export class AutonomousProgressReporter {
     lines.push(`| Field | Value |`);
     lines.push(`|-------|-------|`);
     lines.push(`| Mode | ${snap.mode} |`);
+    if (snap.runtimePolicy) {
+      lines.push(`| Runtime Policy | ${snap.runtimePolicy} |`);
+    }
     lines.push(`| Current Node | ${snap.currentNode} |`);
     lines.push(`| Status | ${snap.status} |`);
     lines.push(`| Paper Status | ${snap.paperStatus} |`);
@@ -122,6 +132,11 @@ export class AutonomousProgressReporter {
       lines.push(`| Next Upgrade Action | ${snap.nextUpgradeAction} |`);
     }

+    // Write-paper gate status
+    if (snap.writePaperGateBlocked != null) {
+      lines.push(`| Write-Paper Gate | ${snap.writePaperGateBlocked ? "⛔ BLOCKED" : "✅ PASSED"} |`);
+    }
+
     lines.push("");
     lines.push(`**Message:** ${snap.message}`);

@@ -130,6 +145,15 @@ export class AutonomousProgressReporter {
       lines.push(`**Why continued:** ${snap.whyContinued}`);
     }

+    // Write-paper gate blockers
+    if (snap.writePaperGateBlocked && snap.writePaperGateBlockers && snap.writePaperGateBlockers.length > 0) {
+      lines.push("");
+      lines.push("**Write-paper gate blockers (conditions not met for drafting):**");
+      for (const blocker of snap.writePaperGateBlockers) {
+        lines.push(`- ${blocker}`);
+      }
+    }
+
     if (snap.noveltySignals.length > 0) {
       lines.push("");
       lines.push("**Recent Novelty Signals:**");
@@ -169,6 +193,9 @@ export class AutonomousProgressReporter {
     lines.push(`| Field | Value |`);
     lines.push(`|-------|-------|`);
     lines.push(`| Mode | ${snap.mode} |`);
+    if (snap.runtimePolicy) {
+      lines.push(`| Runtime Policy | ${snap.runtimePolicy} |`);
+    }
     lines.push(`| Total Cycles | ${snap.cycle} |`);
     lines.push(`| Total Iterations | ${snap.iteration} |`);
     lines.push(`| Final Node | ${snap.currentNode} |`);
@@ -182,6 +209,9 @@ export class AutonomousProgressReporter {
     if (snap.paperCandidateStatus) {
       lines.push(`| Paper Candidate | ${snap.paperCandidateStatus} |`);
     }
+    if (snap.writePaperGateBlocked != null) {
+      lines.push(`| Write-Paper Gate | ${snap.writePaperGateBlocked ? "⛔ BLOCKED" : "✅ PASSED"} |`);
+    }

     lines.push("");
     lines.push(`**Why stopped:** ${snap.message}`);
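The reporter changes treat `writePaperGateBlocked` as a three-state field: the `!= null` check emits no row when the gate has not been evaluated, while an explicit `false` still renders as passed. The sketch below isolates that behavior. The helper `renderGateLines` and the `GateSnapshot` type are hypothetical, and unlike the diff above (which places the table row and the blocker list in separate parts of the output) this sketch returns them together for brevity.

```typescript
// Hypothetical subset of AutonomousCycleSnapshot: only the two gate fields.
interface GateSnapshot {
  writePaperGateBlocked?: boolean;
  writePaperGateBlockers?: string[];
}

function renderGateLines(snap: GateSnapshot): string[] {
  const lines: string[] = [];
  // `!= null` skips both undefined and null but keeps an explicit `false`.
  if (snap.writePaperGateBlocked != null) {
    lines.push(`| Write-Paper Gate | ${snap.writePaperGateBlocked ? "⛔ BLOCKED" : "✅ PASSED"} |`);
  }
  // Blockers are listed only when the gate is actually blocked and the list is non-empty.
  if (snap.writePaperGateBlocked && snap.writePaperGateBlockers && snap.writePaperGateBlockers.length > 0) {
    lines.push("");
    lines.push("**Write-paper gate blockers (conditions not met for drafting):**");
    for (const blocker of snap.writePaperGateBlockers) {
      lines.push(`- ${blocker}`);
    }
  }
  return lines;
}
```

A plain truthiness check (`if (snap.writePaperGateBlocked)`) would hide the passed state entirely, which is presumably why the diff uses `!= null` for the table row.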
