This document captures the runtime contracts that must remain stable while improving quality enforcement.

## 1) Governed workflow contract

AutoLabOS operates around a governed 9-node research workflow:

`collect_papers -> analyze_papers -> generate_hypotheses -> design_experiments -> implement_experiments -> run_experiments -> analyze_results -> review -> write_paper`

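The node sequence above can also be captured as data so that harness checks can reject out-of-order transitions mechanically. A minimal sketch, assuming Python; `WORKFLOW` and `is_valid_transition` are hypothetical names, not actual AutoLabOS APIs, and backtracking to an earlier node is treated as allowed per the safe-backtracking expectation below:

```python
# Hypothetical sketch: encode the 9-node contract as an ordered tuple
# so structural checks can validate transitions against it.
WORKFLOW = (
    "collect_papers",
    "analyze_papers",
    "generate_hypotheses",
    "design_experiments",
    "implement_experiments",
    "run_experiments",
    "analyze_results",
    "review",
    "write_paper",
)

def is_valid_transition(src: str, dst: str) -> bool:
    """Allow the next node in order, or backtracking to any earlier node."""
    i, j = WORKFLOW.index(src), WORKFLOW.index(dst)
    return j == i + 1 or j < i
```

Keeping the sequence as data means a contract change is a visible one-line diff rather than scattered control-flow edits.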
This 9-node structure is the default top-level workflow contract and must remain stable unless an explicit contract change is made.

Do not casually add, remove, reorder, or redefine top-level nodes.

A top-level workflow change is allowed only when all of the following are true:

- the change clearly improves the research/runtime contract rather than duplicating an existing stage
- inspectable state transitions are preserved
- artifact auditability is preserved
- reproducibility is preserved
- review gating and claim-ceiling discipline are preserved
- safe backtracking behavior is preserved
- the change is reflected consistently in docs, runtime behavior, and validation expectations

Until those conditions are met, treat the 9-node workflow as fixed.

## 2) Shared runtime surfaces

- TUI (`autolabos`) and local web ops UI (`autolabos web`) share the same interaction/runtime layer.
- Node execution and transitions are controlled by `StateGraphRuntime`.
- Approval mode and transition recommendation behavior are part of runtime contracts.

Harness and runtime work must preserve both TUI and web behaviors unless a change is explicitly requested.

## 3) Artifact model

- Run-scoped source of truth: `.autolabos/runs/<run-id>/...`
- Public mirrored outputs: `outputs/<run-id>-...`
- Checkpoints and run context are persisted under each run directory.

Quality checks should be deterministic and file-based whenever possible.

Public-facing outputs must remain traceable to underlying run artifacts.

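A deterministic, file-based check can be as simple as asserting that required entries exist under the run directory. A sketch under assumptions: the artifact names and the `missing_artifacts` helper are illustrative, not the actual AutoLabOS run layout:

```python
# Hypothetical sketch of a deterministic, file-based completeness check.
# The required entry names are assumptions, not the real run layout.
from pathlib import Path

REQUIRED_ARTIFACTS = ("run_context.json", "checkpoints")

def missing_artifacts(run_dir: str) -> list[str]:
    """Return required entries absent from a .autolabos/runs/<run-id>/ dir."""
    root = Path(run_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).exists()]
```

Because the check only reads the filesystem, it gives the same answer on every machine that holds the same run directory.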
## 4) Node-internal loops are bounded

Internal control loops inside nodes are allowed and expected, including loops in analysis, design, implementation, execution, result interpretation, and writing.

However, these loops must remain:

- bounded
- auditable through artifacts or logs
- consistent with node purpose
- non-destructive to top-level workflow clarity

Node-internal iteration must not be used to smuggle in an undeclared top-level workflow redesign.

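A minimal sketch of what bounded and auditable can look like in practice, assuming a hypothetical iteration budget and an in-memory log (a real node would persist the log to run artifacts):

```python
# Hypothetical sketch: a node-internal loop with an explicit budget and
# an audit trail. `run_bounded_loop` is illustrative, not a real API.
def run_bounded_loop(step, max_iters: int = 5):
    """Run `step(i)` until it signals completion or the budget is spent.
    Every iteration is recorded so the loop stays auditable."""
    log = []
    for i in range(max_iters):
        done, result = step(i)
        log.append({"iteration": i, "done": done})
        if done:
            return result, log
    # Budget exhausted: surface that fact to the caller instead of
    # retrying forever inside the node.
    return None, log
```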
## 5) Review and paper-readiness contract

`review` is a gate, not a cosmetic pass.

The system must not treat workflow completion, `write_paper` completion, or successful PDF generation as equivalent to paper-ready research.

Top-level progression to paper-writing behavior should preserve the distinction between:

- system completion
- artifact completion
- research completion
- paper readiness

A paper-scale outcome requires evidence beyond successful orchestration, including baseline/comparator presence, real experiment execution, quantitative comparison, and claim-to-evidence linkage.

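One way to make the gate explicit is a check that passes only when every evidence dimension is present. The field names below are illustrative assumptions, not an actual AutoLabOS schema:

```python
# Hypothetical sketch: a review gate that refuses to equate
# orchestration success with paper readiness. Keys are illustrative.
REQUIRED_EVIDENCE = (
    "baseline_present",
    "real_experiments_ran",
    "quantitative_comparison",
    "claims_linked_to_evidence",
)

def paper_ready(evidence: dict) -> bool:
    """Pass only when every evidence dimension is affirmatively present."""
    return all(evidence.get(key, False) for key in REQUIRED_EVIDENCE)
```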
## 6) Research brief contract

A governed run should begin from a research brief that defines the execution contract.

At minimum, the brief structure should align with `docs/research-brief-template.md`, including:

- Topic
- Objective Metric
- Constraints
- Plan
- Research Question
- Why This Can Be Tested With A Small Real Experiment
- Baseline / Comparator
- Dataset / Task / Bench
- Target Comparison
- Minimum Acceptable Evidence
- Disallowed Shortcuts
- Allowed Budgeted Passes
- Paper Ceiling If Evidence Remains Weak
- Minimum Experiment Plan
- Paper-worthiness Gate
- Failure Conditions

Missing governance fields should be treated as execution risks, not harmless omissions.

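A cheap enforcement sketch: scan a brief for required field headings and report what is missing. The field subset shown and the plain substring matching are simplifying assumptions about the template format:

```python
# Hypothetical sketch: flag missing governance fields in a research
# brief. Substring matching is a deliberate simplification; a real
# check might parse markdown headings instead.
REQUIRED_FIELDS = (
    "Topic",
    "Objective Metric",
    "Baseline / Comparator",
    "Minimum Acceptable Evidence",
    "Paper-worthiness Gate",
    "Failure Conditions",
)

def missing_brief_fields(brief_text: str) -> list[str]:
    """Return required governance fields not mentioned in the brief."""
    return [field for field in REQUIRED_FIELDS if field not in brief_text]
```

Treating a non-empty result as a hard failure keeps a missing governance field from silently becoming an execution risk.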
## 7) Validation surfaces are first-class

The following are first-class validation surfaces for contract enforcement:

- real TUI validation
- local web validation
- targeted tests
- smoke checks
- harness validation
- artifact inspection
- `/doctor` diagnostics when applicable

For interactive defects, real behavior is the primary ground truth. Tests and harness checks support but do not replace same-flow revalidation.

## 8) Harness engineering goals

- Turn important quality assumptions into explicit checks.
- Keep checks cheap enough for routine CI.
- Fail early on structural incompleteness such as missing required artifacts or malformed records.
- Keep enforcement incremental and compatible with current contracts.
- Prefer minimal, high-confidence enforcement that improves observability and reproducibility.

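As one example of failing early on malformed records, a harness check can validate JSONL logs before anything downstream consumes them. The required keys here are illustrative assumptions, not the real record schema:

```python
# Hypothetical sketch: reject malformed JSONL records up front rather
# than discovering them downstream. Required keys are illustrative.
import json

def validate_jsonl(lines, required_keys=("node", "timestamp")):
    """Return (ok, errors); any malformed line fails the whole check."""
    errors = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {n}: not valid JSON")
            continue
        for key in required_keys:
            if key not in record:
                errors.append(f"line {n}: missing key {key!r}")
    return (not errors), errors
```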
## 9) Reproducibility contract

A run should not be treated as trustworthy unless its outputs and transitions can be inspected and rechecked.

When applicable, validation should confirm:

- checkpoint/state consistency
- consistency between public-facing outputs and run-scoped artifacts
- observable behavioral change, not only modified code paths
- explicitly stated remaining validation or reproducibility gaps

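Consistency between public mirrors and run-scoped artifacts can be confirmed with content hashes. A sketch over in-memory file maps; a real check would walk both directories, and `mirrors_consistent` is a hypothetical helper:

```python
# Hypothetical sketch: every public mirrored file must exist in the
# run-scoped source of truth with byte-identical content.
import hashlib

def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def mirrors_consistent(run_files: dict, mirror_files: dict) -> bool:
    """Map filename -> bytes for both sides; mirrors may not diverge."""
    return all(
        name in run_files and _digest(run_files[name]) == _digest(data)
        for name, data in mirror_files.items()
    )
```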
## 10) Non-goals for this track

- No redesign of product UX without an explicit product-direction decision.
- No broad refactor of orchestration/runtime without contract justification.
- No speculative replacement of existing node logic.
- No weakening of review gating, evidence discipline, or reproducibility expectations for convenience.