kulesh · Feb 6, 2026
diff --git a/README.md b/README.md
@@ -167,6 +167,7 @@ docs/
 - [docs/journey-state-machine.md](./docs/journey-state-machine.md) - Journey states and transitions
 - [docs/genspec-format.md](./docs/genspec-format.md) - Genspec format reference
 - [docs/testing-strategy.md](./docs/testing-strategy.md) - Testing approach
+- [docs/adr/README.md](./docs/adr/README.md) - Architecture decision records
 - [docs/architecture-roadmap.md](./docs/architecture-roadmap.md) - Architecture roadmap
 
 ## How It Works

diff --git a/docs/README.md b/docs/README.md
@@ -13,6 +13,7 @@ This folder contains the product, architecture, and testing references for Waypo
 - [journey-state-machine.md](./journey-state-machine.md) - Journey states and transitions
 - [architecture-roadmap.md](./architecture-roadmap.md) - Long-term architecture plan
 - [unix-architecture-plan.md](./unix-architecture-plan.md) - UNIX-style architecture notes
+- [adr/README.md](./adr/README.md) - Architecture decision records
 
 ## Protocols and Formats
 

diff --git a/docs/adr/0001-execution-controller.md b/docs/adr/0001-execution-controller.md
@@ -0,0 +1,28 @@
+# ADR 0001: Extract Execution Controller
+
+Date: 2026-02-05
+Status: Accepted
+
+## Context
+
+The FLY phase mixed UI, orchestration, execution, and state transitions inside
+`src/waypoints/tui/screens/fly.py`. This coupling made the execution flow harder
+to test, reason about, and evolve. A dedicated orchestration boundary was
+needed to align with the “bicycle” philosophy and centralize execution logic.
+
+## Decision
+
+Introduce `ExecutionController` in `src/waypoints/orchestration/` to own:
+- Execution state transitions
+- Waypoint selection and sequencing
+- Result handling and intervention flow
+
+Move `ExecutionState` into `src/waypoints/fly/state.py` to make it a shared
+execution concept rather than a UI-local enum.
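The following is a minimal sketch of the shape this decision implies. It is not the actual `ExecutionController` API. Most `ExecutionState` members come from the FLY analysis notes (`RUNNING`, `PAUSE_PENDING`, `PAUSED`, `INTERVENTION`, `DONE`). `IDLE`, the `ALLOWED` transition table, and the controller method names are hypothetical, chosen only to show the controller enforcing legal transitions outside the UI.

```python
from enum import Enum, auto


class ExecutionState(Enum):
    """Execution mode shared by the controller and the FLY screen."""

    IDLE = auto()  # hypothetical: nothing started yet
    RUNNING = auto()
    PAUSE_PENDING = auto()
    PAUSED = auto()
    INTERVENTION = auto()
    DONE = auto()


# Hypothetical table of legal execution-state transitions.
ALLOWED = {
    ExecutionState.IDLE: {ExecutionState.RUNNING},
    ExecutionState.RUNNING: {
        ExecutionState.PAUSE_PENDING,
        ExecutionState.INTERVENTION,
        ExecutionState.DONE,
    },
    ExecutionState.PAUSE_PENDING: {ExecutionState.PAUSED},
    ExecutionState.PAUSED: {ExecutionState.RUNNING},
    ExecutionState.INTERVENTION: {ExecutionState.RUNNING, ExecutionState.PAUSED},
    ExecutionState.DONE: set(),
}


class ExecutionController:
    """Owns execution state so the UI only renders it (sketch, not the real class)."""

    def __init__(self) -> None:
        self.state = ExecutionState.IDLE

    def _move(self, target: ExecutionState) -> None:
        # Reject any transition that is not listed in ALLOWED.
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target

    def start(self) -> None:
        """Start, or resume from PAUSED or INTERVENTION."""
        self._move(ExecutionState.RUNNING)

    def request_pause(self) -> None:
        """Pausing is two-step: PAUSE_PENDING until the executor stops."""
        self._move(ExecutionState.PAUSE_PENDING)

    def confirm_paused(self) -> None:
        self._move(ExecutionState.PAUSED)
```

Because the enum lives outside the TUI package, both the controller and the screen can import it without either one depending on the UI.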
+
+## Consequences
+
+- FLY screen becomes thinner and more focused on UI concerns.
+- Execution logic is testable in isolation with unit tests.
+- Additional orchestration features (rollback, richer reports) have a clear
+  home without bloating the UI layer.
diff --git a/docs/adr/0002-flight-test-harness.md b/docs/adr/0002-flight-test-harness.md
@@ -0,0 +1,28 @@
+# ADR 0002: Flight Test Harness
+
+Date: 2026-02-05
+Status: Accepted
+
+## Context
+
+The testing strategy defined flight tests (L0–L5) but lacked operational tooling.
+To improve iteration discipline, we needed a repeatable harness that records
+results and validates generated projects against minimal expectations.
+
+## Decision
+
+Add `scripts/run_flight_test.py` to execute a flight test against an existing
+project directory. The runner:
+- Creates timestamped results directories
+- Validates minimum expected files
+- Runs optional smoke tests
+- Writes a `meta.json` summary
+
+Seed L0–L2 fixtures under `flight-tests/` to make the harness immediately usable.
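The runner steps above can be sketched as a single function. This is a hypothetical outline, not the contents of `scripts/run_flight_test.py`. The function name, its parameters (`expected_files`, `smoke_cmd`), and the `meta.json` keys are assumptions made for illustration.

```python
from __future__ import annotations

import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_flight_test(
    project_dir: Path,
    results_root: Path,
    expected_files: list[str],
    smoke_cmd: list[str] | None = None,
) -> dict:
    """Validate a generated project and record the outcome (hypothetical helper)."""
    # Timestamped results directory, e.g. results/20260205-143000.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    results_dir = results_root / stamp
    results_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a past run

    # Minimum expectations: every listed file must exist in the project.
    missing = [f for f in expected_files if not (project_dir / f).exists()]

    # Optional smoke test, run from inside the project directory.
    smoke_rc = None
    if smoke_cmd:
        smoke_rc = subprocess.run(smoke_cmd, cwd=project_dir).returncode

    meta = {
        "project_dir": str(project_dir),
        "timestamp": stamp,
        "missing_files": missing,
        "smoke_returncode": smoke_rc,
        "passed": not missing and smoke_rc in (None, 0),
    }
    (results_dir / "meta.json").write_text(json.dumps(meta, indent=2))
    return meta
```

Keeping the runner ignorant of how the project was generated is what allows it to validate existing directories before generation is automated.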
+
+## Consequences
+
+- Provides a repeatable baseline for flight test validation.
+- Creates an audit trail for regressions and improvements.
+- Keeps generation concerns decoupled from validation so the harness is usable
+  before full automation is in place.
diff --git a/docs/adr/0003-execution-report.md b/docs/adr/0003-execution-report.md
@@ -0,0 +1,21 @@
+# ADR 0003: Execution Report Model
+
+Date: 2026-02-05
+Status: Accepted
+
+## Context
+
+Execution outcomes were logged but lacked a structured report for summarizing
+waypoint attempts. This made it hard to aggregate metrics or build future
+observability features on top of execution artifacts.
+
+## Decision
+
+Introduce `ExecutionReport` as a structured summary of a waypoint execution
+attempt, capturing result, timestamps, and completion data.
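A report like this could be modeled as a plain dataclass. The sketch below assumes specific fields (`waypoint_id`, `criteria_met`) and a JSON serializer. These are illustrations only, since the ADR does not define the schema beyond result, timestamps, and completion data.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class ExecutionReport:
    """Structured summary of one waypoint execution attempt (hypothetical fields)."""

    waypoint_id: str
    result: str  # e.g. "success", "failed", "cancelled"
    started_at: datetime
    completed_at: datetime
    criteria_met: list[str] = field(default_factory=list)

    @property
    def duration_seconds(self) -> float:
        return (self.completed_at - self.started_at).total_seconds()

    def to_json(self) -> str:
        data = asdict(self)
        # datetime is not JSON-serializable; store ISO-8601 strings instead.
        data["started_at"] = self.started_at.isoformat()
        data["completed_at"] = self.completed_at.isoformat()
        data["duration_seconds"] = self.duration_seconds
        return json.dumps(data, sort_keys=True)
```

The model imports nothing from the UI, which matches the consequence that reports stay independent of UI layers.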
+
+## Consequences
+
+- Establishes a durable schema for execution summaries.
+- Enables future aggregation and reporting without parsing logs.
+- Keeps the report model independent of UI layers.
diff --git a/docs/adr/README.md b/docs/adr/README.md
@@ -0,0 +1,9 @@
+# Architecture Decision Records
+
+This directory captures the key architectural decisions for Waypoints.
+
+## Index
+
+- [ADR 0001: Extract Execution Controller](./0001-execution-controller.md)
+- [ADR 0002: Flight Test Harness](./0002-flight-test-harness.md)
+- [ADR 0003: Execution Report Model](./0003-execution-report.md)
diff --git a/docs/analysis/fly-callgraph.md b/docs/analysis/fly-callgraph.md
@@ -0,0 +1,95 @@
+# FLY Call Graph (2026-02-05)
+
+This document maps the current FLY execution flow from UI actions down to orchestration and execution. The goal is to identify extraction boundaries for a dedicated execution controller.
+
+## Entry Points (User Actions)
+
+- `FlyScreen.on_mount()`
+  - `coordinator.reset_stale_in_progress()`
+  - `_refresh_waypoint_list()`
+  - `_select_next_waypoint(include_in_progress=True)`
+  - `_update_git_status()` + timer
+  - `_update_project_metrics()`
+
+- `action_start()`
+  - Handles retry of selected failed waypoint
+  - Handles resume from `PAUSED`
+  - Handles start from `READY` or after `CHART_REVIEW` / `LAND_REVIEW`
+  - Transitions via `coordinator.transition(...)`
+  - Sets `execution_state = RUNNING`
+  - `_execute_current_waypoint()`
+
+- `action_pause()`
+  - Sets `execution_state = PAUSE_PENDING`
+  - Cancels executor if running (logs pause)
+
+- `action_skip()`
+  - Marks current waypoint skipped (via selection change)
+  - `_select_next_waypoint()`
+
+- `action_back()`
+  - Transitions `FLY_* -> CHART_REVIEW`
+  - Switches phase to `chart`
+
+- `action_forward()`
+  - Validates `LAND_REVIEW` availability
+  - `coordinator.transition(LAND_REVIEW)` + `_switch_to_land_screen()`
+
+- Intervention flow
+  - `_handle_intervention(...)` → `InterventionModal` → `_on_intervention_result(...)`
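The branching in `action_start()` could be pulled out into a pure decision function that a controller owns and tests in isolation. This sketch treats the state names from the bullets (`PAUSED`, `READY`, `CHART_REVIEW`, `LAND_REVIEW`) as plain strings. It assumes the check order follows the bullet order. `StartAction` and `decide_start` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class StartAction(Enum):
    RETRY_FAILED = "retry_failed"
    RESUME = "resume"
    START = "start"
    REJECT = "reject"


def decide_start(state: str, selected_status: str | None) -> StartAction:
    """Choose a start path as the action_start() bullets describe (hypothetical)."""
    # 1. A selected failed waypoint is retried first.
    if selected_status == "FAILED":
        return StartAction.RETRY_FAILED
    # 2. A paused run resumes where it stopped.
    if state == "PAUSED":
        return StartAction.RESUME
    # 3. Fresh start from READY or after returning from a review phase.
    if state in {"READY", "CHART_REVIEW", "LAND_REVIEW"}:
        return StartAction.START
    return StartAction.REJECT
```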
+
+## Execution Flow
+
+- `_execute_current_waypoint()`
+  - Marks waypoint `IN_PROGRESS` + saves flight plan
+  - Builds `WaypointExecutor` with callbacks and limits
+  - `run_worker(self._run_executor())`
+
+- `_run_executor()`
+  - `WaypointExecutor.execute()` → returns `ExecutionResult`
+
+- `on_worker_state_changed()`
+  - Handles `InterventionNeededError` or other failures
+  - Calls `_handle_execution_result(result)`
+
+- `_handle_execution_result(result)`
+  - SUCCESS
+    - Mark COMPLETE + save
+    - Validate receipt, then commit via git
+    - Parent epic check
+    - Select next waypoint
+    - If all complete: transition `LAND_REVIEW`
+  - INTERVENTION_NEEDED / MAX_ITERATIONS / FAILED
+    - Mark FAILED
+    - Transition `FLY_INTERVENTION`
+  - CANCELLED
+    - Transition `FLY_PAUSED`
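The result handling above is a mapping from `ExecutionResult` to a waypoint status and a journey transition, so it can be written as a total function. In this sketch the enum mirrors the result names listed above, but its values are invented. Two behaviors are also assumptions: remaining in `FLY_EXECUTING` after a non-final success, and leaving the status unchanged on `CANCELLED`.

```python
from __future__ import annotations

from enum import Enum


class ExecutionResult(Enum):
    SUCCESS = "success"
    INTERVENTION_NEEDED = "intervention_needed"
    MAX_ITERATIONS = "max_iterations"
    FAILED = "failed"
    CANCELLED = "cancelled"


def plan_outcome(result: ExecutionResult, all_complete: bool) -> tuple[str | None, str]:
    """Return (new waypoint status, journey transition); None keeps the status."""
    if result is ExecutionResult.SUCCESS:
        # Mark COMPLETE; finish the flight or keep executing the next waypoint.
        return "COMPLETE", ("LAND_REVIEW" if all_complete else "FLY_EXECUTING")
    if result is ExecutionResult.CANCELLED:
        return None, "FLY_PAUSED"
    if result in {
        ExecutionResult.INTERVENTION_NEEDED,
        ExecutionResult.MAX_ITERATIONS,
        ExecutionResult.FAILED,
    }:
        return "FAILED", "FLY_INTERVENTION"
    # No silent fall-through: an unknown result is a bug.
    raise ValueError(f"unhandled result: {result}")
```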
+
+## Cross-Cutting Services
+
+- `JourneyCoordinator`
+  - Transition validation and persistence
+  - Waypoint selection and completion checks
+
+- `WaypointExecutor`
+  - Iterative execution loop
+  - Calls progress callback with `ExecutionContext`
+
+- `ExecutionLogReader` / `ExecutionLogWriter`
+  - Audit trail for each waypoint
+
+- `GitService` + `ReceiptValidator`
+  - Receipt validation
+  - Commit/tag integration
+
+---
+
+## Extraction Boundary (Target)
+
+Introduce `ExecutionController` to own the flow currently distributed across `FlyScreen`:
+- `start / pause / resume / skip / retry`
+- State transitions
+- Selection logic + execution sequencing
+- Handling of `ExecutionResult`
+
+`FlyScreen` should become a thin UI layer: inputs, rendering, and modal handling.
diff --git a/docs/analysis/fly-invariants.md b/docs/analysis/fly-invariants.md
@@ -0,0 +1,45 @@
+# FLY Invariants (2026-02-05)
+
+These invariants define expected behavior in FLY execution. They should be preserved during refactor and enforced through tests.
+
+## State and Transition Invariants
+
+- `JourneyCoordinator.transition(...)` is the single source of truth for journey state transitions.
+- `ExecutionState` is a UI execution mode, but must be consistent with `JourneyState`:
+  - `ExecutionState.RUNNING` implies `JourneyState.FLY_EXECUTING`.
+  - `ExecutionState.PAUSED` implies `JourneyState.FLY_PAUSED`.
+  - `ExecutionState.INTERVENTION` implies `JourneyState.FLY_INTERVENTION`.
+  - `ExecutionState.DONE` implies that all waypoints are complete and `JourneyState.LAND_REVIEW` is reachable.
+- Non-recoverable states should not be persisted as resume checkpoints.
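The state-consistency rules can be checked mechanically, for example in a test helper. This sketch uses plain strings for the enum members and a hypothetical `check_consistency` function. The `DONE` check covers only waypoint completion, because reachability of `LAND_REVIEW` depends on the coordinator's transition rules.

```python
from __future__ import annotations

# Required journey state for each execution mode, per the invariants above.
REQUIRED_JOURNEY_STATE = {
    "RUNNING": "FLY_EXECUTING",
    "PAUSED": "FLY_PAUSED",
    "INTERVENTION": "FLY_INTERVENTION",
}


def check_consistency(execution_state: str, journey_state: str, all_complete: bool) -> list[str]:
    """Return the violated invariants (empty when consistent)."""
    problems: list[str] = []
    required = REQUIRED_JOURNEY_STATE.get(execution_state)
    if required is not None and journey_state != required:
        problems.append(f"{execution_state} requires {required}, got {journey_state}")
    if execution_state == "DONE" and not all_complete:
        problems.append("DONE requires all waypoints complete")
    return problems
```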
+
+## Waypoint Status Invariants
+
+- When execution starts, the current waypoint becomes `IN_PROGRESS`.
+- On success, the waypoint must be marked `COMPLETE`, persisted, and logged.
+- On intervention or failure, the waypoint must be marked `FAILED` (or `SKIPPED` for explicit skips).
+- Parent epic completion is checked after a child completes, but epics are not auto-completed.
+
+## Selection Invariants
+
+- Selection prefers resumable waypoints (`IN_PROGRESS`, `FAILED`) when resuming.
+- Selection should never pick a waypoint whose dependencies are incomplete.
+- Epics become eligible only when all children complete.
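The three selection rules fit together in one small function. This sketch assumes a minimal hypothetical `Waypoint` record, in which an epic is a waypoint with a non-empty `children` list. It also treats only `COMPLETE` (not `SKIPPED`) as satisfying a dependency, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Waypoint:
    """Minimal hypothetical waypoint record for illustrating selection."""

    id: str
    status: str = "PENDING"  # PENDING, IN_PROGRESS, FAILED, COMPLETE, SKIPPED
    depends_on: list[str] = field(default_factory=list)
    children: list[str] = field(default_factory=list)  # non-empty marks an epic


def select_next(waypoints: list[Waypoint], resuming: bool = False) -> Waypoint | None:
    """Return the first eligible waypoint in plan order, or None."""
    by_id = {wp.id: wp for wp in waypoints}

    def eligible(wp: Waypoint) -> bool:
        # Dependencies and, for epics, all children must be COMPLETE.
        required = wp.depends_on + wp.children
        return all(by_id[ref].status == "COMPLETE" for ref in required)

    if resuming:
        # Prefer work that was interrupted or failed.
        for wp in waypoints:
            if wp.status in ("IN_PROGRESS", "FAILED") and eligible(wp):
                return wp
    for wp in waypoints:
        if wp.status == "PENDING" and eligible(wp):
            return wp
    return None
```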
+
+## Execution Invariants
+
+- Execution uses `WaypointExecutor` exclusively.
+- UI must remain responsive (execution runs in background worker).
+- Progress updates are handled on the main thread via `call_later`.
+- `ExecutionResult` drives state transitions; no silent fall-through.
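To illustrate the threading invariants without Textual, this sketch uses plain `asyncio`. A blocking executor runs in a worker thread via `asyncio.to_thread`, and each progress update is handed back to the event-loop thread with `loop.call_soon_threadsafe`. This is only an analogy for the app's worker and `call_later` usage. `blocking_execute` and `run_waypoint` are hypothetical stand-ins.

```python
from __future__ import annotations

import asyncio
import threading
import time
from typing import Callable


def blocking_execute(report: Callable[[str], None]) -> str:
    """Stand-in for a blocking executor loop (hypothetical)."""
    for i in range(3):
        time.sleep(0.01)  # simulate work
        report(f"iteration {i + 1}")
    return "SUCCESS"


async def run_waypoint() -> tuple[str, list[tuple[str, bool]]]:
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    updates: list[tuple[str, bool]] = []

    def on_progress(message: str) -> None:
        # Runs on the event-loop thread, where touching UI state is safe.
        updates.append((message, threading.get_ident() == loop_thread))

    def report(message: str) -> None:
        # Called from the worker thread; defer the update to the event loop.
        loop.call_soon_threadsafe(on_progress, message)

    result = await asyncio.to_thread(blocking_execute, report)
    await asyncio.sleep(0)  # let any already-scheduled progress callbacks run
    return result, updates
```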
+
+## Logging and Metrics Invariants
+
+- Each waypoint execution produces an execution log.
+- Cost and token metrics are updated after each waypoint.
+- Receipt validation must occur before auto-commit.
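The ordering rule for receipts and commits can be enforced structurally by letting a single function own both steps. In this sketch, the validator and committer are injected callables standing in for `ReceiptValidator` and `GitService`, whose real signatures are not shown here. Returning `False` on an invalid receipt, and the commit message format, are assumptions.

```python
from __future__ import annotations

from typing import Callable


def commit_if_receipt_valid(
    waypoint_id: str,
    validate_receipt: Callable[[str], bool],
    commit: Callable[[str], None],
) -> bool:
    """Commit only after the receipt validates; return whether a commit happened."""
    if not validate_receipt(waypoint_id):
        # No commit on an invalid receipt; the caller decides how to surface it.
        return False
    commit(f"Complete waypoint {waypoint_id}")  # hypothetical message format
    return True
```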
+
+## Recovery Invariants
+
+- Stale `IN_PROGRESS` waypoints are reset to `PENDING` on screen mount.
+- Intervention must surface a modal with explicit user action choices.
+- Rollback is best-effort and must not corrupt the flight plan state.
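The stale-status reset can be sketched over a simple id-to-status mapping. The real `coordinator.reset_stale_in_progress()` operates on the flight plan, and its signature is not shown here. The sketch assumes that nothing is executing when the screen mounts, so any `IN_PROGRESS` status must be left over from an interrupted session.

```python
from __future__ import annotations


def reset_stale_in_progress(statuses: dict[str, str]) -> list[str]:
    """Reset IN_PROGRESS waypoints to PENDING in place; return the reset ids."""
    stale = [wp_id for wp_id, status in statuses.items() if status == "IN_PROGRESS"]
    for wp_id in stale:
        statuses[wp_id] = "PENDING"
    return stale
```

Returning the reset ids lets the caller log them, and a second call is a no-op, so repeated mounts are safe.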