Open
Description
Objective
Stress test the full localization path on a realistic scenario: discover with a frontier/default provider, export data, train a tiny local artifact, route into the local provider, and verify fallback behavior when the local path is weak.
Scope
This issue is the first execution target in the real-world escalation ladder. It should prove the end-to-end frontier-to-local path under realistic persisted state, not just isolated unit tests.
What To Exercise
- Run a short but real discovery loop on a stable scenario such as grid_ctf
- Persist generation outputs, scores, and knowledge artifacts
- Export strategy-level training data from the real run database
- Invoke the real training loop or CLI to produce a loadable MLX bundle
- Route competitor execution into the local provider when harness coverage is strong
- Verify explicit fallback to the primary/frontier provider when the local bundle is absent, disabled, or below threshold
- Confirm persisted history, reports, and artifact metadata reflect the real execution path
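The routing and fallback behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual routing code: `RoutingDecision`, `choose_provider`, the `coverage` score, and the `0.8` default threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical record of which provider was chosen and why."""
    provider: str  # "local" or "frontier"
    reason: str


def choose_provider(
    bundle_path: Optional[Path],
    local_enabled: bool,
    coverage: float,
    threshold: float = 0.8,  # assumed value, not taken from the issue
) -> RoutingDecision:
    """Pick local only when it is enabled, present, and above threshold."""
    if not local_enabled:
        return RoutingDecision("frontier", "local_disabled")
    if bundle_path is None or not bundle_path.exists():
        return RoutingDecision("frontier", "bundle_absent")
    if coverage < threshold:
        return RoutingDecision("frontier", "below_threshold")
    return RoutingDecision("local", "coverage_ok")
```

Returning a reason alongside the provider keeps each fallback case explicit, so a test can assert on why the frontier provider was used rather than only that it was.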
Implementation Guidance
- Prefer one high-value end-to-end test plus a small number of targeted support tests
- Use the real export and training runner path where feasible; avoid mocking away the subprocess/training boundary unless the test would become flaky or too slow
- Validate the routed execution path through orchestrator or pipeline integration rather than instantiating the local provider directly
- Keep the scenario deterministic and the time budget short enough for CI
- Capture which provider actually executed the competitor step so the handoff is observable in assertions
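One possible way to make the handoff observable, assuming providers can be treated as plain callables, is to wrap each one so that every call is recorded. The `recording` helper and the `Provider` alias below are hypothetical and only illustrate the idea; the real orchestrator may already expose an equivalent hook.

```python
from typing import Callable, List, Tuple

# Assumption: a provider is modeled as a function from prompt to completion.
Provider = Callable[[str], str]


def recording(name: str, provider: Provider, log: List[Tuple[str, str]]) -> Provider:
    """Wrap a provider so each successful call appends (name, prompt) to log."""
    def wrapped(prompt: str) -> str:
        result = provider(prompt)
        log.append((name, prompt))
        return result
    return wrapped


# Usage with stub providers standing in for the real ones.
calls: List[Tuple[str, str]] = []
local = recording("local", lambda p: "local:" + p, calls)
frontier = recording("frontier", lambda p: "frontier:" + p, calls)

local("competitor step 1")
frontier("competitor step 2")
executed = [name for name, _ in calls]
```

A test can then assert on `executed` (for example, that it contains `"local"` at least once) without reaching into provider internals.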
Produced Artifacts
- A persisted run database with at least one completed discovery run
- Exported training data bundle
- Loadable MLX artifact bundle
- Test assertions or logs proving local routing occurred
- Test assertions or logs proving frontier fallback occurred when expected
Parallelism And Dependencies
- This should run first
- AC-217 can begin in parallel once the exact scenario fixture and runtime harness assertions are settled, but this issue should merge first because it establishes the real localization path
- AC-218 should wait until this issue lands, because package portability is more meaningful once the exported and trained artifacts are known-good
Review Focus
- Did the test hit the real export/train/routing path, or was a critical boundary mocked out?
- Does the routed execution actually choose local at least once?
- Is fallback explicit, safe, and observable?
- Do persisted artifacts and history match the executed path?
Success Criteria
- Export completes with usable data
- Training produces a loadable local bundle
- Routed execution actually selects the local provider for the target role
- Fallback to the primary provider is explicit and safe
- Match outcomes and persisted history remain coherent across the handoff
Acceptance
- Discovery -> export -> train -> local load path completes
- Routed run uses local provider at least once
- Fallback path works when local artifact is unavailable
- Persisted artifacts and reports reflect the real execution path
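The acceptance checks on persisted history could be expressed as a query against the run database. The sketch below assumes a SQLite database with a hypothetical `step_history(run_id, role, provider)` table; the real schema is not described in this issue, so the table name, column names, and `make_demo_db` helper are illustrative only.

```python
import sqlite3
from typing import Dict


def providers_used(conn: sqlite3.Connection, run_id: str, role: str) -> Dict[str, int]:
    """Count persisted steps per provider for one run and role."""
    rows = conn.execute(
        "SELECT provider, COUNT(*) FROM step_history "
        "WHERE run_id = ? AND role = ? GROUP BY provider",
        (run_id, role),
    ).fetchall()
    return {provider: count for provider, count in rows}


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with the assumed (hypothetical) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE step_history (run_id TEXT, role TEXT, provider TEXT)")
    conn.executemany(
        "INSERT INTO step_history VALUES (?, ?, ?)",
        [
            ("run-1", "competitor", "local"),
            ("run-1", "competitor", "local"),
            ("run-1", "competitor", "frontier"),
            ("run-1", "analyst", "frontier"),
        ],
    )
    return conn


counts = providers_used(make_demo_db(), "run-1", "competitor")
```

With counts in this shape, "routed run uses local provider at least once" becomes `counts.get("local", 0) >= 1`, and the fallback case can be checked the same way against a run where the local artifact was unavailable.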