This is a practical, opinionated guide for using Flow as the control plane for software delivery, optimized for Claude Code and Codex.
The goal is simple: tighter feedback loops, fewer regressions, less context loss, and consistent quality gates.
Do not treat Flow as just a task runner. Use it as the enforced loop:
- Start with project context and reusable skills.
- Implement in Claude/Codex with task-native commands.
- Run the smallest meaningful tests first.
- Capture traces/logs when behavior is unclear.
- Commit through `f commit` with quality/testing/skill gates.
- Ship through Flow tasks, not ad hoc commands.
If you do this consistently, team behavior becomes predictable and AI sessions become reliable.
Run these first:
```
f doctor
f auth login
f latest
```

What this gives you:
- verified shell and toolchain integration
- authenticated Flow AI and storage access
- latest Flow binary with current command behavior
If you use fish integration heavily:
```
f shell-init
```

From the repository root:
```
f info
f tasks list
f setup
```

If the project is not Flow-managed yet:

```
f init
```

Then immediately add these foundations to flow.toml:
- `[skills]` and `[skills.codex]`
- `[commit.testing]`
- `[commit.quality]`
- `[commit.skill_gate]`
- core tasks (`test`, `test-related`, build, dev, deploy/ship)
Use this as a starting profile and adjust per repo:
```toml
version = 1

[project]
name = "your-project"

[skills]
sync_tasks = true
install = ["quality-feature-delivery"]

[skills.codex]
generate_openai_yaml = true
force_reload_after_sync = true
task_skill_allow_implicit_invocation = false

[[tasks]]
name = "test"
command = "<your test command>"
description = "Run project tests"

[[tasks]]
name = "test-related"
command = "<script that runs likely related tests>"
description = "Run smallest related tests for changed files"

[commit]
review_instructions_file = ".ai/commit-review-instructions.md"

[commit.testing]
mode = "block" # off | warn | block
runner = "bun" # Bun-first local gate
require_related_tests = true
ai_scratch_test_dir = ".ai/test" # optional gitignored AI scratch tests
run_ai_scratch_tests = true # run scratch tests when no related tracked tests
allow_ai_scratch_to_satisfy_gate = false
max_local_gate_seconds = 30

[commit.quality]
mode = "block"
require_docs = true
require_tests = true
auto_generate_docs = true
doc_level = "basic"

[commit.skill_gate]
mode = "block"
required = ["quality-feature-delivery"]
```

Why this matters:
- `sync_tasks` + Codex skill generation makes tasks visible as skills.
- blocked commit gates make quality non-optional.
- related-test enforcement keeps the loop fast and relevant.
```
cd <repo>
f tasks list
```

Then choose one clear objective and one validation command before coding.
Prefer:
```
f dev
f test-related
f logs <task>
```

Avoid direct, inconsistent commands when equivalent Flow tasks exist.
Your prompt should include:
- objective
- files or subsystem boundaries
- required tests
- expected output shape
- “commit through `f commit` without skip flags”
Example prompt frame:
```
Implement X in Y files.
Run f test-related-main first, then broader tests if needed.
Update .ai/features for user-visible changes.
Commit using f commit with no skip flags.
```
Order of validation:
- related tests (`f test-related` / branch-based variant)
- subsystem suite
- full suite only if risk justifies
```
f commit
```

This centralizes:
- AI review
- test/doc quality checks
- feature documentation updates
- sync/audit metadata
Do not bypass with `--skip-quality` or `--skip-tests` unless explicitly intentional.
Treat .ai/features/*.md as the source of truth for what exists.
Each user-visible feature should map to:
- purpose/description
- source files
- test files
- coverage status
- last verified commit
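A minimal template for one such file might look like this (the field names follow the list above; the exact layout is a suggestion, not a Flow-mandated schema):

```markdown
# Feature: <name>

- Purpose: <one-sentence description of the user-visible behavior>
- Source files: <paths>
- Test files: <paths>
- Coverage status: covered | partial | untested
- Last verified commit: <sha>
```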
Why this is high leverage:
- new AI sessions start with real project capabilities
- stale feature docs are detectable at commit time
- dashboard/reporting can track drift and coverage
Use local skills for repo-specific “how we build here”.
Recommended minimum skill set:
- quality feature delivery (tests + docs + commit gates)
- environment/secret usage (`f env` only)
- release/ship protocol
- tracing/diagnostics protocol
Then enforce with:
```toml
[commit.skill_gate]
mode = "block"
required = ["quality-feature-delivery"]
```

This is how you convert good intentions into default behavior.
- lane A: very fast related tests for development iterations
- lane B: broader suite for pre-ship confidence
Use a script (like .ai/scripts/test-related.ts) that:
- maps changed source files to candidate tests
- supports `--base origin/main --head HEAD`
- can list commands without running (`--list`)
- runs the smallest useful subset first
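The mapping step can be sketched as a small pure function. This is a hypothetical shape for the core of `.ai/scripts/test-related.ts`; the `src/**` and `tests/**` path conventions are assumptions for illustration, not Flow requirements.

```typescript
// Hypothetical core of .ai/scripts/test-related.ts:
// map changed source files to candidate test files.
function candidateTests(changed: string[]): string[] {
  const tests = new Set<string>();
  for (const file of changed) {
    // A changed test file is its own best candidate.
    if (/\.test\.ts$/.test(file)) {
      tests.add(file);
      continue;
    }
    // src/foo/bar.ts -> colocated and mirrored test locations.
    const m = file.match(/^src\/(.+)\.ts$/);
    if (m) {
      tests.add(`src/${m[1]}.test.ts`);
      tests.add(`tests/${m[1]}.test.ts`);
    }
  }
  return [...tests].sort();
}

// The full script would derive `changed` from
// `git diff --name-only <base>...<head>` and, under --list,
// print the resulting test commands instead of running them.
```

Only candidates that actually exist on disk would be run, which keeps lane A fast even when the mapping over-generates.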
If your runner fails due to environment prerequisites (toolchain/vendor issues), add a preflight task (`f <runner>-ready`) and an optional auto-repair task (`f <runner>-fix`).
This avoids burning minutes before obvious infra failures.
When behavior is unclear, switch from “guess and patch” to “observe and patch”:
- run the target task via Flow
- inspect `f logs <task>`
- collect traces (`f trace` / project-specific trace tasks)
- summarize signal before changing code
The best pattern is “capture once, reason once, patch once.”
Use `f env` as the single path for secrets and runtime env management:

```
f env setup
f env set KEY=value
f env pull
f env run <command>
```

Avoid ad hoc .env drift across machines.
For deployment or mobile shipping flows, define one confidence task that runs before release:
- health checks
- trace ingestion checks
- critical smoke test
- related tests for release-impact files
Then make the ship task depend on that confidence task.
Example:

```
f mobile-confidence -> f mobile-ship
```
Result: broken pipelines fail before expensive release steps.
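Using only the `[[tasks]]` shape from the starting profile, one way to express the dependency is to have the ship command invoke the confidence task first. The task names and placeholder commands below are illustrative, not a documented Flow dependency mechanism:

```toml
[[tasks]]
name = "mobile-confidence"
command = "<health checks && trace checks && smoke test && related tests>"
description = "Pre-release confidence gate"

[[tasks]]
name = "mobile-ship"
command = "f mobile-confidence && <your release command>"
description = "Ship only after the confidence gate passes"
```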
When adding Flow to an existing repo, use this order:
- add `flow.toml` with core tasks
- add env management (`[storage]` + `f env` flow)
- add related-test task/script
- add commit testing + quality + skill gates in warn mode
- validate for 2-3 days
- flip to block mode
- add `.ai/features` for top capabilities
This avoids destabilizing the team while still moving to enforcement.
Reusable prompt frames:

```
Implement <feature> in <scope>.
Use Flow tasks only (no ad hoc commands when a task exists).
Run related tests first, then broaden if risk warrants.
Update .ai/features for user-visible behavior changes.
Commit with f commit (no skip flags).
```

```
Do not patch yet.
Collect logs/traces via Flow tasks and summarize likely root causes.
Propose the smallest validating experiment.
After confirmation, implement the fix + related tests + feature doc update.
Commit via f commit.
```

```
Refactor <module> without behavior changes.
Keep the public API stable.
Run focused tests proving no regression.
Document any non-obvious migration risks.
Commit via f commit.
```
- Running direct commands repeatedly when Flow tasks exist.
- Treating tests as optional before `f commit`.
- Using skip flags routinely.
- Writing prompts without required validation commands.
- Keeping feature docs as manual, stale notes.
- Debugging by repeated blind edits instead of trace/log loop.
- `f latest` (if Flow changed frequently)
- `f tasks list`
- `f ai` / `f codex` / `f claude` resume context
- confirm one objective + one validation command
- related tests pass
- feature docs updated (`.ai/features`)
- no quality gate bypass intended
- run `f commit`
- confidence task passes
- relevant traces/logs clean
- release task run through Flow
- tasks run through `f`
- basic env usage
- related test task
- shared review instructions
- reusable skills
- blocked testing/quality gates
- blocked skill gate
- `.ai/features` as living capability map
- preflight + confidence tasks
- trace-first debugging
- structured release checks
Aim to reach Level 3 quickly, then Level 4 where release speed and reliability both improve.
Use these defaults unless you have a reason not to:
- `commit.testing.mode = "block"`
- `commit.quality.mode = "block"`
- `commit.skill_gate.mode = "block"`
- `skills.sync_tasks = true`
- `skills.codex.generate_openai_yaml = true`
- `skills.codex.force_reload_after_sync = true`
- branch-diff related tests (`--base origin/main --head HEAD`)
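Expressed as a flow.toml fragment (using only the tables shown in the starting profile earlier), those defaults look like:

```toml
[skills]
sync_tasks = true

[skills.codex]
generate_openai_yaml = true
force_reload_after_sync = true

[commit.testing]
mode = "block"

[commit.quality]
mode = "block"

[commit.skill_gate]
mode = "block"
```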
This gives the highest consistency with the least manual memory burden.
Flow works best when it is the enforced operating system for development, not an optional helper.
If you route implementation, testing, docs, commit review, and shipping through Flow, you get:
- faster iteration
- lower regression rates
- shared project memory for humans and AI
- auditable delivery quality
That is the path to writing software better, repeatedly.