diff --git a/astro.config.ts b/astro.config.ts index 6b5caa2..8d77558 100644 --- a/astro.config.ts +++ b/astro.config.ts @@ -63,6 +63,7 @@ export default defineConfig({ items: [ { slug: "expanding-horizons/threads-context-and-caching" }, { slug: "expanding-horizons/model-pricing" }, + { slug: "expanding-horizons/high-level-harnesses" }, { slug: "expanding-horizons/what-to-read-next" }, ], }, diff --git a/src/content/docs/expanding-horizons/high-level-harnesses.mdx b/src/content/docs/expanding-horizons/high-level-harnesses.mdx new file mode 100644 index 0000000..83d8509 --- /dev/null +++ b/src/content/docs/expanding-horizons/high-level-harnesses.mdx @@ -0,0 +1,138 @@ +--- +title: High-level harnesses +description: Beyond individual agent sessions — scheduled automations, parallel agent fleets, and the emerging pattern of AI-driven code pipelines. +--- + +import ExternalLink from "../../../components/ExternalLink.astro"; + +The [harness engineering](/becoming-productive/harness-engineering/) chapter covered shaping a single agent's actions through AGENTS.md, skills, hooks, and subagents. +This page is one level of abstraction up — it covers tools and patterns that treat agents as a manageable workforce. + +:::caution +Products and feature sets can change significantly between revisions of this guide. +Treat this page as an orientation that helps you build intuition for the field, not as a definitive reference. +::: + +## From engineering to managing + +So far in this guide, you have been an **engineer** — you worked interactively with a single agent, steering it turn by turn in real time. +Now, you will become a **manager**, delegating work to a fleet of agents running in parallel. +Instead of supervising each agent individually, you will manage the output queue — a review inbox, an issue tracker, a PR pipeline. +Your coding assistant no longer serves as a conductor, but as an orchestrator. 
+ +:::note[Remember] +The key shift is from "what should the agent do?" to "what work should be running right now, and how do I review what came back?" +::: + +## Running agents in parallel + +The key difference is running several agents simultaneously, each on an isolated task. +You hand different issues to separate agents at once, come back and review, and merge the ones you like. +That is qualitatively different from the sequential, one-task-at-a-time conductor workflow from the previous chapters. + +[Subagents](/becoming-productive/harness-engineering/#subagents) are also parallel, but they are different: a subagent is spawned **by the agent** to partition a single task's context. +The agent decides when to spawn one, waits for the result, and folds it back into its own session. +You as the human still trigger one top-level session and review one result. + +What is described here is different: **you** spawn multiple fully independent agent sessions, each assigned to a separate task. +No session knows about the others. +You are not waiting on any one of them — you come back later and review the queue of results in bulk. + +In practice, each agent needs its own isolated workspace — typically a separate Git worktree — so their changes do not interfere. +A dashboard or queue then surfaces results as agents finish, letting you review and merge at your own pace. + +For example, Conductor is a tool built around this model, +running multiple AI coding agents (Claude Code and Codex) in parallel worktrees with a shared review dashboard. + +## Scheduled and recurring agents + +Agents do not always need to wait for you to trigger them — you can set them up in advance to run on a schedule. +The pattern is similar to a cron job or a CI pipeline: describe a recurring task, define when it should run, and have an agent execute it in the background. +Results land in a review inbox or are auto-archived if nothing needs attention. 
+ +This is well suited to tasks like: +- Daily issue triage +- Surfacing and summarizing CI failures +- Generating release briefs +- Checking for regressions between versions + +With scheduled agents, the process becomes closer to a CI pipeline than a chat window — an agent is no longer a tool you reach for, but a background process. + +As an example, OpenAI's Codex App includes an Automations feature built around exactly this model. + +## Issue-tracker-driven orchestration + +A natural extension of scheduled agents is wiring them directly to your issue tracker. +Instead of manually assigning tasks to agents, the system monitors a board and automatically spawns an agent for each new issue in scope. +Engineers decide which issues belong in scope; the orchestrator handles assignment and execution. + +Agent behavior can be defined in a workflow file versioned alongside the code — the same way you version a CI pipeline. +When an agent finishes, it gathers evidence (CI results, PR review feedback, complexity analysis) for human review. + +For example, Symphony is an open-source orchestration service published by OpenAI that implements this pattern, +monitoring a Linear board and running a Codex agent per issue in an isolated workspace. + +:::tip +Issue-tracker-driven orchestration works best on codebases that have adopted [harness engineering](/becoming-productive/harness-engineering/). +::: + +## Agent communication + +Running multiple agents in parallel creates a coordination problem: agents must exchange information without overloading any one context window. +Two broad patterns have emerged. + +The simpler one is **hub-and-spoke orchestration**, where a lead agent spawns workers, collects their outputs, and consolidates them. +Workers never communicate with one another directly. +The benefit is simplicity, as the full picture is present in one place. 
+The cost is that every intermediate result, log line, and failed attempt flows back through the orchestrator's context, degrading its reasoning quality over time. + +The more capable pattern is **collaborative teaming**, where agents share a task list, claim work independently, and can send messages directly to one another. +A worker can flag a dependency, request a peer review, or broadcast a finding without routing it through the lead. +The lead's context stays clean; coordination happens at the edges. + +In practice, most pipelines fall somewhere on a spectrum between these extremes, often organized into three levels: + +1. **Isolated workers** — each agent runs independently and returns its output to the caller. +2. **Orchestrated workflows** — outputs become inputs for the next stage via shared files or aggregated results. +3. **Collaborative teams** — agents share a task graph, can send direct or broadcast messages, and notify the lead when work completes. + +The right level depends on how tightly coupled the tasks are. +Independent parallel tasks — security scan, test run, lint check — fit level 1 or 2 well. +Tasks that need to challenge or build on each other's intermediate findings call for level 3. + +For reference, Claude Code Agent Teams implements level 3 with a shared task list, file-locked claiming, mailboxes for direct and broadcast messages, and idle notifications back to the lead. +Codex 0.117 introduced path-based agent addressing and structured inter-agent messaging for its multi-agent workflows. + +## The Code Factory pattern + +Beyond specific products, there is an emerging pattern popularized by Ryan Carson under the name **Code Factory**. +The idea is a repository setup where agents autonomously write code and open pull requests, and a separate review agent validates those PRs with machine-verifiable evidence. +If validation passes, the PR merges without human intervention. + +The continuous loop looks like this: + +1. 
An agent writes code and opens a PR. +2. Risk-aware CI gates check the change. +3. A review agent inspects the PR and collects evidence — screenshots, test results, static analysis. +4. If all checks pass, the PR lands automatically. +5. If anything fails, the agent retries or flags the issue for human review. + +:::caution +A Code Factory is only as good as its quality gates. +An automated pipeline that merges bad PRs is strictly worse than one that does nothing. +Invest in solid tests, linters, and CI before automating the merge step. +::: + +- + +## The one-human company + +The Code Factory pattern is the technical foundation of a broader idea: that a single person with a well-configured agent fleet can operate at a scale that previously required a full engineering team. + +This requires connecting agents to communication platforms, scheduling systems, and external services — turning a single machine into an always-on runtime that responds to messages, executes tasks, and ships work continuously. +As an example of tooling in this space, see OpenClaw, which packages infrastructure for exactly this kind of setup. + +Steve Yegge, in a widely read interview with The Pragmatic Engineer, argues that the engineering profession is reorganizing along a spectrum of AI adoption. +His framing: most engineers are at the low end of AI adoption today, and those who stay there risk being outcompeted by engineers who learn to orchestrate agent fleets — to act as owners of work queues rather than writers of individual functions. 
+ +- diff --git a/src/data/links.csv b/src/data/links.csv index ca4f650..ecea782 100644 --- a/src/data/links.csv +++ b/src/data/links.csv @@ -25,6 +25,7 @@ https://code.claude.com/docs/en/security,Security - Claude Code Docs,Anthropic,, https://code.claude.com/docs/en/sub-agents,Create custom subagents - Claude Code Docs,Anthropic,,2026-03-13 https://code.claude.com/docs/en/sub-agents#code-reviewer,Create custom subagents - Claude Code Docs,,,2026-03-05 https://coderabbit.ai/,CodeRabbit,,,2026-03-05 +https://conductor.build/,Conductor - Run a team of coding agents on your Mac,Melty Labs,,2026-03-25 https://context7.com/,Context7 - Up-to-date documentation for LLMs and AI code editors,,,2026-03-13 https://cursor.com/blog,Cursor Blog,,,2026-03-04 https://cursor.com/bugbot,Cursor Bugbot,,,2026-03-05 @@ -38,6 +39,8 @@ https://cursor.com/for/code-review,Reviewing Code with Cursor | Cursor Docs,,,20 https://cursor.com/pricing,Cursor Subscription,,,2026-03-04 https://developers.openai.com/api/docs/guides/compaction,Compaction,OpenAI,,2026-03-04 https://developers.openai.com/codex/agent-approvals-security,Codex: Agent approvals & security,OpenAI,,2026-03-16 +https://developers.openai.com/codex/app,App – Codex | OpenAI Developers,,,2026-03-25 +https://developers.openai.com/codex/app/automations,Automations – Codex app | OpenAI Developers,,,2026-03-25 https://developers.openai.com/codex/app/worktrees/#working-between-local-and-worktree,Worktrees,,,2026-03-10 https://developers.openai.com/codex/cli/features#run-local-code-review,Codex CLI features (run local code review),,,2026-03-05 https://developers.openai.com/codex/integrations/github/,Use Codex in GitHub,,,2026-03-05 @@ -56,6 +59,7 @@ https://github.com/mcp,GitHub MCP Registry,,,2026-03-13 https://github.com/microsoft/playwright-mcp,microsoft/playwright-mcp,Microsoft,,2026-03-13 https://github.com/mkaput,Marek Kaput,,,2026-03-04 https://github.com/openai/skills,openai/skills,OpenAI,,2026-03-12 
+https://github.com/openai/symphony,"GitHub - openai/symphony: Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents. · GitHub",,,2026-03-25 https://github.com/software-mansion-labs/skills,software-mansion-labs/skills,Software Mansion,,2026-03-12 https://github.com/steipete/mcporter/,"steipete/mcporter: Call MCPs via TypeScript, masquerading as simple TypeScript API. Or package them as cli.",Peter Steinberger,,2026-03-04 https://github.com/topics/agent-skills,GitHub Topic: agent-skills,,,2026-03-12 @@ -73,9 +77,11 @@ https://lucumr.pocoo.org/,Thoughts and Writings,Armin Ronacher,,2026-03-04 https://mcp.grep.app/,mcp.grep.app,Vercel,,2026-03-04 https://mitchellh.com/,Blog,Mitchell Hashimoto,,2026-03-04 https://models.dev/,Models.dev - An open-source database of AI models,Opencode,,2026-03-04 +https://newsletter.pragmaticengineer.com/p/from-ides-to-ai-agents-with-steve,From IDEs to AI Agents with Steve Yegge - by Gergely Orosz,,,2026-03-25 https://openai.com/chatgpt/pricing/,ChatGPT Subscription,,,2026-03-04 https://openai.com/index/harness-engineering/,Harness engineering: leveraging Codex in an agent-first world,OpenAI,2026-02-11,2026-03-04 https://openai.com/news/engineering/,OpenAI Engineering News,,,2026-03-04 +https://openclaw.ai/,OpenClaw — Personal AI Assistant,,,2026-04-02 https://opencode.ai/docs/go/,Opencode Go,,,2026-03-04 https://platform.claude.com/docs/en/build-with-claude/compaction,Compaction,Anthropic,,2026-03-04 https://platform.claude.com/docs/en/resources/prompt-library/socratic-sage,Prompting best practices,Anthropic,,2026-03-04 @@ -110,6 +116,7 @@ https://x.com/GeminiApp,Google Gemini (@GeminiApp) on X,,,2026-03-04 https://x.com/karpathy,Andrej Karpathy (@karpathy) on X,,,2026-03-04 https://x.com/opencode,OpenCode (@opencode) on X,,,2026-03-04 https://x.com/RLanceMartin,Lance Martin (@RLanceMartin) on X,,,2026-03-04 +https://x.com/ryancarson,Ryan Carson 
(@ryancarson) on X,,,2026-03-25 https://x.com/thorstenball,Thorsten Ball (@thorstenball) on X,,,2026-03-04 https://x.com/thsottiaux,Tibo (@thsottiaux) on X,,,2026-03-04 https://x.com/trq212,Thariq Shihipar (@trq212) on X,,,2026-03-04