Retrospective: Crosslink as autonomous agent swarm coordinator — ferrolearn case study #231
Context
On March 4, 2026, crosslink was used as the coordination layer for an autonomous agent swarm that built ferrolearn — a scikit-learn equivalent for Rust — from an empty repository. A single "Phase 0 Coordinator" agent orchestrated 33 subagents (opus and sonnet models) across four phases, producing a 14-crate Cargo workspace with 1,452 passing tests and zero failures. A fifth post-phase effort added PyO3 Python bindings passing 619/619 sklearn check_estimator tests.
The session spanned ~13,500 transcript lines, 1,600+ tool calls, 6+ context window continuations, and roughly 4 hours of wall-clock time.
This retrospective documents what worked, what broke, and what crosslink needs to become a first-class swarm manager.
What went right
1. Issue tracking as persistent memory across context compressions
The coordinator session hit context limits 6+ times and was auto-continued. Each time, crosslink issues + comments survived as the canonical source of truth. The coordinator could re-read `crosslink list` and `crosslink show <id>` after each compression to reconstruct what agents had completed and what remained. This is crosslink's killer feature for swarms — it's the only state that survives context window resets.
2. Typed comments (--kind plan/decision/observation/result) created an auditable build log
Every agent spawn was logged with --kind plan, every completion with --kind result. When debugging merge conflicts or verifying which agents had delivered, the comment trail was the definitive record. The typed taxonomy made it possible to distinguish "what we planned" from "what actually happened."
3. Design-driven development pattern validated
The /design skill produced 5 design documents (116 requirements, 62 acceptance criteria, 0 open questions) that served as direct agent prompts. Agents didn't need to make judgment calls — the design docs contained exact trait signatures, file paths, dependency versions, and acceptance criteria. Crosslink's knowledge repo stored the design docs so they persisted across sessions.
4. crosslink quick streamlined the create-label-work cycle
The coordinator created 10+ issues rapidly. `crosslink quick "title" -p high -l feature` collapses create + label + session work into one command, which was essential for the fast pace of swarm management.
5. Phase gating worked flawlessly across all 4 phases
The coordinator ran `cargo build --workspace && cargo test --workspace` at phase boundaries and only proceeded when all tests passed:
- Phase 1: 230 tests
- Phase 2: 631 tests
- Phase 3: 1,054 tests
- Phase 4: 1,438 tests
- Post-cleanup: 1,452 tests
Zero failures at every gate. Crosslink issues served as the phase transition record.
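That gating loop is simple enough to sketch. The cargo invocations below are the ones from the transcript; the `gate` wrapper and `phase_gate` function are illustrative, not existing tooling:

```shell
# Phase-gate sketch: a phase completes only when the whole workspace is
# green. `gate` wraps any command and names the failing step; the cargo
# lines mirror the invocation used at each phase boundary.
gate() {
  "$@" && return 0
  echo "phase gate FAILED at: $*" >&2
  return 1
}

phase_gate() {
  gate cargo build --workspace && gate cargo test --workspace
}

# The coordinator would run something like: phase_gate && begin_next_phase
gate true && echo "gate passed"
```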
6. Model cost optimization via opus/sonnet allocation
The coordinator deliberately assigned opus to architecturally complex agents (GBM, typed pipeline, backend trait, calibrated classifiers, manifold learning) and sonnet to more mechanical ones (scalers, imputers, additional clustering). This kept costs down while maintaining quality where it mattered.
7. Post-phase test audit caught real quality issues
After Phase 4, a dedicated test audit found: 7/10 sklearn fixture files orphaned (tests referenced wrong paths), many "does it not crash" shape-only tests, and zero cross-crate integration tests. Two cleanup agents (oracle tests + E2E integration) raised the test count from 1,438 to 1,452 while dramatically improving test quality.
What went wrong
1. The coordinator repeatedly guessed wrong crosslink subcommands
Errors encountered in the transcript:
- `crosslink issues list` → should be `crosslink list`
- `crosslink new` → should be `crosslink create`
- `crosslink knowledge update` → should be `crosslink knowledge edit`
- `crosslink knowledge edit --from-doc` → `edit` doesn't accept `--from-doc` (only `add` does)
- `crosslink close --reason` → `--reason` doesn't exist
Impact: Each wrong guess cost a tool call round-trip (~2-3 seconds). Over 1,600+ tool calls, this adds up.
Suggestion: Consider adding common aliases (`new` → `create`, `issues` → `list`) and making the `knowledge edit` subcommand accept `--from-doc` for parity with `knowledge add`.
2. No native swarm/agent coordination primitives
Crosslink tracked issues for individual agents, but the coordinator had to manually:
- Map agent IDs to crosslink issue IDs
- Track which agents were running vs. completed
- Decide merge order
- Detect stuck agents (by polling `TaskOutput` repeatedly)
Suggestion: A `crosslink agent spawn <issue-id>` / `crosslink agent status` / `crosslink agent merge <id>` workflow would make swarm coordination a first-class concept rather than an emergent behavior built on top of issue tracking.
3. Worktree management was fragile
Several problems with agent worktrees:
- Embedded git repository warning when `git add -A` captured `.claude/worktrees/agent-*`
- Worktree branches showed as "not fully merged" because they weren't merged to `origin/main`, requiring `git branch -D` instead of `git branch -d`
- Agent 9 (tree) had its worktree cleaned up before the coordinator could merge it, losing the branch reference
- Some agents committed directly to `dev` while others worked in worktrees, creating an inconsistent merge story
- Worktree cleanup in Phases 3-4 required `--force` due to modified/untracked files left behind by agents
Suggestion: Crosslink should own the worktree lifecycle for agent work — create on spawn, auto-add to .gitignore, merge on completion, clean up after merge verification.
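As a sketch of that lifecycle using plain git in a throwaway repo (a `crosslink`-managed version of this is the suggestion, not an existing command):

```shell
# Worktree lifecycle sketch: create a per-agent worktree on spawn, keep it
# out of `git add -A` via .gitignore, merge on completion, and only then
# remove the worktree and delete the branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email ci@example.com
git config user.name ci
git commit -q --allow-empty -m init

echo '.claude/worktrees/' > .gitignore   # keeps agent checkouts out of git add -A
git add .gitignore
git commit -q -m 'ignore agent worktrees'

# spawn: one branch + worktree per agent
git worktree add -b agent-9 .claude/worktrees/agent-9 main >/dev/null

(
  cd .claude/worktrees/agent-9
  echo 'pub mod tree;' > lib.rs
  git add lib.rs
  git commit -q -m 'agent 9: decision trees'
)

# merge on completion; clean up only after the merge is verified
git merge -q --no-ff agent-9 -m 'merge agent 9'
git worktree remove .claude/worktrees/agent-9
git branch -q -d agent-9                 # plain -d now works: branch is merged
```

Because cleanup happens strictly after the merge commit lands, the "worktree removed before merge" failure mode (agent 9) cannot occur.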
4. Hook policy friction during coordinator operation
The `work-check.py` hook blocked `git merge` because it was in `blocked_git_commands`. The user had said "let you claudes go sicko mode" but the config wasn't updated. The coordinator had to:
- Diagnose the hook block
- Ask the user to create a local override
- Wait for the user to respond
- Create `.crosslink/hook-config.local.json`
Suggestion: Consider a `crosslink mode coordinator` that temporarily relaxes git mutation restrictions for the current session, or allow the hook config to specify per-role permissions.
5. Context window exhaustion was the primary scaling bottleneck
The coordinator consumed 6+ full context windows across 4 phases. Each continuation required the system to generate a multi-page summary, and some details were lost in compression. The coordinator had to re-read design docs, re-check branch state, and re-learn the crosslink API after each reset.
Root cause: The coordinator's context was filled by:
- Agent prompts (long, detailed, one per agent)
- TaskOutput polling results (verbose JSON)
- `cargo build`/`cargo test` output
- Git merge conflict resolution
Suggestion: Crosslink could provide a `crosslink swarm status` command that returns a compact summary of all active work, replacing the need for the coordinator to individually poll each agent and reconstruct state from raw tool output.
6. Cargo.lock merge conflicts were the most common merge failure
Every worktree agent that added dependencies produced a Cargo.lock conflict when merging to `dev`. The coordinator resolved these by running `cargo generate-lockfile` after each merge, but this was a recurring tax. In Phase 4, Cargo.lock had to be committed separately before the backend branch could merge cleanly.
Suggestion: This is inherent to Rust workspaces with parallel agents. A `crosslink merge <branch>` command could automate the "merge, detect Cargo.lock conflict, regenerate lockfile, commit" pattern.
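The pattern the coordinator ran by hand can be sketched directly. The `crosslink merge` wrapper is hypothetical, and `cargo generate-lockfile` is stubbed with a `printf` so the sketch runs without a Rust toolchain:

```shell
# Merge-then-regenerate sketch: attempt the merge; if Cargo.lock (and only
# Cargo.lock) conflicts, discard both sides, regenerate, and commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b dev
git config user.email ci@example.com
git config user.name ci
printf 'lock v1\n' > Cargo.lock
git add Cargo.lock
git commit -q -m base

git checkout -q -b agent-branch
printf 'lock v2 (agent deps)\n' > Cargo.lock
git commit -qam 'agent adds deps'

git checkout -q dev
printf 'lock v2 (dev deps)\n' > Cargo.lock
git commit -qam 'dev adds deps'

if ! git merge -q agent-branch >/dev/null 2>&1; then
  conflicts=$(git diff --name-only --diff-filter=U)
  if [ "$conflicts" = "Cargo.lock" ]; then
    # Lockfile conflicts are never worth hand-resolving; regenerate instead.
    printf 'regenerated\n' > Cargo.lock   # real workflow: cargo generate-lockfile
    git add Cargo.lock
    git commit -q -m 'merge agent-branch (lockfile regenerated)'
  else
    echo "non-lockfile conflicts need a human: $conflicts" >&2
    exit 1
  fi
fi
```

Anything other than a pure Cargo.lock conflict still bails out to the coordinator, which matches how the ferrolearn-decomp conflicts had to be handled manually.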
7. ferrolearn-decomp was a repeated merge conflict hotspot
In both Phase 3 and Phase 4, agents working on decomposition-adjacent features (NMF/KernelPCA, LDA/FactorAnalysis) modified ferrolearn-decomp's Cargo.toml and lib.rs, causing merge conflicts. The coordinator resolved these manually each time, but this was predictable and avoidable.
Lesson: Crate ownership boundaries should be made explicit in design docs. If two agents must touch the same crate, one should be sequenced after the other rather than running in parallel.
8. ndarray feature flag issue caught agents off guard
Agent 23 (remaining preprocessors) hit `RelativeEq` not implemented for `ArrayBase` — ndarray requires the `approx` feature flag for its approximate-comparison trait impls. This wasn't specified in the design docs and had to be debugged at runtime.
Lesson: Feature flags and conditional compilation dependencies should be enumerated in design documents alongside the main dependency versions.
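Concretely, this is the kind of one-line dependency spec the design docs should have carried. The version numbers below are illustrative (taken from the workspace versions mentioned in this retrospective):

```toml
# ndarray's approximate-comparison traits (RelativeEq etc.) are gated
# behind its `approx` feature; without it, relative-equality assertions
# on arrays fail to compile.
[dependencies]
ndarray = { version = "0.17", features = ["approx"] }
approx = "0.5"
```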
9. No close/changelog integration for agent-generated work
All agent issues were closed with a generic `crosslink close <id>`, producing changelog entries like "Agent 2: ferrolearn-sparse" under "Changed." These are meaningless in a user-facing changelog.
Suggestion: Consider `crosslink close <id> --no-changelog` as the default for agent-internal work, or allow `--changelog-title "Add sparse matrix types"` to override the issue title in the changelog.
10. .gitignore hygiene required post-hoc cleanup
After Phase 4 completion, subcrate `.crosslink/` directories had been committed to the repo. A dedicated cleanup pass was needed to add `**/.crosslink/` to `.gitignore` and remove the tracked files.
Suggestion: `crosslink init` should add `.crosslink/` patterns to `.gitignore` automatically, including for workspace subcrates.
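A single `**/.crosslink/` pattern covers both the workspace root and every subcrate, which can be verified with plain git (the `crosslink init` behavior itself is the suggestion, not something that exists yet):

```shell
# Verify that one `**/.crosslink/` pattern ignores the state directory
# at the workspace root and inside any subcrate. check-ignore exits 0
# when the path is ignored.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '**/.crosslink/\n' > .gitignore
git check-ignore -q .crosslink/state.json
git check-ignore -q crates/some-subcrate/.crosslink/state.json
echo 'both paths ignored'
```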
Phase-by-phase breakdown
Phase 1: Core infrastructure (8 agents)
- Agents 1-8: core types, sparse matrices, metrics, preprocessing, linear models, model selection, fixtures, CI setup
- Model mix: 5 opus, 3 sonnet
- Gate: 230 tests, 0 failures
- Key issues: Worktree `git add -A` captured nested repos; agent 9 worktree cleaned before merge
Phase 2: Algorithm crates (9 agents)
- Agents 9-17: decision trees, k-neighbors, naive Bayes, clustering, decomposition, datasets, I/O, extended metrics, ensemble foundations
- Model mix: 4 opus, 5 sonnet
- Gate: 631 tests, 0 failures
- Key issues: Cargo.lock conflicts on every merge; some agents committed to `dev` directly
Phase 3: Advanced algorithms (8 agents)
- Agents 18-25: GBM/AdaBoost, GMM/Agglomerative, NMF/KernelPCA, Imputers/Selection, remaining preprocessors, backend trait, model-selection/datasets additions, typed pipeline
- Model mix: 4 opus, 4 sonnet
- Gate: 1,054 tests, 0 failures
- Key issues: ferrolearn-decomp merge conflict (two agents modified the same crate); ndarray `approx` feature flag; worktree cleanup required `--force`
Phase 4: Remaining algorithms (8 agents)
- Agents 27-34: PartialFit/SGD, ColumnTransformer, ElasticNet/BayesianRidge/Huber, additional clustering (MeanShift/Spectral/OPTICS), CalibratedClassifierCV/SelfTraining, manifold learning (Isomap/MDS/SpectralEmbedding/LLE), MiniBatchKMeans/IncrementalPCA, LDA/FactorAnalysis/FastICA
- Model mix: 4 opus, 4 sonnet
- Gate: 1,438 tests, 0 failures
- Key issues: Cargo.lock needed a separate commit before the backend merge; ferrolearn-decomp conflict again (two agents modified it in parallel)
Post-Phase 4: Test audit & cleanup
- Test audit findings: 7/10 sklearn fixture files orphaned, many shape-only "does it not crash" tests, zero cross-crate integration tests, zero E2E tests
- Two cleanup agents: oracle test writer (compared Rust output against sklearn fixture values) + E2E integration test writer (cross-crate pipelines)
- Fixture extension: `generate_fixtures.py` expanded to cover all algorithms (RandomForest, KMeans, PCA, GMM, etc.)
- Final gate: 1,452 tests, 0 failures
- PR: ferrolearn#1 — 175 files, ~65,000 lines, 50 commits
Post-Phase 4: PyO3 Python bindings
- Single coordinator session (no subagents): built `ferrolearn-python` crate with PyO3 bindings for all 12 core models
- Python sklearn wrappers: inherit from sklearn `BaseEstimator`, `RegressorMixin`, `ClassifierMixin`, etc.
- check_estimator: 619/619 passed, 0 failed (after 4 rounds of fixes: numpy type coercion, pickle support, classification target validation, error message formatting, `n_iter_` attribute)
- cross_val_score: 9/9 passed
- Key issues: ndarray version mismatch (numpy 0.24 needs ndarray 0.16, workspace uses 0.17 — fixed by upgrading to pyo3 0.28 + numpy 0.28); `_validate_data` removed in sklearn 1.8
- PR: ferrolearn#6 — 26 files, +2,561 lines
By the numbers
| Metric | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Post-Phase | Total |
|---|---|---|---|---|---|---|
| Agents spawned | 8 | 9 | 8 | 8 | 2+1 | 33+1 |
| Tests passing | 230 | 631 | 1,054 | 1,438 | 1,452 | 1,452 |
| Context windows consumed | 2 | 2 | 1 | 1 | 1+ | 6+ |
| Merge conflicts resolved | 2 | 3 | 2 | 2 | 0 | 9 |
| Wrong crosslink commands | 3 | 2 | ~2 | ~1 | 0 | ~8 |
| Crates implemented | 7 | 7 | — | — | 1 | 15 |
Final deliverables:
- 14 Rust crates + 1 Python bindings crate (15 total)
- 175 files, ~65,000 lines of Rust code
- 1,452 Rust tests + 628 Python check_estimator/cross_val_score tests
- 50 commits, 33 subagents across 4 phases
- 2 PRs: #1 (core) + #6 (Python bindings)
Recommendations for crosslink v0.3+
- `crosslink swarm` subcommand group — first-class agent lifecycle management (spawn, status, merge, abort)
- `crosslink knowledge edit --from-doc` — parity with `add`
- Common command aliases — `new` → `create`, `issues` → `list`
- Coordinator mode — session-scoped relaxation of git mutation hooks
- Compact status output — `crosslink swarm status` returning a table, not requiring N individual queries
- Worktree lifecycle ownership — crosslink manages the full create → gitignore → merge → cleanup cycle
- Changelog-aware close — `--no-changelog` default for agent work, or `--changelog-title` override
- Lock auto-release on stale sessions — the stale locks visible in the session context should auto-release after session end
- Crate ownership annotation in design docs — when two agents share a crate, crosslink should warn or enforce sequencing
- Auto-gitignore for `.crosslink/` — `crosslink init` should handle workspace subcrates, not just the root
Conclusion
Crosslink worked remarkably well as an emergent swarm coordinator — the issue tracker, comment system, and knowledge repo provided just enough persistent state to keep a multi-agent build on track across 6+ context window resets and 33 subagents. But the friction points (wrong commands, worktree management, hook conflicts, repeated merge conflicts, no native agent primitives) show that swarm coordination is a natural next step for the tool, not just an incidental use case.
The ferrolearn build proved that design-driven development + crosslink issue tracking + Claude Code agent spawning can produce a substantial, tested codebase (15 crates, 1,452+ tests, ~65,000 lines, plus fully sklearn-compatible Python bindings) from an empty repo in under 4 hours. The bottleneck was not the agents or the code quality — it was the coordination overhead. With first-class swarm primitives, that overhead could be cut in half.