Use model haiku for most general-purpose crew#19
Use model haiku for most general-purpose crew#19LannyRipple wants to merge 1 commit intoharrymunro:mainfrom
Conversation
* SKILL.md: specify models and guidance for tasking haiku * SKILL.md: description changed to avoid tickling Claude's roleplay and social engineering guardrails. * crew-briefing.md: to avoid haiku misreading tasking as roleplay or social engineering Avoid issues observed after context compaction * SKILL.md: reinforce that the admiral does not implement * SKILL.md: explicit instructions to write captains-log.md before declaring mission completion.
|
Maybe this should be an add-on capability? Opened Issue #20 for discussion. |
harrymunro
left a comment
There was a problem hiding this comment.
Thanks @LannyRipple — this addresses real failure modes you've clearly hit in practice. The compaction-survival doctrine, captain's log gate, and idle ship cleanup are all strong additions. A few thoughts on the model selection piece and some conflicts with work landing in #21.
Accept as-is
Indentation fixes (Steps 4, 5, 6) — Genuine formatting bug, good catch.
Admiral never implements (Admiralty Doctrine) — The standing order admiral-at-the-helm.md already covers this, but the compaction-survival note is the real value: "this constraint survives context compaction — re-form captains if they're gone." That's a failure mode we've seen too.
Captain's log to disk + Mission Complete Gate — Writing the log to chat only and then declaring victory is a real anti-pattern. The gate enforcement is good.
Clean up idle ships — Reasonable operational addition to the checkpoint list.
Push back: model selection as a default
The headline change — prescribing haiku for most crew — is where we diverge. Nelson should default to the most powerful agents available. Haiku is significantly less capable and the cost saving isn't worth the reliability hit for most missions.
That said, the flexibility is valuable. What we'd prefer is a user-driven cost control mechanism rather than a hardcoded default:
- Default: All agents use the admiral's model (most powerful available).
- User trigger: When the user signals cost sensitivity in their sailing orders — something like "keep costs under control", "budget-conscious", or an explicit token budget constraint — the admiral downgrades appropriate agents to haiku. This could map naturally to ship classes: lighter vessels (patrol boats, corvettes) get haiku, heavier ships (destroyers, flagships) keep the full model.
- Explicit override: Users can always specify
model: haikuin their sailing orders for specific roles if they want fine-grained control.
This way the capability is there for users who want it, but we don't sacrifice reliability by default.
Haiku identity note
The observation that haiku misreads coordination framing as roleplay is real. The identity note in the crew briefing is a reasonable fix for that specific model. If we go with the user-triggered approach above, this note would only be included when haiku agents are actually deployed — which keeps the template clean for the default case.
The skill description rewrite (3x longer) is probably unnecessary if the identity note handles the guardrail issue at the agent level.
Conflicts with #21
PR #21 (just pushed) touches the same files with overlapping changes:
Task()references: #21 replaces allTask()/ "Task tool" references with the correctAgenttool names. Section 2a in this PR introduces newTask()references ("passing the model parameter in the Task() call") that would need updating.- Quarterdeck cadence: #21 replaces time-based cadence ("every 15-30 minutes") with event-driven checkpoints ("every 2-3 task completions, when a captain reports a blocker, or when a captain goes idle"). This PR changes the interval to "10-15 minutes" — but Claude can't reliably track wall-clock time, so event-driven triggers are more reliable.
- Crew briefing template: Both PRs modify
crew-briefing.md. #21 fixes theTask()references there too.
If you rebase onto #21 after it merges, most of the tool-name and cadence conflicts resolve naturally. Happy to help coordinate.
Suggestion
Would you be open to splitting this into two PRs?
- The uncontroversial fixes — indentation, admiral doctrine, captain's log gate, idle ship cleanup. These can land quickly.
- Model selection + haiku handling — reworked as a user-triggered cost control mechanism rather than a default. We could collaborate on the right UX for this — the ship-class mapping idea has legs.
The issue you opened (#20) is a good place to hash out the model selection design. Keen to hear your thoughts.
|
Happy to split it up. I put this out more to start discussion than wanting a quick merge. (I've recently been slammed at work and about to go on a family trip so it might be a while before I can get new PRs set up.) I'd like to address using the most capable agents. I followed that pattern where it seemed like it would matter. Captains of crew, the XO, and Explorers keep the more capable agent. So far while using these changes I've found very large savings when examining /cost after the fact. Often Haiku would do about the same work by token count as Sonnet-4.6 but cost 1/3rd as much. I know Haiku is less capable which is why the better model stays responsible for tasking and review. All that said keeping cost control at the user's discretion will be the best of all worlds. Savings if desired and greatest capability if not. TODO:
|
Avoid issues observed after context compaction