Skip to content

Use model haiku for most general-purpose crew#19

Closed
LannyRipple wants to merge 1 commit intoharrymunro:mainfrom
LannyRipple:ljr-changes
Closed

Use model haiku for most general-purpose crew#19
LannyRipple wants to merge 1 commit intoharrymunro:mainfrom
LannyRipple:ljr-changes

Conversation

@LannyRipple
Copy link

  • SKILL.md: specify models and guidance for tasking haiku
  • SKILL.md: description changed to avoid tickling Claude's roleplay and social engineering guardrails.
  • crew-briefing.md: to avoid haiku misreading tasking as roleplay or social engineering

Avoid issues observed after context compaction

  • SKILL.md: reinforce that the admiral does not implement
  • SKILL.md: explicit instructions to write captains-log.md before declaring mission completion.

* SKILL.md: specify models and guidance for tasking haiku
* SKILL.md: description changed to avoid tickling Claude's roleplay and
  social engineering guardrails.
* crew-briefing.md: to avoid haiku misreading tasking as roleplay or
  social engineering

Avoid issues observed after context compaction

* SKILL.md: reinforce that the admiral does not implement
* SKILL.md: explicit instructions to write captains-log.md before
  declaring mission completion.
@LannyRipple
Copy link
Author

Maybe this should be an add-on capability? Opened Issue #20 for discussion.

Copy link
Owner

@harrymunro harrymunro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LannyRipple — this addresses real failure modes you've clearly hit in practice. The compaction-survival doctrine, captain's log gate, and idle ship cleanup are all strong additions. A few thoughts on the model selection piece and some conflicts with work landing in #21.

Accept as-is

Indentation fixes (Steps 4, 5, 6) — Genuine formatting bug, good catch.

Admiral never implements (Admiralty Doctrine) — The standing order admiral-at-the-helm.md already covers this, but the compaction-survival note is the real value: "this constraint survives context compaction — re-form captains if they're gone." That's a failure mode we've seen too.

Captain's log to disk + Mission Complete Gate — Writing the log to chat only and then declaring victory is a real anti-pattern. The gate enforcement is good.

Clean up idle ships — Reasonable operational addition to the checkpoint list.

Push back: model selection as a default

The headline change — prescribing haiku for most crew — is where we diverge. Nelson should default to the most powerful agents available. Haiku is significantly less capable and the cost saving isn't worth the reliability hit for most missions.

That said, the flexibility is valuable. What we'd prefer is a user-driven cost control mechanism rather than a hardcoded default:

  • Default: All agents use the admiral's model (most powerful available).
  • User trigger: When the user signals cost sensitivity in their sailing orders — something like "keep costs under control", "budget-conscious", or an explicit token budget constraint — the admiral downgrades appropriate agents to haiku. This could map naturally to ship classes: lighter vessels (patrol boats, corvettes) get haiku, heavier ships (destroyers, flagships) keep the full model.
  • Explicit override: Users can always specify model: haiku in their sailing orders for specific roles if they want fine-grained control.

This way the capability is there for users who want it, but we don't sacrifice reliability by default.

Haiku identity note

The observation that haiku misreads coordination framing as roleplay is real. The identity note in the crew briefing is a reasonable fix for that specific model. If we go with the user-triggered approach above, this note would only be included when haiku agents are actually deployed — which keeps the template clean for the default case.

The skill description rewrite (3x longer) is probably unnecessary if the identity note handles the guardrail issue at the agent level.

Conflicts with #21

PR #21 (just pushed) touches the same files with overlapping changes:

  • Task() references: #21 replaces all Task() / "Task tool" references with the correct Agent tool names. Section 2a in this PR introduces new Task() references ("passing the model parameter in the Task() call") that would need updating.
  • Quarterdeck cadence: #21 replaces time-based cadence ("every 15-30 minutes") with event-driven checkpoints ("every 2-3 task completions, when a captain reports a blocker, or when a captain goes idle"). This PR changes the interval to "10-15 minutes" — but Claude can't reliably track wall-clock time, so event-driven triggers are more reliable.
  • Crew briefing template: Both PRs modify crew-briefing.md. #21 fixes the Task() references there too.

If you rebase onto #21 after it merges, most of the tool-name and cadence conflicts resolve naturally. Happy to help coordinate.

Suggestion

Would you be open to splitting this into two PRs?

  1. The uncontroversial fixes — indentation, admiral doctrine, captain's log gate, idle ship cleanup. These can land quickly.
  2. Model selection + haiku handling — reworked as a user-triggered cost control mechanism rather than a default. We could collaborate on the right UX for this — the ship-class mapping idea has legs.

The issue you opened (#20) is a good place to hash out the model selection design. Keen to hear your thoughts.

@LannyRipple
Copy link
Author

Happy to split it up. I put this out more to start discussion than wanting a quick merge. (I've recently been slammed at work and about to go on a family trip so it might be a while before I can get new PRs set up.)

I'd like to address using the most capable agents. I followed that pattern where it seemed like it would matter. Captains of crew, the XO, and Explorers keep the more capable agent. So far while using these changes I've found very large savings when examining /cost after the fact. Often Haiku would do about the same work by token count as Sonnet-4.6 but cost 1/3rd as much. I know Haiku is less capable which is why the better model stays responsible for tasking and review. All that said keeping cost control at the user's discretion will be the best of all worlds. Savings if desired and greatest capability if not.

TODO:

  • Grab latest upstream and review.
  • Break out uncontroversial changes and submit as PR.
  • Rework cost savings to leave under user control. I very much think this work will be a collaboration so when that PR lands please consider it a straw-man.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants