Summary
Autonomous skill evolution in two phases: Phase 1 distills improvements from successful execution trajectories (SkillRL pattern); Phase 2 detects knowledge gaps and synthesizes new skills from scratch (MetaAgent pattern).
Sources:
- arXiv 2602.08234 — SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (Feb 2026)
- arXiv 2508.00271 — MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning (Aug 2025)
Phase 1: Trajectory Distillation (SkillRL pattern)
The SkillRL paper reports that a 7B model augmented with this approach outperforms GPT-4o by 41% on ALFWorld/WebShop.
When a session ends with high user satisfaction, extract successful tool-use patterns as skill refinement candidates:
- Add `trajectory_capture = true` config under `[skills.learning]` to record full tool-use sequences tagged with outcome.
- Implement `TrajectoryDistiller`: background task (runs on session shutdown or timer) that feeds recent positive trajectories to LLM with "what worked?" prompt → structured skill delta.
- Merge delta into candidate skill version via existing `SkillVersionManager`.
- Gate behind `[skills.learning] trajectory_distillation = true` (off by default, experimental).
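A minimal sketch of the capture-and-distill step above, assuming a trajectory is a list of tool calls tagged with an outcome. The `Trajectory`, `ToolCall`, and `Outcome` types and the `build_distillation_prompt` function are hypothetical names for illustration, not existing `zeph-skills` APIs; the LLM call and the merge through `SkillVersionManager` are left out.

```rust
/// Outcome tag attached to a captured session (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Positive,
    Neutral,
    Negative,
}

/// One tool invocation inside a session (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub args: String,
    pub succeeded: bool,
}

/// A full tool-use sequence for one skill, tagged with its outcome.
#[derive(Debug, Clone)]
pub struct Trajectory {
    pub skill: String,
    pub calls: Vec<ToolCall>,
    pub outcome: Outcome,
}

/// Builds the "what worked?" prompt from the positive trajectories of one skill.
/// Returns `None` when there is nothing to distill.
pub fn build_distillation_prompt(skill: &str, trajectories: &[Trajectory]) -> Option<String> {
    // Keep only sessions for this skill that ended well.
    let positive: Vec<&Trajectory> = trajectories
        .iter()
        .filter(|t| t.skill == skill && t.outcome == Outcome::Positive)
        .collect();
    if positive.is_empty() {
        return None;
    }

    let mut prompt = format!("Skill: {}\n", skill);
    prompt.push_str("The sessions below ended successfully. What worked?\n");
    prompt.push_str("Reply with a structured skill delta.\n");

    for (i, t) in positive.iter().enumerate() {
        prompt.push_str(&format!("\nSession {}:\n", i + 1));
        for call in &t.calls {
            let status = if call.succeeded { "ok" } else { "failed" };
            prompt.push_str(&format!("  {}({}) -> {}\n", call.tool, call.args, status));
        }
    }
    Some(prompt)
}
```

Returning `None` for an empty batch lets the background task skip the LLM call entirely when no positive sessions were recorded since the last run.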
Complexity: HIGH — trajectory capture storage + background distillation loop + skill evolution pipeline integration. Implement in phases.
Phase 2: Gap Detection + Skill Synthesis (MetaAgent pattern)
When the skill registry has no matching skill for a task, do the following instead of failing silently:
- Detect gap: `GapDetector` in `zeph-skills` fires when top skill score < `gap_threshold`
- Generate help-seeking query: structured request "I need a skill that can do X"
- Route: to web scraper (find docs) or LLM self-generation of new SKILL.md stub
- Reflect and persist: new skill enters registry at `TrustLevel::Community` with low initial confidence → after N successful uses → promoted to `TrustLevel::Trusted`
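The gap check and the trust promotion above could be sketched as follows. This is an illustration under stated assumptions: the `MatchDecision` and `SynthesizedSkill` types, the `evaluate` and `record_success` methods, and the `promote_after` parameter (the "N" in the list) are hypothetical, and `TrustLevel` is reduced to the two variants named in this issue. Routing to a web scraper or LLM is not shown.

```rust
/// Only the two trust levels named in this issue (simplified, hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TrustLevel {
    Community,
    Trusted,
}

/// Result of matching a task against the registry (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum MatchDecision {
    UseSkill(String),
    Gap { query: String },
}

pub struct GapDetector {
    pub gap_threshold: f64,
}

impl GapDetector {
    /// `scored` holds (skill name, match score) pairs from the registry.
    /// Fires a gap when the top score is below `gap_threshold` or when
    /// there are no candidates at all.
    pub fn evaluate(&self, task: &str, scored: &[(String, f64)]) -> MatchDecision {
        let best = scored.iter().max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((name, score)) if *score >= self.gap_threshold => {
                MatchDecision::UseSkill(name.clone())
            }
            _ => MatchDecision::Gap {
                query: format!("I need a skill that can do: {}", task),
            },
        }
    }
}

/// A newly synthesized skill that starts at `TrustLevel::Community`.
pub struct SynthesizedSkill {
    pub name: String,
    pub trust: TrustLevel,
    pub successful_uses: u32,
}

impl SynthesizedSkill {
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            trust: TrustLevel::Community,
            successful_uses: 0,
        }
    }

    /// Counts one successful use and promotes once `promote_after` is reached.
    pub fn record_success(&mut self, promote_after: u32) {
        self.successful_uses += 1;
        if self.trust == TrustLevel::Community && self.successful_uses >= promote_after {
            self.trust = TrustLevel::Trusted;
        }
    }
}
```

Treating an empty candidate list as a gap keeps the "no skills installed" case on the same path as "no skill scored high enough", so neither fails silently.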
Integration point: SkillRegistry::match() + FeedbackDetector
Config:
[skills.meta_learning]
enabled = false
gap_threshold = 0.3
trajectory_distillation = false
Integration Points
- `crates/zeph-skills`: `TrajectoryDistiller`, `GapDetector`, `SkillSynthesizer`
- Extends existing: `FeedbackDetector`, Wilson score re-ranking, `SkillVersionManager`
- Phase 1 prerequisite: trajectory capture storage (new SQLite table or append to audit log)
- Phase 2 prerequisite: Phase 1 (gap detection accuracy improves with trajectory data)
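For context on the Wilson score re-ranking mentioned above, here is a sketch of the standard Wilson lower bound on a success rate. How `zeph-skills` actually computes or applies it is not shown in this issue; the function name and signature are assumptions for illustration.

```rust
/// Lower bound of the Wilson score interval for `successes` out of `total`
/// trials. `z` is the normal quantile (1.96 for roughly 95% confidence).
/// Returns 0.0 when there are no observations.
pub fn wilson_lower_bound(successes: u32, total: u32, z: f64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let n = f64::from(total);
    // Clamp so a bad counter can never yield a rate above 1.
    let p = f64::from(successes.min(total)) / n;
    let z2 = z * z;

    let centre = p + z2 / (2.0 * n);
    let margin = z * ((p * (1.0 - p) + z2 / (4.0 * n)) / n).sqrt();
    (centre - margin) / (1.0 + z2 / n)
}
```

Ranking by the lower bound instead of the raw success rate means a skill with 1 success out of 1 use scores below a skill with 90 out of 100, which fits newly synthesized skills entering at low confidence.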
Key Results
- SkillRL: 7B model outperforms GPT-4o by 41% on ALFWorld/WebShop with trajectory distillation
- MetaAgent: matches/exceeds end-to-end trained agents on GAIA, WebWalkerQA, BrowseComp
See Also
- #1842: research(testing): TDAD behavioral spec testing for skills and system prompt blocks
- Existing `zeph-skills` self-learning: `FeedbackDetector`, `SkillVersionManager`