
research(skills): MetaAgent tool meta-learning for autonomous knowledge gap detection #1865

@bug-ops


Summary

Autonomous skill evolution in two phases: Phase 1 distills improvements from successful execution trajectories (SkillRL pattern); Phase 2 detects knowledge gaps and synthesizes new skills from scratch (MetaAgent pattern).

Sources:

  • arXiv 2602.08234 — SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning (Feb 2026)
  • arXiv 2508.00271 — MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning (Aug 2025)

Phase 1: Trajectory Distillation (SkillRL pattern)

The SkillRL paper reports that a 7B model augmented with this approach outperforms GPT-4o by 41% on ALFWorld/WebShop.

When a session ends with high user satisfaction, extract successful tool-use patterns as skill refinement candidates:

  1. Add trajectory_capture = true config under [skills.learning] to record full tool-use sequences tagged with outcome.
  2. Implement TrajectoryDistiller: a background task (run on session shutdown or on a timer) that feeds recent positive trajectories to an LLM with a "what worked?" prompt and receives a structured skill delta.
  3. Merge delta into candidate skill version via existing SkillVersionManager.
  4. Gate behind [skills.learning] trajectory_distillation = true (off by default, experimental).
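
The capture and distillation steps above can be sketched as follows. This is a minimal sketch, not the zeph-skills implementation: the `Trajectory`, `ToolCall`, and `Outcome` types and both function names are hypothetical, and the LLM call itself is left out. It only shows how positive trajectories might be selected and rendered into a "what worked?" prompt.

```rust
/// Outcome tag attached to a captured trajectory (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Positive,
    Neutral,
    Negative,
}

/// One tool invocation inside a session (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub args: String,
    pub succeeded: bool,
}

/// Full tool-use sequence for one session, tagged with its outcome.
#[derive(Debug, Clone)]
pub struct Trajectory {
    pub skill: String,
    pub calls: Vec<ToolCall>,
    pub outcome: Outcome,
}

/// Keep only trajectories that ended positively and actually used tools.
pub fn positive_trajectories(all: &[Trajectory]) -> Vec<&Trajectory> {
    all.iter()
        .filter(|t| t.outcome == Outcome::Positive && !t.calls.is_empty())
        .collect()
}

/// Render the "what worked?" prompt for one skill.
/// Returns None when no trajectory for that skill is available.
pub fn build_distillation_prompt(skill: &str, trajectories: &[&Trajectory]) -> Option<String> {
    let relevant: Vec<&Trajectory> = trajectories
        .iter()
        .copied()
        .filter(|t| t.skill == skill)
        .collect();
    if relevant.is_empty() {
        return None;
    }
    let mut prompt = format!("Skill: {}\nWhat worked in these sessions?\n", skill);
    for (i, t) in relevant.iter().enumerate() {
        // One numbered line per session: tool(args) -> tool(args) -> ...
        let steps: Vec<String> = t
            .calls
            .iter()
            .map(|c| format!("{}({})", c.tool, c.args))
            .collect();
        prompt.push_str(&format!("{}. {}\n", i + 1, steps.join(" -> ")));
    }
    Some(prompt)
}
```

The prompt text would be sent to the LLM by TrajectoryDistiller, and the structured delta it returns would then go to SkillVersionManager; both of those steps depend on APIs the issue does not describe.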

Complexity: HIGH — trajectory capture storage + background distillation loop + skill evolution pipeline integration. Implement in phases.

Phase 2: Gap Detection + Skill Synthesis (MetaAgent pattern)

When the skill registry has no matching skill for a task, do the following instead of failing silently:

  1. Detect gap: GapDetector in zeph-skills fires when top skill score < gap_threshold
  2. Generate help-seeking query: structured request "I need a skill that can do X"
  3. Route: send the query to the web scraper (to find docs) or have the LLM self-generate a new SKILL.md stub
  4. Reflect and persist: the new skill enters the registry at TrustLevel::Community with low initial confidence; after N successful uses it is promoted to TrustLevel::Trusted
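
A minimal sketch of steps 1, 2, and 4, assuming skill scores arrive as a slice of `f64` values. The `GapCheck` and `SynthesizedSkill` types, the `check` and `record_success` methods, and the `promote_after` parameter are hypothetical; the real `TrustLevel` enum may have more variants than the two named in this issue.

```rust
/// Trust levels named in the issue; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TrustLevel {
    Community,
    Trusted,
}

/// Result of checking ranked skill scores against the gap threshold.
#[derive(Debug, PartialEq)]
pub enum GapCheck {
    /// A skill scored at or above the threshold; holds its index in the score slice.
    Matched(usize),
    /// No skill cleared the threshold; holds the help-seeking query.
    Gap(String),
}

pub struct GapDetector {
    pub gap_threshold: f64,
}

impl GapDetector {
    /// Fires a gap when the top score is below `gap_threshold`.
    /// An empty score slice (no skills at all) is always a gap.
    pub fn check(&self, task: &str, scores: &[f64]) -> GapCheck {
        let best = scores
            .iter()
            .copied()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((idx, score)) if score >= self.gap_threshold => GapCheck::Matched(idx),
            _ => GapCheck::Gap(format!("I need a skill that can do: {}", task)),
        }
    }
}

/// A newly synthesized skill that starts at Community trust (hypothetical type).
#[derive(Debug)]
pub struct SynthesizedSkill {
    pub name: String,
    pub trust: TrustLevel,
    pub successes: u32,
}

impl SynthesizedSkill {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string(), trust: TrustLevel::Community, successes: 0 }
    }

    /// Record one successful use; promote once `promote_after` successes accumulate.
    pub fn record_success(&mut self, promote_after: u32) {
        self.successes += 1;
        if self.trust == TrustLevel::Community && self.successes >= promote_after {
            self.trust = TrustLevel::Trusted;
        }
    }
}
```

With `gap_threshold = 0.3`, scores of `[0.1, 0.25]` produce a `Gap` carrying the query, while `[0.1, 0.8, 0.4]` produce `Matched(1)`. Routing the query (step 3) is omitted because it depends on the web scraper and LLM interfaces.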

Integration point: SkillRegistry::match() + FeedbackDetector

Config:

[skills.meta_learning]
enabled = false
gap_threshold = 0.3
trajectory_distillation = false
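
On the Rust side, the table above could map to a plain struct whose defaults match the values shown. This is a sketch only: `MetaLearningConfig` and `validate` are hypothetical names, deserialization is left out, and the range check assumes skill scores are normalized to [0, 1], which the issue does not state.

```rust
/// Mirror of the [skills.meta_learning] table (hypothetical struct name).
#[derive(Debug, Clone, PartialEq)]
pub struct MetaLearningConfig {
    pub enabled: bool,
    pub gap_threshold: f64,
    pub trajectory_distillation: bool,
}

impl Default for MetaLearningConfig {
    /// Defaults copied from the config snippet: everything off, threshold 0.3.
    fn default() -> Self {
        Self {
            enabled: false,
            gap_threshold: 0.3,
            trajectory_distillation: false,
        }
    }
}

impl MetaLearningConfig {
    /// Reject thresholds outside [0, 1] (NaN is rejected as well).
    /// Assumes skill scores are normalized to that range.
    pub fn validate(&self) -> Result<(), String> {
        if (0.0..=1.0).contains(&self.gap_threshold) {
            Ok(())
        } else {
            Err(format!(
                "gap_threshold must be within [0, 1], got {}",
                self.gap_threshold
            ))
        }
    }
}
```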

Integration Points

  • crates/zeph-skills: TrajectoryDistiller, GapDetector, SkillSynthesizer
  • Extends existing: FeedbackDetector, Wilson score re-ranking, SkillVersionManager
  • Phase 1 prerequisite: trajectory capture storage (new SQLite table or append to audit log)
  • Phase 2 prerequisite: Phase 1 (gap detection accuracy improves with trajectory data)
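
For reference, the standard Wilson score lower bound can be sketched as below. The issue does not show how zeph-skills implements its re-ranking, so the function names and the `(name, successes, total)` tuple layout are assumptions. The bound illustrates why more recorded outcomes help: a skill with many observed uses gets a tighter, higher bound than one with a single success.

```rust
/// Lower bound of the Wilson score interval for `successes` out of `total`
/// trials; `z` is the normal quantile (1.96 for about 95% confidence).
/// Returns 0.0 when there are no observations.
pub fn wilson_lower_bound(successes: u32, total: u32, z: f64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let n = f64::from(total);
    // Clamp so a bad counter cannot push the proportion above 1.
    let p = f64::from(successes.min(total)) / n;
    let z2 = z * z;
    let centre = p + z2 / (2.0 * n);
    let margin = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt();
    (centre - margin) / (1.0 + z2 / n)
}

/// Order skill names by descending Wilson lower bound.
/// Each entry is (name, successes, total); this layout is hypothetical.
pub fn rerank<'a>(stats: &[(&'a str, u32, u32)], z: f64) -> Vec<&'a str> {
    let mut scored: Vec<(&'a str, f64)> = stats
        .iter()
        .map(|&(name, s, n)| (name, wilson_lower_bound(s, n, z)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(name, _)| name).collect()
}
```

With `z = 1.96`, a skill with 1 success out of 1 use scores about 0.21, while one with 90 out of 100 scores about 0.83, so the proven skill ranks first even though its raw success rate is lower.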

Key Results

  • SkillRL: 7B model outperforms GPT-4o by 41% on ALFWorld/WebShop with trajectory distillation
  • MetaAgent: matches/exceeds end-to-end trained agents on GAIA, WebWalkerQA, BrowseComp

Labels: research (Research-driven improvement), skills (zeph-skills crate)
