research(skills): MetaAgent tool meta-learning for autonomous knowledge gap detection #1865

## Summary

Autonomous skill evolution in two phases: Phase 1 distills improvements from successful execution trajectories (SkillRL pattern); Phase 2 detects knowledge gaps and synthesizes new skills from scratch (MetaAgent pattern).

**Sources**:
- arXiv 2602.08234 — *SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning* (Feb 2026)
- arXiv 2508.00271 — *MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning* (Aug 2025)

## Phase 1: Trajectory Distillation (SkillRL pattern)

SkillRL reports that a 7B model augmented with this approach outperforms GPT-4o by 41% on ALFWorld/WebShop.

When a session ends with high user satisfaction, extract successful tool-use patterns as skill refinement candidates:

1. Add a `trajectory_capture = true` option under `[skills.learning]` that records full tool-use sequences tagged with their session outcome.
2. Implement `TrajectoryDistiller`: a background task (run on session shutdown or on a timer) that feeds recent positive trajectories to the LLM with a "what worked?" prompt and turns the response into a structured skill delta.
3. Merge the delta into a candidate skill version via the existing `SkillVersionManager`.
4. Gate the feature behind `[skills.learning] trajectory_distillation = true` (off by default, experimental).
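A minimal sketch of the capture-and-distill steps, assuming trajectories are held in memory as tagged lists of tool calls, ordered oldest to newest. `Outcome`, `ToolStep`, `Trajectory`, `max_trajectories`, and the prompt wording are hypothetical; the LLM call and the parsing of its reply into a skill delta are left out:

```rust
// Hypothetical types and names; the real zeph-skills storage and distiller API may differ.

/// Session outcome used to tag a captured trajectory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Positive,
    Negative,
}

/// One tool call inside a session.
#[derive(Debug, Clone)]
struct ToolStep {
    tool: String,
    input: String,
    succeeded: bool,
}

/// Full tool-use sequence for one session, tagged with its outcome.
#[derive(Debug, Clone)]
struct Trajectory {
    skill: String,
    steps: Vec<ToolStep>,
    outcome: Outcome,
}

/// Background distiller: picks recent positive trajectories and builds the prompt.
struct TrajectoryDistiller {
    max_trajectories: usize,
}

impl TrajectoryDistiller {
    /// Positive trajectories for `skill`, keeping only the most recent `max_trajectories`
    /// (assumes `history` is ordered oldest to newest).
    fn select<'a>(&self, skill: &str, history: &'a [Trajectory]) -> Vec<&'a Trajectory> {
        let positives: Vec<&Trajectory> = history
            .iter()
            .filter(|t| t.skill == skill && t.outcome == Outcome::Positive)
            .collect();
        let start = positives.len().saturating_sub(self.max_trajectories);
        positives[start..].to_vec()
    }

    /// "What worked?" prompt; the LLM reply would be parsed into a skill delta (not shown).
    fn build_prompt(&self, skill: &str, selected: &[&Trajectory]) -> String {
        let mut prompt = format!("Skill `{skill}`: these sessions succeeded. What worked?\n");
        for (i, t) in selected.iter().enumerate() {
            prompt.push_str(&format!("\nTrajectory {}:\n", i + 1));
            for step in &t.steps {
                let status = if step.succeeded { "ok" } else { "failed" };
                prompt.push_str(&format!("- {}({}) -> {}\n", step.tool, step.input, status));
            }
        }
        prompt.push_str("\nReply with the instructions to add to or change in the skill.\n");
        prompt
    }
}

fn main() {
    let step = |tool: &str, ok: bool| ToolStep { tool: tool.into(), input: "query".into(), succeeded: ok };
    let history = vec![
        Trajectory { skill: "git".into(), steps: vec![step("shell", true)], outcome: Outcome::Positive },
        Trajectory { skill: "git".into(), steps: vec![step("shell", false)], outcome: Outcome::Negative },
        Trajectory { skill: "git".into(), steps: vec![step("grep", true), step("shell", true)], outcome: Outcome::Positive },
    ];
    let distiller = TrajectoryDistiller { max_trajectories: 5 };
    let selected = distiller.select("git", &history);
    println!("{}", distiller.build_prompt("git", &selected));
}
```

Capping the number of trajectories keeps the distillation prompt bounded as history grows; negative sessions are filtered out entirely here, though they could also be fed in as counter-examples.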

**Complexity**: HIGH — trajectory capture storage + background distillation loop + skill evolution pipeline integration. Implement in phases.

## Phase 2: Gap Detection + Skill Synthesis (MetaAgent pattern)

When the skill registry has no matching skill for a task, instead of failing silently:

1. **Detect gap**: `GapDetector` in `zeph-skills` fires when the top skill match score is below `gap_threshold`.
2. **Generate help-seeking query**: a structured request such as "I need a skill that can do X".
3. **Route**: send the query to the web scraper (to find docs) or to LLM self-generation of a new SKILL.md stub.
4. **Reflect and persist**: the new skill enters the registry at `TrustLevel::Community` with low initial confidence and is promoted to `TrustLevel::Trusted` after N successful uses.
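Steps 1 to 3 could look roughly like this. `HelpQuery`, `Route`, `check`, and the `web_enabled` routing rule are hypothetical, and the sketch assumes match scores are plain `f32` values directly comparable to `gap_threshold`:

```rust
// Hypothetical sketch; names, signatures, and the routing rule are illustrative,
// not the actual zeph-skills API.

/// Destination for a help-seeking query (step 3).
#[derive(Debug, PartialEq)]
enum Route {
    WebScraper,
    SelfGenerate,
}

/// Structured "I need a skill that can do X" request (step 2).
#[derive(Debug)]
struct HelpQuery {
    text: String,
    route: Route,
}

struct GapDetector {
    gap_threshold: f32,
}

impl GapDetector {
    /// Step 1: returns a query when the best match score is below `gap_threshold`
    /// (or when no skill matched at all); returns None when a skill is good enough.
    fn check(&self, task: &str, scores: &[(&str, f32)], web_enabled: bool) -> Option<HelpQuery> {
        let best = scores.iter().map(|&(_, s)| s).fold(f32::NEG_INFINITY, f32::max);
        if best >= self.gap_threshold {
            return None;
        }
        // Assumed routing rule: look for docs on the web when allowed, else self-generate a SKILL.md stub.
        let route = if web_enabled { Route::WebScraper } else { Route::SelfGenerate };
        Some(HelpQuery { text: format!("I need a skill that can do: {task}"), route })
    }
}

fn main() {
    let detector = GapDetector { gap_threshold: 0.3 };
    let scores: [(&str, f32); 2] = [("git", 0.12), ("web-search", 0.25)];
    match detector.check("convert PDF tables to CSV", &scores, true) {
        Some(q) => println!("gap: {:?} via {:?}", q.text, q.route),
        None => println!("matched an existing skill"),
    }
}
```

Treating an empty score list as a gap means a brand-new installation with no skills still produces a help-seeking query rather than silently doing nothing.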

**Integration point**: `SkillRegistry::match()` + `FeedbackDetector`
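One way to wire the promotion path, assuming success and failure signals arrive from `FeedbackDetector` as plain booleans. `SynthesizedSkill`, `record`, and `promote_after` (standing in for N) are hypothetical, and the local `TrustLevel` enum only mirrors the two variants named in Phase 2:

```rust
// Hypothetical sketch; zeph-skills' real TrustLevel, confidence model, and
// FeedbackDetector API may differ.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrustLevel {
    Community,
    Trusted,
}

/// A skill created by the synthesizer, tracked until it earns promotion.
struct SynthesizedSkill {
    name: String,
    trust: TrustLevel,
    successes: u32,
    failures: u32,
}

impl SynthesizedSkill {
    /// New skills start at Community trust with no track record.
    fn new(name: &str) -> Self {
        Self { name: name.to_string(), trust: TrustLevel::Community, successes: 0, failures: 0 }
    }

    /// Record one use (success as judged by feedback). Returns true only on the
    /// call that promotes the skill to Trusted after `promote_after` successes.
    fn record(&mut self, success: bool, promote_after: u32) -> bool {
        if success {
            self.successes += 1;
        } else {
            self.failures += 1;
        }
        if self.trust == TrustLevel::Community && self.successes >= promote_after {
            self.trust = TrustLevel::Trusted;
            return true;
        }
        false
    }
}

fn main() {
    let mut skill = SynthesizedSkill::new("pdf-to-csv");
    for outcome in [true, false, true, true] {
        if skill.record(outcome, 3) {
            println!("{} promoted after {} successes and {} failures", skill.name, skill.successes, skill.failures);
        }
    }
    println!("{:?}", skill.trust);
}
```

This follows the "after N successful uses" rule literally, so failures are counted but do not block promotion; a rate-based criterion would be needed if they should.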

Config:
```toml
[skills.meta_learning]
enabled = false
gap_threshold = 0.3
trajectory_distillation = false
```
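A hypothetical Rust mirror of this table, assuming match scores are normalized to [0, 1]. A real implementation would likely derive `serde::Deserialize`; that is omitted so the sketch has no dependencies, and `MetaLearningConfig` and `validate` are illustrative names:

```rust
// Hypothetical mirror of the `[skills.meta_learning]` table; a real version
// would likely derive serde::Deserialize, omitted here to stay dependency-free.

#[derive(Debug, Clone, PartialEq)]
struct MetaLearningConfig {
    enabled: bool,
    gap_threshold: f32,
    trajectory_distillation: bool,
}

impl Default for MetaLearningConfig {
    /// Defaults match the TOML table: everything off, gap threshold 0.3.
    fn default() -> Self {
        Self { enabled: false, gap_threshold: 0.3, trajectory_distillation: false }
    }
}

impl MetaLearningConfig {
    /// Rejects thresholds outside [0, 1] (including NaN), assuming match scores are normalized.
    fn validate(&self) -> Result<(), String> {
        if (0.0..=1.0).contains(&self.gap_threshold) {
            Ok(())
        } else {
            Err(format!("gap_threshold must be within [0, 1], got {}", self.gap_threshold))
        }
    }
}

fn main() {
    let cfg = MetaLearningConfig::default();
    println!("{cfg:?} -> {:?}", cfg.validate());
    let bad = MetaLearningConfig { gap_threshold: 1.5, ..MetaLearningConfig::default() };
    println!("{:?}", bad.validate());
}
```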

## Integration Points

- `crates/zeph-skills`: `TrajectoryDistiller`, `GapDetector`, `SkillSynthesizer`
- Extends existing: `FeedbackDetector`, Wilson score re-ranking, `SkillVersionManager`
- Phase 1 prerequisite: trajectory capture storage (new SQLite table or append to audit log)
- Phase 2 prerequisite: Phase 1 (gap detection accuracy improves with trajectory data)
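Since Wilson score re-ranking already exists, its lower bound is one candidate criterion for the Phase 2 promotion step, replacing a raw "N successful uses" count with a confidence-adjusted success rate. This is an assumption about how the pieces could fit, not how `zeph-skills` computes it; `should_promote` and `min_lower_bound` are hypothetical. The lower bound itself is the standard Wilson interval formula:

```rust
// Hypothetical promotion rule built on the standard Wilson score interval;
// zeph-skills' existing re-ranking code may compute or use it differently.

/// Lower bound of the Wilson score interval for `successes` out of `trials`.
/// `z` is the normal quantile (1.96 for a ~95% interval).
fn wilson_lower_bound(successes: u32, trials: u32, z: f64) -> f64 {
    debug_assert!(successes <= trials);
    if trials == 0 {
        return 0.0; // no evidence yet
    }
    let n = f64::from(trials);
    let p = f64::from(successes) / n;
    let z2 = z * z;
    let center = p + z2 / (2.0 * n);
    let margin = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt();
    (center - margin) / (1.0 + z2 / n)
}

/// Promote a synthesized skill once its pessimistic success rate clears `min_lower_bound`.
fn should_promote(successes: u32, trials: u32, min_lower_bound: f64) -> bool {
    wilson_lower_bound(successes, trials, 1.96) >= min_lower_bound
}

fn main() {
    for (s, n) in [(3, 3), (10, 10), (8, 10)] {
        println!("{s}/{n}: lower bound {:.3}, promote = {}", wilson_lower_bound(s, n, 1.96), should_promote(s, n, 0.6));
    }
}
```

With `min_lower_bound = 0.6`, 3 successes out of 3 (lower bound ≈ 0.44) would not yet promote, while 10 out of 10 (≈ 0.72) would; unlike a raw count, failures also pull the bound down.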

## Key Results

- SkillRL: 7B model outperforms GPT-4o by 41% on ALFWorld/WebShop with trajectory distillation
- MetaAgent: matches or exceeds end-to-end trained agents on GAIA, WebWalkerQA, and BrowseComp

## See Also

- #1842 (TDAD behavioral spec testing for skills)
- Existing `zeph-skills` self-learning: `FeedbackDetector`, `SkillVersionManager`
