research(skills): automated skill acquisition via open-source repository mining #1889

@bug-ops

Summary

The paper describes a systematic framework for mining GitHub repositories to extract agent skills and translate them into SKILL.md format, using dense retrieval to identify skills semantically. Its output is SKILL.md-compatible and can be ingested directly by Zeph's skill registry.

Source: arXiv 2603.11808 — "Automating Skill Acquisition via Open-Source Repository Mining" (March 2026)

Technique

  1. Repository crawl: clone/index GitHub repos by topic/language
  2. Dense retrieval: embed README, docs, code summaries to identify procedural knowledge units
  3. SKILL.md generation: map extracted procedures to SKILL.md spec format (name, description, steps, examples)
  4. Quality filtering: dedup via embedding similarity, reject low-quality extractions
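To make step 2 concrete, here is a minimal sketch of ranking documentation chunks against a query by embedding similarity. It is not the paper's retriever. The `embed` function is a toy hashed bag-of-words stand-in for a real dense encoder, and `DIM`, `retrieve`, and the sample chunks are hypothetical names chosen for illustration.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical vector width for the toy embedder


def embed(text: str) -> list[float]:
    """Toy stand-in for a dense encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit-length (or all zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Sample README fragments (invented for this sketch).
chunks = [
    "To deploy, build the docker image, push it to the registry, then run helm upgrade.",
    "This project is licensed under the MIT license.",
    "Contributors are listed in the AUTHORS file.",
]
hits = retrieve("how to deploy the docker image with helm", chunks, top_k=1)
```

In a real pipeline the toy `embed` would be replaced by whatever encoder the existing embedding matcher uses, so that mined chunks and registry skills live in the same vector space.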

Applicability to Zeph

HIGH. Zeph already implements the SKILL.md specification (zeph-skills parser, registry, embedding matcher). This paper proposes a batch pipeline to bootstrap the skill registry from external repositories — completely additive, no format translation required.

Current Zeph self-learning (MetaAgent + SkillRL, #1865) generates skills from session trajectories. This extends it with bulk acquisition from public knowledge. Together: session-driven refinement + corpus-bootstrapped initial skill set.

Implementation sketch

  • Offline mining script: scripts/mine-skills.py or a zeph-skills-miner crate
  • Input: GitHub search API queries (e.g., "devops automation", "data analysis workflows")
  • Output: SKILL.md files written to a configurable skills directory
  • Integration: existing skills.paths config picks them up on next startup (hot-reload compatible)
  • Quality gate: cosine similarity dedup against existing registry before writing
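The output and quality-gate bullets above could be combined roughly as follows. This is a sketch under stated assumptions, not Zeph's actual writer: the `<skills_dir>/<name>/SKILL.md` layout, the frontmatter fields, the `write_if_novel` helper, and the 0.9 threshold are all hypothetical, and the embeddings are assumed to come from the same model the registry already uses.

```python
import math
from pathlib import Path


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def render_skill_md(name, description, steps):
    # Assumed layout: frontmatter with name and description, then numbered steps.
    lines = ["---", f"name: {name}", f"description: {description}", "---", "", "## Steps", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines) + "\n"


def write_if_novel(skill, embedding, registry_embeddings, skills_dir, threshold=0.9):
    """Write the skill unless it is a near-duplicate of a registry entry.

    Returns the written path, or None when the candidate was rejected.
    """
    if any(cosine(embedding, known) >= threshold for known in registry_embeddings):
        return None
    path = Path(skills_dir) / skill["name"] / "SKILL.md"  # hypothetical layout
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        render_skill_md(skill["name"], skill["description"], skill["steps"]),
        encoding="utf-8",
    )
    return path
```

Checking similarity before writing keeps rejected candidates off disk entirely, so the skills.paths loader never sees them. The threshold would need tuning against the real registry, since too low a value discards legitimately distinct skills.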

Extends #1865 (MetaAgent + SkillRL) — complementary acquisition path.

Metadata

Labels

research (Research-driven improvement), skills (zeph-skills crate)
