codebase-knowledge-builder

An agent skill that studies any repository and produces structured knowledge artifacts. Drop it into Claude Code, Cursor, OpenCode, or any agent that supports the Agent Skills spec, point it at a codebase, and get back self-contained Markdown documentation for each subsystem.

What it does

Most agents forget what they read three files ago. This skill fixes that by following a four-phase process:

Reconnaissance -- scan the repo structure, identify the tech stack, map module boundaries
Deep-dive study -- trace happy paths, error paths, and edge cases through each subsystem
Artifact authoring -- fill a structured template covering architecture, key functions, gotchas, and Mermaid diagrams
Delivery -- hand back self-contained Markdown artifacts that any developer (or agent) can read cold
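
To make the reconnaissance phase concrete, here is a minimal Python sketch of what "identify the tech stack, map module boundaries" could look like. It is an illustration only, not the skill's implementation: the `recon` function, the `STACK_MARKERS` table, and the idea of treating top-level directories as candidate modules are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical marker files; the skill's own recon checklist may look for others.
STACK_MARKERS = {
    "package.json": "Node.js",
    "pyproject.toml": "Python",
    "requirements.txt": "Python",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pom.xml": "Java",
}


def recon(repo_root):
    """Return detected stacks and top-level directories (candidate module boundaries)."""
    root = Path(repo_root)
    # A stack counts as detected when its marker file sits at the repo root.
    stacks = sorted(
        {name for marker, name in STACK_MARKERS.items() if (root / marker).is_file()}
    )
    # Assumption: non-hidden top-level directories approximate module boundaries.
    modules = sorted(
        p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    return {"stacks": stacks, "modules": modules}
```

An agent would typically write a result like this into a scratch file before starting the deep-dive phase, so the findings survive as more files are read.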

The output is a set of knowledge artifacts. Each one covers a single subsystem and stands on its own. No prior context needed.

When to use it

Onboarding onto an unfamiliar codebase
Producing documentation for a repo that has none
Preparing knowledge files so other agents can work on the project without re-reading everything
Studying a specific subsystem (auth, database layer, API routing, etc.) in depth

Install

npx skills add OthmanAdi/codebase-knowledge-builder --skill codebase-knowledge-builder -g

Works with Claude Code, Cursor, Codex, Gemini CLI, and 40+ other agents that support the Agent Skills spec.

Manual install: copy skills/codebase-knowledge-builder/ into your agent's skills folder.

What's inside

skills/codebase-knowledge-builder/
  SKILL.md                              # Skill definition and workflow
  references/
    recon-checklist.md                   # Phase 1 checklist
    deep-dive-methodology.md            # File reading and tracing strategies
  templates/
    knowledge_artifact.md               # Output template for each subsystem

The SKILL.md stays lean (~80 lines). Detailed methodology lives in references/ and only gets loaded when needed. The template in templates/ defines the exact structure of every knowledge artifact the skill produces.

Example output

After running the skill on a Node.js API, each artifact includes:

Architecture overview with design pattern identification
Key components table (component, file path, responsibility)
Step-by-step data and control flow
Key functions table with parameters and return values
Configuration and environment variable mapping
Gotchas and pitfalls (race conditions, caching quirks, historical fixes)
Extension points for adding new functionality
Mermaid diagrams for visual flow

How it works under the hood

The skill uses progressive disclosure. When an agent triggers it, only the SKILL.md body loads into context (~600 words). The references and template load on demand during each phase. This keeps the context window clean for the actual codebase files being studied.
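
The progressive-disclosure idea can be sketched in a few lines of Python. This is a hypothetical model of the loading behavior, not how any particular agent implements it: the `SkillContext` class and its methods are invented for illustration, and only the file names come from the layout shown under "What's inside".

```python
from pathlib import Path


class SkillContext:
    """Loads SKILL.md up front and every other skill file only when requested."""

    def __init__(self, skill_dir):
        self.skill_dir = Path(skill_dir)
        self.loaded = {}  # relative path -> file text currently "in context"
        self.load("SKILL.md")  # only the skill body is loaded eagerly

    def load(self, relative_path):
        """Read a skill file on first use and cache it for later calls."""
        if relative_path not in self.loaded:
            path = self.skill_dir / relative_path
            self.loaded[relative_path] = path.read_text(encoding="utf-8")
        return self.loaded[relative_path]

    def words_in_context(self):
        """Rough size of everything loaded so far, counted in words."""
        return sum(len(text.split()) for text in self.loaded.values())
```

Under this model, a phase that needs the recon checklist would call `ctx.load("references/recon-checklist.md")` only when it starts, so the word count grows one phase at a time instead of all at once.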

Scratch files (recon_findings.md, per-file notes) are saved during study so the agent doesn't lose findings as it reads more files. The quality checklist at the end catches incomplete sections, missing diagrams, and placeholder text before delivery.
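
As a rough illustration of the final quality check, the sketch below flags the three problems named above: incomplete sections, missing diagrams, and placeholder text. Everything in it is an assumption made for the example: the `check_artifact` function, the `REQUIRED_SECTIONS` list, and the placeholder patterns are hypothetical and are not taken from the skill's actual checklist.

```python
import re

# Hypothetical subset of section headings an artifact is expected to contain.
REQUIRED_SECTIONS = ["Architecture", "Key components", "Gotchas"]

# Hypothetical placeholder patterns: common markers and unfilled {{template}} slots.
PLACEHOLDER = re.compile(r"\b(?:TODO|TBD|FIXME)\b|\{\{.*?\}\}")

# Built at runtime so this example does not embed a literal code fence.
MERMAID_FENCE = "`" * 3 + "mermaid"


def check_artifact(markdown):
    """Return a list of problems found in one knowledge artifact (empty if none)."""
    problems = []
    lowered = markdown.lower()
    for section in REQUIRED_SECTIONS:
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    if MERMAID_FENCE not in markdown:
        problems.append("missing Mermaid diagram")
    if PLACEHOLDER.search(markdown):
        problems.append("placeholder text found")
    return problems
```

An empty list means the artifact passes this simplified check; any entries would be fixed before delivery.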

Contributing

See CONTRIBUTING.md for guidelines on submitting issues and pull requests.

License

MIT
