ocdocs (openclawdocs)

A CLI built for LLM agents, not humans. Syncs, indexes, and searches the entire OpenClaw documentation locally so AI agents can look things up instead of hallucinating them.

Why I Built This

I'm building CleoClaw — a Mission Control frontend for OpenClaw instances. Every time I ask an AI agent to implement something against the OpenClaw API, it confidently makes up endpoint behavior, invents config keys that don't exist, and hallucinates RPC methods. Context7, training data, web search — none of them reliably reflect what's actually in the OpenClaw docs right now.

So I built a tool that downloads the entire doc site, indexes it locally with SQLite FTS5, and lets agents query it with minimal token cost. Now when my agent needs to know how Discord integration works, it spends 150 tokens reading the summary instead of 10,000 tokens guessing wrong.

This is purpose-built for OpenClaw documentation. It's not a generic doc scraper. It knows about OpenClaw's llms-full.txt dump, Mintlify page structure, and the specific categories that matter when you're building on top of OpenClaw.

Humans can use it too (the output is perfectly readable, as you'll see below), but every design decision optimizes for agent consumption: structured output, minimal tokens, progressive disclosure.

What It Looks Like

Browse all 370 topics across 19 categories

ocdocs list

Search with FTS5 BM25 ranking + fuzzy matching

ocdocs search discord

Drill into any topic (summary first, full content on demand)

ocdocs show channels/discord

# Discord
Category: channels | 4,797 words | https://docs.openclaw.ai/channels/discord

Status: ready for DMs and guild channels via the official Discord gateway.

## Sections (17)
- Quick setup
- Recommended: Set up a guild workspace
- Runtime model
- Forum channels
- Interactive components
- Access control and routing
- Developer Portal setup
- Native commands and command auth
- Feature details
- Tools and action gates
- Components v2 UI
- Voice channels
- Voice messages
- Troubleshooting
- Configuration reference pointers
- Safety and operations
- Related

> openclaw-docs show channels/discord --full

Pull just the section you need

ocdocs show channels/discord -s "quick setup"

# Discord > Quick setup
Category: channels | Section of 4,797 word topic

You will need to create a new application with a bot, add the bot to your
server, and pair it to OpenClaw. We recommend adding your bot to your own
private server...

Diff local docs against the live site

ocdocs diff

Checking remote...
No changes detected. Local docs are up to date.

Sync pulls everything down (idempotent)

ocdocs sync

Syncing documentation...
Sync complete.
  Added:     0
  Updated:   0
  Removed:   0
  Unchanged: 370
  Total:     370

First run downloads 370 topics. Subsequent runs use ETag caching and content hashing — only fetches what changed.

Install

npm (recommended)

npm install -g openclawdocs
ocdocs sync

Requires Python 3.12+. The postinstall script creates a Python venv and installs everything automatically.

Three aliases work interchangeably: ocdocs, openclawdocs, openclaw-docs

pip (manual)

git clone https://github.com/kryptobaseddev/openclawdocs.git
cd openclawdocs
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
ocdocs sync

The Token Economy

Every command is designed around Minimum Viable Information (MVI): agents get exactly what they need, nothing more. Each level costs more tokens than the one before it:

Level 0: ocdocs search "discord"                        →   120 tokens  (find the page)
Level 1: ocdocs show channels/discord                   →   150 tokens  (summary + sections)
Level 2: ocdocs show channels/discord -s "quick setup"  → 1,000 tokens  (just that section)
Level 3: ocdocs show channels/discord --full            → 10,800 tokens (everything)

Most agent tasks finish at Level 1 or 2. The --full dump is a last resort. Stopping at Level 0 or 1 is a 70-90x token reduction over blindly dumping the entire doc page into context, and a single Level 2 section is still roughly 10x cheaper.
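The reduction figures follow directly from the per-level costs listed above. A quick worked check, using only those four numbers (the dictionary keys are labels chosen here, not ocdocs identifiers):

```python
# Approximate token costs for the Discord topic, taken from the level list above
LEVEL_TOKENS = {
    "search": 120,    # Level 0
    "show": 150,      # Level 1
    "section": 1000,  # Level 2
    "full": 10800,    # Level 3
}


def reduction(level: str) -> float:
    """How many times cheaper a level is than the --full dump."""
    return LEVEL_TOKENS["full"] / LEVEL_TOKENS[level]


if __name__ == "__main__":
    for level in ("search", "show", "section"):
        print(f"{level}: {reduction(level):.1f}x cheaper than --full")
```

The 70-90x range corresponds to Levels 1 and 0 respectively. These are costs for one topic, so other pages will land on different ratios.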

Command	Purpose	Token Cost
`search <query>`	FTS5 BM25 + fuzzy search	~120 tokens
`search <query> -v`	Search with content snippets	~300 tokens
`show <path>`	Topic summary + section list	~150 tokens
`show <path> -s <name>`	Single section content	~500-1500 tokens
`show <path> --full`	Complete topic content	~2K-15K tokens
`list`	All categories with counts	~100 tokens
`list -c <category>`	Topics within a category	~200 tokens
`sync`	Download/update docs (idempotent)	N/A
`sync --force`	Force re-download, bypass cache	N/A
`status`	Sync health check	~50 tokens
`diff`	Compare local vs remote changes	~100 tokens

Agent Integration

Claude Code / CLAUDE.md

Drop this in your CLAUDE.md and your agent stops guessing:

When working with OpenClaw features, ALWAYS verify behavior against local docs:
  ocdocs search "<topic>"
  ocdocs show <path>
  ocdocs show <path> --section "<section name>"
Never assume OpenClaw API behavior from training data.

Agent Workflow Pattern

1. ocdocs status          — synced? (run ocdocs sync if not)
2. ocdocs search "TOPIC"  — find the right page
3. ocdocs show PATH       — read summary + section list
4. ocdocs show PATH -s X  — read the specific section needed
5. Only use --full as last resort

Programmatic (Python)

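from openclaw_docs.config import get_db_path, get_topics_dir
from openclaw_docs.search import SearchEngine
from openclaw_docs.storage import DocsStorage

# Point storage at the local database (path resolved by the package config)
storage = DocsStorage(get_db_path())
# Build the search engine over that storage and the synced topics directory
engine = SearchEngine(storage, get_topics_dir())
# Query the index, capping the result list at five entries
results = engine.search("gateway authentication", limit=5)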

Architecture

docs.openclaw.ai
       |
       +-- llms-full.txt ----> parser.py (section split) ----> 362 topics
       |                                                            |
       +-- llms.txt ----------> parser.py (index) ----------> index entries
       |                                                            |
       +-- /page HTML --------> trafilatura.extract() -------->  8 topics
                                                                    |
                                                        +-----------+
                                                        v
                                                 SQLite + FTS5
                                          (content-sync + triggers)
                                                        |
                                             +----------+----------+
                                             v          v          v
                                          search      show       list
                                         (BM25 +    (PD L1-3)  (categories)
                                          fuzzy)

Primary source: llms-full.txt — pre-formatted markdown dump from OpenClaw (362 pages)
Gap filler: trafilatura extracts ~8 pages missing from the dump (direct HTML scrape)
Search: SQLite FTS5 with BM25 ranking + rapidfuzz fuzzy title matching
Parsing: markdown-it-py AST for section extraction (no fragile regex)
Storage: SQLite with FTS5 external content table, auto-synced via triggers
Data location: OS-standard paths via platformdirs (~/.local/share/openclawdocs on Linux)
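For readers unfamiliar with the "FTS5 external content table, auto-synced via triggers" pattern, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and trigger names are assumptions for illustration and are not the real ocdocs schema. It also assumes your SQLite build includes FTS5, which most Python distributions do.

```python
import sqlite3

# Hypothetical schema: a plain content table plus an FTS5 index that points at it
SCHEMA = """
CREATE TABLE topics (
    id INTEGER PRIMARY KEY,
    path TEXT UNIQUE NOT NULL,
    title TEXT NOT NULL,
    body TEXT NOT NULL
);
CREATE VIRTUAL TABLE topics_fts USING fts5(
    title, body, content='topics', content_rowid='id'
);
-- Triggers keep the index in step with the content table
CREATE TRIGGER topics_ai AFTER INSERT ON topics BEGIN
    INSERT INTO topics_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
CREATE TRIGGER topics_ad AFTER DELETE ON topics BEGIN
    INSERT INTO topics_fts(topics_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;
CREATE TRIGGER topics_au AFTER UPDATE ON topics BEGIN
    INSERT INTO topics_fts(topics_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO topics_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;
"""


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return topic paths ranked by BM25 (lower score means a better match)."""
    rows = conn.execute(
        "SELECT t.path FROM topics_fts "
        "JOIN topics AS t ON t.id = topics_fts.rowid "
        "WHERE topics_fts MATCH ? "
        "ORDER BY bm25(topics_fts) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO topics(path, title, body) VALUES (?, ?, ?)",
        ("channels/discord", "Discord", "Pair the Discord bot with OpenClaw"),
    )
    print(search(conn, "discord"))
```

With an external content table the text is stored once in the content table, and the FTS index holds only the search structures. The triggers are what keep the two from drifting apart on insert, update, and delete.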

Coverage

ocdocs covers 370 topics across 19 categories, which is 100% of the published English content on docs.openclaw.ai. 13 pages listed in the site navigation return HTTP 404 (unpublished experiments/plans) and are automatically skipped.

Built For CleoClaw

I'm actively using ocdocs while building CleoClaw — a Mission Control single-pane-of-glass frontend for managing OpenClaw instances. CleoClaw talks to OpenClaw's gateway API, manages agents, handles board chat, and needs to get the API behavior right. ocdocs is how my AI agents (and I) verify what the docs actually say before writing integration code.

If you're building anything on top of OpenClaw, this tool exists so your agents stop making things up.

Configuration

Env Variable	Purpose
`OPENCLAW_DOCS_DATA_DIR`	Override data storage directory

Default data locations (via platformdirs):

Linux: ~/.local/share/openclawdocs
macOS: ~/Library/Application Support/openclawdocs
Windows: C:\Users\<user>\AppData\Local\openclawdocs
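The lookup order can be sketched as: environment override first, then the platform default. The version below is a stdlib-only approximation that hardcodes the three defaults listed above. The real tool delegates to platformdirs, which also honors settings such as XDG_DATA_HOME that this sketch ignores. `resolve_data_dir` is a hypothetical name, not an ocdocs function.

```python
import os
import sys
from pathlib import Path

APP_NAME = "openclawdocs"


def resolve_data_dir(env=None, platform=None, home=None) -> Path:
    """Pick the data directory: OPENCLAW_DOCS_DATA_DIR wins, else a per-OS default."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    override = env.get("OPENCLAW_DOCS_DATA_DIR")
    if override:
        return Path(override)
    if platform == "darwin":
        return home / "Library" / "Application Support" / APP_NAME
    if platform == "win32":
        return home / "AppData" / "Local" / APP_NAME
    # Linux and other Unix-likes (simplified: no XDG_DATA_HOME handling)
    return home / ".local" / "share" / APP_NAME


if __name__ == "__main__":
    print(resolve_data_dir())
```

Setting the variable is handy for sandboxes and CI, where an agent should not write into the user's home directory.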

Contributing

I'm a solo dev on this (well, me and my AI sidekick who definitely wrote more of this codebase than I did). If you have ideas for improvements, find bugs, or want to extend the tool — PRs and issues are welcome.

Things I'm actively looking for:

- Better search ranking strategies for doc-specific queries
- Smarter section extraction for Mintlify's custom MDX components
- MCP server integration so agents can query docs without shelling out
- Ideas for reducing first-sync time
- Anything that makes this more useful for agents building on OpenClaw

Setup

1. Fork and clone
2. python3 -m venv .venv && source .venv/bin/activate && pip install -e .
3. Make changes
4. ocdocs sync && ocdocs search "test query" to verify
5. Create a changeset: npm run changeset
6. Submit a PR

Versioning

CalVer (YYYY.M.MICRO) enforced by VersionGuard. Change management via Changesets.

Code Style

Python: ruff (ruff check src/)
All CLI output must be machine-parseable, not prose
If a human can't read it, that's fine. If an agent can't parse it, that's a bug.

Support

If this saves your agents from hallucinating OpenClaw APIs, consider buying me a coffee.

License

MIT

