feat(nimbus-mcp): add search_docs tool by tylermorrisford · Pull Request #1216 · commercetools/nimbus

tylermorrisford · 2026-03-09T18:15:35Z

Summary

Adds a search_docs tool to the Nimbus MCP server that searches across all documentation views — not just the truncated overview captured in the search index.

New tool: `search_docs`

Two-phase lazy-load architecture: Phase 1 searches the lightweight in-memory index (title, description, tags, 500-char content excerpt) for candidate pages. Phase 2 lazy-loads full route JSON for candidates and searches all views (overview, dev, guidelines, a11y) for deep content matches.
Results are ranked by combined phase-2 viewMatch signal + phase-1 field-weighted relevance score so exact-match pages surface above fuzzy fallbacks.
Returns up to 10 results with content snippets anchored to the earliest token match. Snippets are always stripped of MDX markup before being returned.
Tool description routes callers to get_component, get_tokens, or search_icons based on the category field of each result.
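
The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the type names, the `loadRoute` callback, the all-tokens-must-match rule, and the scoring constants are assumptions made for the example. The default candidate limit of 20 mirrors the `PHASE2_CANDIDATE_LIMIT` value mentioned in the commit messages, and the loader is synchronous only to keep the sketch short.

```typescript
// Hypothetical types and names; only the two-phase idea comes from the PR text.
type ViewName = "overview" | "dev" | "guidelines" | "a11y";

interface IndexEntry {
  path: string;
  title: string;
  excerpt: string; // truncated content held in the in-memory index
}

type RouteViews = Partial<Record<ViewName, string>>;

interface SearchHit {
  path: string;
  title: string;
  matchedView: ViewName | null;
}

const VIEW_ORDER: ViewName[] = ["overview", "dev", "guidelines", "a11y"];

function twoPhaseSearch(
  query: string,
  index: IndexEntry[],
  loadRoute: (path: string) => RouteViews, // lazy loader, only called in phase 2
  candidateLimit = 20,
  resultLimit = 10
): SearchHit[] {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return [];
  const hasAll = (text: string): boolean => {
    const lower = text.toLowerCase();
    return tokens.every((t) => lower.includes(t));
  };

  // Phase 1: cheap scan of the lightweight index. Matched entries take
  // their share of the budget first; the remainder is filled with other pages.
  const matched = index.filter((e) => hasAll(`${e.title} ${e.excerpt}`));
  const matchedSet = new Set(matched);
  const expanded = index.filter((e) => !matchedSet.has(e));
  const candidates = [...matched, ...expanded].slice(0, candidateLimit);

  // Phase 2: load full content for candidates only and scan every view.
  const scored: Array<{ hit: SearchHit; score: number }> = [];
  for (const entry of candidates) {
    const views = loadRoute(entry.path);
    const matchedView = VIEW_ORDER.find((v) => hasAll(views[v] ?? "")) ?? null;
    const indexMatch = matchedSet.has(entry);
    if (matchedView === null && !indexMatch) continue;
    scored.push({
      hit: { path: entry.path, title: entry.title, matchedView },
      score: (matchedView !== null ? 2 : 0) + (indexMatch ? 1 : 0),
    });
  }

  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, resultLimit)
    .map((s) => s.hit);
}
```

The point of the split is that full route content is only read for a bounded number of pages per query, while pages whose match sits deep in a non-overview view can still surface.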

Problem solved

The search index truncated content to 500 chars of the overview tab, missing ~87% of documentation. Props, import paths, accessibility details, and design guidelines were invisible to search. This implementation searches the full content of all views without requiring build pipeline changes.

Relevance utility (`src/utils/relevance.ts`)

Extracted scoreRelevance, rankByRelevance, and filterAndRankByRelevance into a shared utility used by both search_docs and list_components.
filterAndRankByRelevance is a single-pass filter+rank: lowercases each field once and uses the same values for both the token-presence predicate and the relevance score — more efficient than rankByRelevance(items.filter(...)).
Field weights: title (8) > description/tags (4) > content (1).
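
As a rough illustration of the single-pass filter and rank, the sketch below lowercases each field once per item and reuses those strings for both the presence check and the score. The weights (8, 4, 4, 1) come from the line above; the field shape, the "every token must appear somewhere" predicate, and the function name `filterAndRankSketch` are assumptions, not the utility's actual signature.

```typescript
// Assumed field shape; the real RelevanceFields type may differ.
interface RelevanceFieldsSketch {
  title: string;
  description?: string;
  tags?: string[];
  content?: string;
}

const FIELD_WEIGHTS = { title: 8, description: 4, tags: 4, content: 1 } as const;

// Tokens are expected to be lowercase already.
function filterAndRankSketch<T extends RelevanceFieldsSketch>(
  items: T[],
  tokens: string[]
): T[] {
  if (tokens.length === 0) return [];
  const scored: Array<{ item: T; score: number }> = [];

  for (const item of items) {
    // Lowercase each field once; reuse for both filtering and scoring.
    const title = item.title.toLowerCase();
    const description = (item.description ?? "").toLowerCase();
    const tags = (item.tags ?? []).join(" ").toLowerCase();
    const content = (item.content ?? "").toLowerCase();

    let score = 0;
    let allTokensPresent = true;
    for (const token of tokens) {
      let tokenScore = 0;
      if (title.includes(token)) tokenScore += FIELD_WEIGHTS.title;
      if (description.includes(token)) tokenScore += FIELD_WEIGHTS.description;
      if (tags.includes(token)) tokenScore += FIELD_WEIGHTS.tags;
      if (content.includes(token)) tokenScore += FIELD_WEIGHTS.content;
      if (tokenScore === 0) {
        allTokensPresent = false; // token found nowhere: item is filtered out
        break;
      }
      score += tokenScore; // weights accumulate across fields and tokens
    }
    if (allTokensPresent) scored.push({ item, score });
  }

  return scored.sort((a, b) => b.score - a.score).map((s) => s.item);
}
```

Compared with `rankByRelevance(items.filter(...))`, this avoids lowercasing every field twice, once in the filter and again in the scorer.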

`get_component` improvements

Compact JSON for the metadata response (JSON.stringify is called without the null, 2 pretty-print arguments).
Section content (overview, guidelines, implementation, accessibility) is now stripped of MDX markup before returning, reducing token usage.
Fuse instance for fuzzy component resolution is cached at module level, keyed against manifest.routes so it is never rebuilt when the reference is stable — fixing a bug where the cache was invalidated on every call because catalog was a new .filter() array reference each time.
topLevelNames set (used for sub-component filtering) is similarly cached.
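
The cache-key bug described above is easy to reproduce in isolation. In this sketch, `memoizeByReference` and the `Route` shape are hypothetical, and building a `Set` stands in for constructing the Fuse instance. The idea is to compare the key by object identity and to pass in a reference that stays stable between calls.

```typescript
// Hypothetical route shape, for illustration only.
interface Route {
  path: string;
  isComponent: boolean;
}

// Caches the last built value and rebuilds only when the key reference changes.
function memoizeByReference<K extends object, V>(
  build: (key: K) => V
): (key: K) => V {
  let cached: { key: K; value: V } | null = null;
  return (key: K): V => {
    if (cached !== null && cached.key === key) return cached.value;
    const value = build(key);
    cached = { key, value };
    return value;
  };
}

let buildCount = 0;

// Stand-in for an expensive build such as creating a fuzzy-search index.
const getPathIndex = memoizeByReference((routes: Route[]) => {
  buildCount++;
  return new Set(routes.map((r) => r.path));
});

const manifest = {
  routes: [
    { path: "/button", isComponent: true },
    { path: "/colors", isComponent: false },
  ] as Route[],
};

// Stable key: manifest.routes is the same array on every call, so one build.
getPathIndex(manifest.routes);
getPathIndex(manifest.routes);

// Unstable key: .filter() returns a new array each time, so every call rebuilds.
getPathIndex(manifest.routes.filter((r) => r.isComponent));
getPathIndex(manifest.routes.filter((r) => r.isComponent));
```

If the filtered catalog is what the index needs, the filtering can happen inside the build function so the cache key remains the stable `manifest.routes` reference.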

`list_components` improvements

Exact-match search path now uses filterAndRankByRelevance instead of an unordered .filter(), so results are ranked by relevance.

`stripMarkdown` utility (`src/utils/markdown.ts`)

Moved to a dedicated utility file, shared by search_docs and get_component.
Protects fenced code blocks: code blocks are extracted into placeholders before any stripping runs and restored after — JSX tags inside code examples are no longer mangled.
Strips lowercase HTML tags (<br />, <div>, etc.) in addition to uppercase JSX components.
Frontmatter regex no longer uses /m flag — ^ anchors to the true start of the string, preventing mid-document --- separators from being eaten.
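
A minimal sketch of the placeholder technique and the anchored frontmatter regex follows. It is not the `stripMarkdown` implementation from this PR: the function name, the placeholder format, and the tag regex are assumptions, and it restores code blocks verbatim, whereas the commit notes indicate the real utility also strips fence markers.

```typescript
// Built at runtime so this example does not contain a literal fence.
const FENCE = "`".repeat(3);
const CODE_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

function stripMarkdownSketch(input: string): string {
  const blocks: string[] = [];

  // 1. Swap fenced code blocks for placeholders so later steps cannot touch them.
  let text = input.replace(CODE_BLOCK, (block: string) => {
    blocks.push(block);
    return "\u0000CODE" + (blocks.length - 1) + "\u0000";
  });

  // 2. Remove frontmatter. Without the /m flag, ^ only matches the true
  //    start of the string, so a mid-document --- separator is left alone.
  text = text.replace(/^---\n[\s\S]*?\n---\n?/, "");

  // 3. Strip JSX components and lowercase HTML tags alike.
  text = text.replace(/<\/?[A-Za-z][^>]*>/g, "");

  // 4. Put the protected code blocks back.
  return text.replace(
    /\u0000CODE(\d+)\u0000/g,
    (_match: string, i: string) => blocks[Number(i)] ?? ""
  );
}
```

With the `/m` flag, `^` would also match at the start of any later line, so a horizontal rule followed by another `---` further down could be consumed as if it were frontmatter.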

Type consolidation (`src/types.ts`)

All interfaces previously scattered across tool files and data-loader.ts are now centralised in src/types.ts, including the new RelevanceFields, CandidateResult, ViewMatch, DocSearchResult (with optional category), and ComponentSummary types.

`CLAUDE.md`

Documents the "never pretty-print JSON" and "never return raw MDX" rules.
Documents the Fuse caching requirement (cache instances; key invalidation against stable manifest reference).
Documents the build script pattern for adding new prebuild steps.
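
To make the "never pretty-print JSON" rule concrete, here is a small comparison. The payload is made up for illustration; the only point is that the third argument to `JSON.stringify` adds whitespace that costs tokens without adding information.

```typescript
// Hypothetical tool response payload.
const metadata = {
  name: "Button",
  category: "components",
  tags: ["action", "form"],
  props: [{ name: "variant", type: "string" }],
};

const pretty = JSON.stringify(metadata, null, 2); // indented, multi-line
const compact = JSON.stringify(metadata); // single line, no extra whitespace

// Characters saved by dropping indentation and newlines.
const savedChars = pretty.length - compact.length;
```

Both strings parse back to the same object, so a consuming model or client loses nothing when the compact form is returned.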

Design doc

See packages/nimbus-mcp/docs/search-index-truncation.md for the full problem analysis and options considered.

Jira

CRAFT-2137

Test plan

123 tests pass (pnpm vitest run in packages/nimbus-mcp)
TypeScript typecheck passes
search_docs — broad queries (e.g. "button") return ≤10 relevant results
search_docs — deep content queries ("ButtonProps", "import Button from") find results from dev/guidelines views with correct matchedView
search_docs — "Colors" ranks in top 2 for "color tokens"
search_docs — nonsense queries return empty results
list_components — title-match component ranks first for exact-name queries
get_component — section content contains no fenced code fence markers
stripMarkdown — JSX inside code blocks is preserved; mid-doc --- not eaten

🤖 Generated with Claude Code

vercel · 2026-03-09T18:15:40Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
nimbus-documentation	Ready	Preview, Comment	Mar 19, 2026 7:02pm
nimbus-storybook	Ready	Preview, Comment	Mar 19, 2026 7:02pm

changeset-bot · 2026-03-09T18:15:41Z

⚠️ No Changeset found

Latest commit: 5cfb9ff

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

packages/nimbus-mcp/src/tools/search-docs.spec.ts

valoriecarli · 2026-03-11T19:41:02Z

packages/nimbus-mcp/src/tools/search-docs.ts

+              { name: "title", weight: 3 },
+              { name: "description", weight: 2 },
+              { name: "tags", weight: 2 },
+              { name: "content", weight: 1 },
+            ],
+            threshold: 0.4,


I had to read up on how this works, but I'll assume you picked these numbers for a reason.

Claude haggled with itself over them, and we went from 0.3 to 0.4 for the threshold. What do you think? Good/bad?

I honestly don't know just yet, I had to work through a few examples to get it to click. If anything, we can sideline it (for now) and test/tweak it a bit more later when we're ready?
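
For anyone else working through these numbers, the sketch below restates the options from the diff using local types (nothing is imported from Fuse.js) and computes each key's share of the total weight. In Fuse.js, `threshold` ranges from 0.0 (only a perfect match is accepted) to 1.0 (anything matches), so moving from 0.3 to 0.4 makes the fuzzy fallback more lenient. The `weightShares` helper is hypothetical and only meant to show the relative influence of each field.

```typescript
// Local mirror of the option subset shown in the diff; not the library's types.
interface WeightedKey {
  name: string;
  weight: number;
}

interface FuzzyOptionsSketch {
  keys: WeightedKey[];
  threshold: number; // 0.0 = exact only, 1.0 = match anything
}

const fuzzyOptions: FuzzyOptionsSketch = {
  keys: [
    { name: "title", weight: 3 },
    { name: "description", weight: 2 },
    { name: "tags", weight: 2 },
    { name: "content", weight: 1 },
  ],
  threshold: 0.4,
};

// Hypothetical helper: each key's weight as a fraction of the total.
function weightShares(keys: WeightedKey[]): Record<string, number> {
  const total = keys.reduce((sum, k) => sum + k.weight, 0);
  const shares: Record<string, number> = {};
  for (const k of keys) shares[k.name] = k.weight / total;
  return shares;
}

const shares = weightShares(fuzzyOptions.keys);
```

Read this way, a title match counts three times as much as a content match, and description and tags sit in between at twice the content weight.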

valoriecarli · 2026-03-11T19:54:57Z

packages/nimbus-mcp/src/tools/search-docs.ts

+    {
+      title: "Search Docs",
+      description:
+        "Fuzzy-searches across all Nimbus documentation (components, patterns, guides, tokens) and returns the top matching pages with content snippets.",


@tylermorrisford I get that this is just a description, but this implies it goes through ALL Nimbus documentation? All .mdx files from top to bottom? That's... a lot.

Update: I see, it's reading a pre-built JSON file, but content is limited to the first 500 chars. If that's the case, I don't think this would be an issue... unless we have a super long-winded guide that should be broken down.

The search looks through data/docs/search-index.json instead of trying to process anything with an .mdx extension, which, you're right, would be a lot.

@valoriecarli This is a good point you're bringing up:

The search index is truncated to 500 characters per doc entry. This means search_docs can only match against the first ~500 chars of each page's content. Anything deeper in a doc page — further sections, API details, examples — is invisible to the search. The 500-char limit applies at index build time in nimbus-docs-build, not in the MCP tool itself. The tool searches faithfully over what it's given, but it's given incomplete content.

This resulted in a huge improvement for user experience 🏆
see slack discussion

valoriecarli

I left some comments/ questions regarding some logic, but nothing to hold this up at this stage. 🙌🏻

Implements a search_docs MCP tool that performs two-pass search (exact substring then Fuse.js fuzzy fallback) across all Nimbus documentation. Returns top 10 matches with title, description, path, and content snippet.
- Add search-docs tool with registerSearchDocs entry point
- Add DocSearchResult shared type to types.ts
- Register tool in server.ts
- Add behavioral tests for search and relevance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The search index only indexed the first 500 chars of the overview tab, missing 87% of content (dev, guidelines, a11y views). This replaces the single-pass index search with a two-phase approach:
- Phase 1: lightweight search against the in-memory index for candidates
- Phase 2: lazy-load full route files for candidates and search all views
Props, import paths, accessibility details, and guidelines content are now fully searchable. Results include a matchedView field and snippets from actual match locations. Includes a design doc with problem analysis and options considered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ntext bloat and perf issues
- Move all interfaces from data-loader, flatten-tokens, and tool files into src/types.ts
- Add category field to DocSearchResult so LLM can route follow-up tool calls
- Update tool description with category-based follow-up guidance and prop lookup clarification
- Remove pretty-printing from get-component props response and list-components
- Cache toLowerCase() in searchRouteViews to avoid redundant work across search passes
- Extract PHASE2_CANDIDATE_LIMIT named constant (was magic number 20)
- Update CLAUDE.md with type placement rule and build script instructions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… test cleanup
- Extract filterAndRankByRelevance single-pass util; wire into search-docs and list-components
- Fix resolveFuseInstance cache key (was new array ref each call, now keyed to manifest.routes)
- Fix topLevelNames rebuilt on every aggregateSubComponentProps call (now cached)
- Fix stripMarkdown: extract code blocks before stripping so JSX in examples is not mangled
- Fix stripMarkdown: remove /m flag from frontmatter regex (mid-doc --- no longer eaten)
- Fix stripMarkdown: strip lowercase HTML tags and fenced code fence markers
- Fix search-docs phase-2 budget: matched entries consume their share before expanded fills remainder
- Fix category field: omit from DocSearchResult when empty (sparse response)
- Add SNIPPET_LEAD named constant; anchor extractSnippet to earliest token position
- Tokenise query once in main handler; pass pre-parsed tokens to findCandidates and searchRouteViews
- Fuse invalidation in search-docs now checks index object identity
- Combine two-pass view scan into single loop in searchRouteViews
- Compact JSON everywhere (remove null, 2 from metadata response)
- Strip markdown from section content before returning to LLM
- Shared module-level MCP client in search-docs.spec and get-component.spec
- Add markdown.spec with full coverage including new edge cases
- Document multi-field weight accumulation in scoreRelevance JSDoc
- Update CLAUDE.md: add MDX stripping rule and Fuse caching performance guideline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
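
The "anchor extractSnippet to earliest token position" item above can be pictured with the sketch below. `SNIPPET_LEAD` is a name taken from the commit message, but its value here, the `SNIPPET_LENGTH` constant, the ellipsis handling, and the function name are all assumptions for illustration.

```typescript
const SNIPPET_LEAD = 40; // assumed: characters of context kept before the match
const SNIPPET_LENGTH = 160; // assumed: maximum snippet size in characters

// Tokens are expected to be lowercase already.
function extractSnippetSketch(text: string, tokens: string[]): string {
  const lower = text.toLowerCase();

  // Find the earliest position at which any token occurs.
  let earliest = -1;
  for (const token of tokens) {
    const pos = lower.indexOf(token);
    if (pos !== -1 && (earliest === -1 || pos < earliest)) earliest = pos;
  }

  // Fall back to the start of the text when nothing matches.
  const anchor = earliest === -1 ? 0 : earliest;
  const start = Math.max(0, anchor - SNIPPET_LEAD);
  const end = Math.min(text.length, start + SNIPPET_LENGTH);

  const prefix = start > 0 ? "…" : "";
  const suffix = end < text.length ? "…" : "";
  return prefix + text.slice(start, end).trim() + suffix;
}
```

Anchoring to the earliest token keeps the snippet centred on where the query first becomes relevant, rather than always returning the opening lines of the page.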

ByronDWall

I approve this message

tylermorrisford requested a review from ByronDWall March 9, 2026 18:16

tylermorrisford self-assigned this Mar 9, 2026

tylermorrisford requested a review from valoriecarli March 9, 2026 18:16

valoriecarli reviewed Mar 11, 2026

valoriecarli reviewed Mar 11, 2026

valoriecarli approved these changes Mar 11, 2026

vercel bot deployed to Preview – nimbus-documentation March 12, 2026 14:16 View deployment

vercel bot deployed to Preview – nimbus-storybook March 12, 2026 14:17 View deployment

vercel bot deployed to Preview – nimbus-documentation March 12, 2026 14:21 View deployment

vercel bot deployed to Preview – nimbus-storybook March 12, 2026 14:23 View deployment

tylermorrisford and others added 6 commits March 19, 2026 11:54

docs(nimbus-mcp): update search truncation doc to reflect option 4 im…

da6ff29

…plementation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(search-docs): lint

0bd5d32

feat(search docs tool): cache fuse instance so it is not rebuilt for …

25353c1

…each query, remove pretty-printing from json returns to reduce token length, add mcp claude.md for remembering package-specific directives

ByronDWall force-pushed the CRAFT-2137-implement-search-docs-tool branch from eefcc6d to a95c9db Compare March 19, 2026 17:12

vercel bot deployed to Preview – nimbus-documentation March 19, 2026 17:15 View deployment

vercel bot deployed to Preview – nimbus-storybook March 19, 2026 17:17 View deployment

vercel bot deployed to Preview – nimbus-documentation March 19, 2026 18:49 View deployment

ByronDWall force-pushed the CRAFT-2137-implement-search-docs-tool branch from 3cb6e23 to 757f626 Compare March 19, 2026 18:50

vercel bot deployed to Preview – nimbus-documentation March 19, 2026 18:53 View deployment

vercel bot deployed to Preview – nimbus-storybook March 19, 2026 18:54 View deployment

ByronDWall force-pushed the CRAFT-2137-implement-search-docs-tool branch from 757f626 to 5cfb9ff Compare March 19, 2026 18:58

vercel bot deployed to Preview – nimbus-documentation March 19, 2026 19:00 View deployment

vercel bot deployed to Preview – nimbus-storybook March 19, 2026 19:02 View deployment

ByronDWall approved these changes Mar 19, 2026

ByronDWall merged commit 2c05f75 into main Mar 19, 2026
9 checks passed

ByronDWall deleted the CRAFT-2137-implement-search-docs-tool branch March 19, 2026 19:54

ByronDWall restored the CRAFT-2137-implement-search-docs-tool branch March 19, 2026 21:01

ByronDWall merged 7 commits into main from CRAFT-2137-implement-search-docs-tool
