feat(nimbus-mcp): add search_docs tool #1216

Merged
ByronDWall merged 7 commits into main from CRAFT-2137-implement-search-docs-tool
Mar 19, 2026

@tylermorrisford (Contributor) commented Mar 9, 2026

Summary

Adds a search_docs tool to the Nimbus MCP server that searches across all documentation views — not just the truncated overview captured in the search index.

New tool: search_docs

  • Two-phase lazy-load architecture: Phase 1 searches the lightweight in-memory index (title, description, tags, 500-char content excerpt) for candidate pages. Phase 2 lazy-loads full route JSON for candidates and searches all views (overview, dev, guidelines, a11y) for deep content matches.
  • Results are ranked by combined phase-2 viewMatch signal + phase-1 field-weighted relevance score so exact-match pages surface above fuzzy fallbacks.
  • Returns up to 10 results with content snippets anchored to the earliest token match. Snippets are always stripped of MDX markup before being returned.
  • Tool description routes callers to get_component, get_tokens, or search_icons based on the category field of each result.
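One way to anchor a snippet to the earliest token match is sketched below. This is an illustration only: the commits mention an `extractSnippet` function and a `SNIPPET_LEAD` constant, but their signatures and values are not shown in this PR page, so the numbers and the `SNIPPET_LENGTH` name here are assumptions.

```typescript
// Assumed values; the PR names SNIPPET_LEAD but does not state its value.
const SNIPPET_LEAD = 40; // characters of context kept before the match
const SNIPPET_LENGTH = 160; // hypothetical total window size

/**
 * Returns a window of `text` that starts shortly before the earliest
 * case-insensitive occurrence of any query token.
 */
function extractSnippet(text: string, tokens: string[]): string {
  const haystack = text.toLowerCase();
  let earliest = -1;
  for (const token of tokens) {
    const at = haystack.indexOf(token.toLowerCase());
    if (at !== -1 && (earliest === -1 || at < earliest)) earliest = at;
  }
  // No token found: fall back to the start of the text.
  const start = earliest === -1 ? 0 : Math.max(0, earliest - SNIPPET_LEAD);
  const end = Math.min(text.length, start + SNIPPET_LENGTH);
  const prefix = start > 0 ? "..." : "";
  const suffix = end < text.length ? "..." : "";
  return prefix + text.slice(start, end).trim() + suffix;
}
```

In this sketch the text is assumed to have been stripped of MDX markup before the snippet is cut, matching the rule that snippets are never returned raw.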

Problem solved

The search index truncated content to 500 chars of the overview tab, missing ~87% of documentation. Props, import paths, accessibility details, and design guidelines were invisible to search. This implementation searches the full content of all views without requiring build pipeline changes.
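The two-phase flow described in the summary can be sketched roughly as follows. Every type and function name here is a hypothetical stand-in (only `PHASE2_CANDIDATE_LIMIT` and its value of 20 come from the commit notes), the loader is synchronous to keep the sketch short, and the way non-matching entries fill the remaining candidate budget is an assumption rather than the PR's actual logic.

```typescript
// Hypothetical shapes, not the PR's actual types.
interface IndexEntry {
  path: string;
  title: string;
  description: string;
  excerpt: string; // truncated content from the prebuilt index
}
type RouteViews = Record<string, string>; // view name -> full text
interface DeepResult {
  path: string;
  title: string;
  matchedView: string | null; // null when only the index matched
}

const PHASE2_CANDIDATE_LIMIT = 20; // named constant mentioned in the commits
const MAX_RESULTS = 10;

function searchDocsSketch(
  tokens: string[],
  index: IndexEntry[],
  loadRoute: (path: string) => RouteViews | undefined,
): DeepResult[] {
  const needles = tokens.map((t) => t.toLowerCase());
  if (needles.length === 0) return [];

  // Phase 1: cheap scan of the in-memory index. Entries matching more
  // tokens come first; non-matching entries fill any remaining budget
  // (an assumption of this sketch).
  const candidates = index
    .map((entry) => {
      const text = `${entry.title} ${entry.description} ${entry.excerpt}`.toLowerCase();
      return { entry, hits: needles.filter((n) => text.includes(n)).length };
    })
    .sort((a, b) => b.hits - a.hits)
    .slice(0, PHASE2_CANDIDATE_LIMIT);

  // Phase 2: load the full route only for candidates and scan every view.
  const ranked: Array<{ result: DeepResult; rank: number }> = [];
  for (const { entry, hits } of candidates) {
    const views = loadRoute(entry.path) ?? {};
    const match = Object.entries(views).find(([, body]) => {
      const lowered = body.toLowerCase();
      return needles.every((n) => lowered.includes(n));
    });
    const matchedView = match ? match[0] : null;
    if (matchedView === null && hits < needles.length) continue;
    ranked.push({
      result: { path: entry.path, title: entry.title, matchedView },
      // A deep view match outranks an index-only match.
      rank: (matchedView !== null ? 100 : 0) + hits,
    });
  }
  return ranked
    .sort((a, b) => b.rank - a.rank)
    .slice(0, MAX_RESULTS)
    .map((r) => r.result);
}
```

The point of the split is that full route files are only read for a bounded number of candidates, so a query never has to touch every page.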

Relevance utility (src/utils/relevance.ts)

  • Extracted scoreRelevance, rankByRelevance, and filterAndRankByRelevance into a shared utility used by both search_docs and list_components.
  • filterAndRankByRelevance is a single-pass filter+rank: lowercases each field once and uses the same values for both the token-presence predicate and the relevance score — more efficient than rankByRelevance(items.filter(...)).
  • Field weights: title (8) > description/tags (4) > content (1).
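As a rough illustration of the single-pass idea, the sketch below lowercases each field once per item and reuses those values for both the token-presence check and the weighted score. The weights are the ones listed above; the field shape, the `getFields` accessor, and the exact scoring rule (weights accumulate across every field a token appears in) are assumptions, since the real signature is not shown in this PR page.

```typescript
// Assumed field shape and signature, for illustration only.
interface RelevanceFields {
  title: string;
  description?: string;
  tags?: string[];
  content?: string;
}

const WEIGHTS = { title: 8, description: 4, tags: 4, content: 1 };

function filterAndRankByRelevance<T>(
  items: T[],
  tokens: string[],
  getFields: (item: T) => RelevanceFields,
): T[] {
  const needles = tokens.map((t) => t.toLowerCase());
  if (needles.length === 0) return [];
  const scored: Array<{ item: T; score: number }> = [];

  for (const item of items) {
    const f = getFields(item);
    // Lowercase each field once; reuse for both the filter and the score.
    const title = f.title.toLowerCase();
    const description = (f.description ?? "").toLowerCase();
    const tags = (f.tags ?? []).join(" ").toLowerCase();
    const content = (f.content ?? "").toLowerCase();

    let score = 0;
    let allPresent = true;
    for (const n of needles) {
      let tokenScore = 0;
      if (title.includes(n)) tokenScore += WEIGHTS.title;
      if (description.includes(n)) tokenScore += WEIGHTS.description;
      if (tags.includes(n)) tokenScore += WEIGHTS.tags;
      if (content.includes(n)) tokenScore += WEIGHTS.content;
      if (tokenScore === 0) {
        allPresent = false; // token missing from every field: drop the item
        break;
      }
      score += tokenScore;
    }
    if (allPresent) scored.push({ item, score });
  }
  return scored.sort((a, b) => b.score - a.score).map((s) => s.item);
}
```

Compared with `rankByRelevance(items.filter(...))`, this avoids lowercasing and scanning the same fields twice for every item that passes the filter.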

get_component improvements

  • Compact JSON for the metadata response (JSON.stringify is called without the null, 2 indentation arguments).
  • Section content (overview, guidelines, implementation, accessibility) is now stripped of MDX markup before returning, reducing token usage.
  • Fuse instance for fuzzy component resolution is cached at module level, keyed against manifest.routes so it is never rebuilt when the reference is stable — fixing a bug where the cache was invalidated on every call because catalog was a new .filter() array reference each time.
  • topLevelNames set (used for sub-component filtering) is similarly cached.
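The caching fix can be illustrated with a small sketch. A plain substring searcher stands in for the Fuse instance so the example has no dependencies, and all names (`Route`, `Manifest`, `buildSearcher`, `resolveSearcher`, `buildCount`) are hypothetical. The point is only the cache key: comparing against `manifest.routes`, which keeps the same reference between calls, rather than against the filtered catalog, which is a new array every time.

```typescript
// Hypothetical shapes; a generic searcher stands in for the Fuse instance.
interface Route {
  title: string;
  isComponent: boolean;
}
interface Manifest {
  routes: Route[];
}
type Searcher = (query: string) => Route[];

let buildCount = 0; // only here to make cache behaviour observable

function buildSearcher(catalog: Route[]): Searcher {
  buildCount += 1;
  const keyed = catalog.map((route) => ({ route, key: route.title.toLowerCase() }));
  return (query) => {
    const q = query.toLowerCase();
    return keyed.filter((e) => e.key.includes(q)).map((e) => e.route);
  };
}

// Module-level cache, keyed on the identity of manifest.routes.
let cache: { routes: Route[]; searcher: Searcher } | undefined;

function resolveSearcher(manifest: Manifest): Searcher {
  // Keying on the filtered catalog would miss on every call, because
  // .filter() always returns a fresh array reference.
  if (cache === undefined || cache.routes !== manifest.routes) {
    const catalog = manifest.routes.filter((r) => r.isComponent);
    cache = { routes: manifest.routes, searcher: buildSearcher(catalog) };
  }
  return cache.searcher;
}
```

With this shape, repeated calls with the same manifest reuse the searcher, and a reloaded manifest (a new `routes` array) triggers exactly one rebuild.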

list_components improvements

  • Exact-match search path now uses filterAndRankByRelevance instead of an unordered .filter(), so results are ranked by relevance.

stripMarkdown utility (src/utils/markdown.ts)

  • Moved to a dedicated utility file, shared by search_docs and get_component.
  • Protects fenced code blocks: code blocks are extracted into placeholders before any stripping runs and restored after — JSX tags inside code examples are no longer mangled.
  • Strips lowercase HTML tags (<br />, <div>, etc.) in addition to uppercase JSX components.
  • Frontmatter regex no longer uses /m flag — ^ anchors to the true start of the string, preventing mid-document --- separators from being eaten.
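A minimal sketch of the placeholder technique and the anchored frontmatter regex follows. It is not the PR's `stripMarkdown`: the function name, placeholder format, and tag regex are assumptions, and for simplicity this version restores code blocks with their fence markers intact, whereas the PR notes say the real utility also strips the fence markers.

```typescript
function stripMarkdownSketch(source: string): string {
  // 1. Park fenced code blocks behind placeholders so later steps skip them.
  const blocks: string[] = [];
  let text = source.replace(/`{3}[\s\S]*?`{3}/g, (block: string) => {
    blocks.push(block);
    return `@@CODE_BLOCK_${blocks.length - 1}@@`;
  });

  // 2. Frontmatter: no /m flag, so ^ matches only the very start of the
  //    string and a mid-document --- separator is left alone.
  text = text.replace(/^---\n[\s\S]*?\n---\n/, "");

  // 3. Strip JSX components and lowercase HTML tags
  //    (opening, closing, and self-closing forms).
  text = text.replace(/<\/?[A-Za-z][^>]*>/g, "");

  // 4. Restore the protected code blocks unchanged.
  return text.replace(
    /@@CODE_BLOCK_(\d+)@@/g,
    (_match: string, i: string) => blocks[Number(i)] ?? "",
  );
}
```

Extracting the code blocks first is what keeps step 3 from mangling JSX that appears inside examples.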

Type consolidation (src/types.ts)

  • All interfaces previously scattered across tool files and data-loader.ts are now centralised in src/types.ts, including the new RelevanceFields, CandidateResult, ViewMatch, DocSearchResult (with optional category), and ComponentSummary types.

CLAUDE.md

  • Documents the "never pretty-print JSON" and "never return raw MDX" rules.
  • Documents the Fuse caching requirement (cache instances; key invalidation against stable manifest reference).
  • Documents the build script pattern for adding new prebuild steps.
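The "never pretty-print JSON" rule comes down to which arguments are passed to `JSON.stringify`. The helper below is a hypothetical illustration (the name `toToolText` does not appear in the PR) of serialising a tool result without indentation.

```typescript
// Hypothetical helper: serialise a tool result without indentation.
function toToolText(value: unknown): string {
  // JSON.stringify(value, null, 2) would add newlines and spaces that an
  // LLM consumer pays for in tokens without gaining any information.
  return JSON.stringify(value);
}

const example = { name: "Button", props: [{ name: "variant", type: "string" }] };

// Compare the compact and pretty-printed sizes for the same payload.
console.log(toToolText(example).length, JSON.stringify(example, null, 2).length);
```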

Design doc

See packages/nimbus-mcp/docs/search-index-truncation.md for the full problem analysis and options considered.

Jira

CRAFT-2137

Test plan

  • 123 tests pass (pnpm vitest run in packages/nimbus-mcp)
  • TypeScript typecheck passes
  • search_docs — broad queries (e.g. "button") return ≤10 relevant results
  • search_docs — deep content queries ("ButtonProps", "import Button from") find results from dev/guidelines views with correct matchedView
  • search_docs — "Colors" ranks in top 2 for "color tokens"
  • search_docs — nonsense queries return empty results
  • list_components — title-match component ranks first for exact-name queries
  • get_component — section content contains no code fence markers
  • stripMarkdown — JSX inside code blocks is preserved; mid-doc --- not eaten

🤖 Generated with Claude Code

vercel bot commented Mar 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| nimbus-documentation | Ready | Preview, Comment | Mar 19, 2026 7:02pm |
| nimbus-storybook | Ready | Preview, Comment | Mar 19, 2026 7:02pm |

changeset-bot bot commented Mar 9, 2026

⚠️ No Changeset found

Latest commit: 5cfb9ff

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Comment on lines +103 to +108
{ name: "title", weight: 3 },
{ name: "description", weight: 2 },
{ name: "tags", weight: 2 },
{ name: "content", weight: 1 },
],
threshold: 0.4,
Collaborator

I had to read up on how this works, but I'll assume you picked these numbers for a reason.

Contributor Author

Claude haggled with itself over them, and we went from 0.3 to 0.4 for the threshold. What do you think? Good/bad?

Collaborator

I honestly don't know just yet, I had to work through a few examples to get it to click. If anything, we can sideline it (for now) and test/tweak it a bit more later when we're ready?

{
title: "Search Docs",
description:
"Fuzzy-searches across all Nimbus documentation (components, patterns, guides, tokens) and returns the top matching pages with content snippets.",
Collaborator

@valoriecarli valoriecarli Mar 11, 2026

@tylermorrisford I get that this is just a description, but this implies it goes through ALL Nimbus documentation? All .mdx files from top to bottom? That's... a lot.

Update: I see... it's reading a pre-built JSON file, but content is limited to the first 500 chars. If this is the case, I don't think this would be an issue... unless we have a super long-winded guide that should be broken down.

Contributor Author

The search looks through data/docs/search-index.json instead of trying to process anything with an .mdx extension, which, you're right, would be a lot.

Contributor Author

@valoriecarli This is a good point you're bringing up:

The search index is truncated to 500 characters per doc entry. This means search_docs can only match against the first ~500 chars of each page's content. Anything deeper in a doc page — further sections, API details, examples — is invisible to the search.

The 500-char limit applies at index build time in nimbus-docs-build, not in the MCP tool itself. The tool searches faithfully over what it's given, but it's given incomplete content.

Contributor Author

This resulted in a huge improvement for user experience 🏆
see slack discussion

Collaborator

@valoriecarli valoriecarli left a comment

I left some comments/ questions regarding some logic, but nothing to hold this up at this stage. 🙌🏻

tylermorrisford and others added 6 commits March 19, 2026 11:54
Implements a search_docs MCP tool that performs two-pass search (exact
substring then Fuse.js fuzzy fallback) across all Nimbus documentation.
Returns top 10 matches with title, description, path, and content snippet.

- Add search-docs tool with registerSearchDocs entry point
- Add DocSearchResult shared type to types.ts
- Register tool in server.ts
- Add behavioral tests for search and relevance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The search index only indexed the first 500 chars of the overview tab,
missing 87% of content (dev, guidelines, a11y views). This replaces the
single-pass index search with a two-phase approach:

- Phase 1: lightweight search against the in-memory index for candidates
- Phase 2: lazy-load full route files for candidates and search all views

Props, import paths, accessibility details, and guidelines content are
now fully searchable. Results include a matchedView field and snippets
from actual match locations.

Includes a design doc with problem analysis and options considered.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…plementation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…each query, remove pretty-printing from json returns to reduce token length, add mcp claude.md for remembering package-specific directives

…ntext bloat and perf issues

- Move all interfaces from data-loader, flatten-tokens, and tool files into src/types.ts
- Add category field to DocSearchResult so LLM can route follow-up tool calls
- Update tool description with category-based follow-up guidance and prop lookup clarification
- Remove pretty-printing from get-component props response and list-components
- Cache toLowerCase() in searchRouteViews to avoid redundant work across search passes
- Extract PHASE2_CANDIDATE_LIMIT named constant (was magic number 20)
- Update CLAUDE.md with type placement rule and build script instructions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… test cleanup

- Extract filterAndRankByRelevance single-pass util; wire into search-docs and list-components
- Fix resolveFuseInstance cache key (was new array ref each call, now keyed to manifest.routes)
- Fix topLevelNames rebuilt on every aggregateSubComponentProps call (now cached)
- Fix stripMarkdown: extract code blocks before stripping so JSX in examples is not mangled
- Fix stripMarkdown: remove /m flag from frontmatter regex (mid-doc --- no longer eaten)
- Fix stripMarkdown: strip lowercase HTML tags and fenced code fence markers
- Fix search-docs phase-2 budget: matched entries consume their share before expanded fills remainder
- Fix category field: omit from DocSearchResult when empty (sparse response)
- Add SNIPPET_LEAD named constant; anchor extractSnippet to earliest token position
- Tokenise query once in main handler; pass pre-parsed tokens to findCandidates and searchRouteViews
- Fuse invalidation in search-docs now checks index object identity
- Combine two-pass view scan into single loop in searchRouteViews
- Compact JSON everywhere (remove null, 2 from metadata response)
- Strip markdown from section content before returning to LLM
- Shared module-level MCP client in search-docs.spec and get-component.spec
- Add markdown.spec with full coverage including new edge cases
- Document multi-field weight accumulation in scoreRelevance JSDoc
- Update CLAUDE.md: add MDX stripping rule and Fuse caching performance guideline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@ByronDWall ByronDWall left a comment

I approve this message

@ByronDWall ByronDWall merged commit 2c05f75 into main Mar 19, 2026
9 checks passed
@ByronDWall ByronDWall deleted the CRAFT-2137-implement-search-docs-tool branch March 19, 2026 19:54
@ByronDWall ByronDWall restored the CRAFT-2137-implement-search-docs-tool branch March 19, 2026 21:01

3 participants