feat(nimbus-mcp): add search_docs tool #1216
    { name: "title", weight: 3 },
    { name: "description", weight: 2 },
    { name: "tags", weight: 2 },
    { name: "content", weight: 1 },
  ],
  threshold: 0.4,
I had to read up on how this works, but I'll assume you picked these numbers for a reason.
Claude haggled with itself over them, and we went from 0.3 to 0.4 for the threshold. What do you think? Good/bad?
I honestly don't know just yet, I had to work through a few examples to get it to click. If anything, we can sideline it (for now) and test/tweak it a bit more later when we're ready?
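For anyone else reading up on these numbers, here is an annotated copy of the values from the diff. It is only a reading aid: the variable name `fuseOptions` is made up, and the real options object in the PR may carry more fields than the ones visible in the excerpt.

```typescript
// Annotated copy of the values in the diff. `fuseOptions` is a hypothetical name.
const fuseOptions = {
  keys: [
    // Weights are relative: a hit in `title` counts three times as much
    // as the same hit in `content`.
    { name: "title", weight: 3 },
    { name: "description", weight: 2 },
    { name: "tags", weight: 2 },
    { name: "content", weight: 1 },
  ],
  // Fuse.js threshold: 0.0 requires a perfect match, 1.0 matches anything.
  // Moving from 0.3 to 0.4 therefore makes the fuzzy pass more lenient.
  threshold: 0.4,
};
```

Lowering the threshold again would trade recall for precision, which is one concrete knob to revisit when the values are tested later.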
  {
    title: "Search Docs",
    description:
      "Fuzzy-searches across all Nimbus documentation (components, patterns, guides, tokens) and returns the top matching pages with content snippets.",
@tylermorrisford I get that this is just a description, but this implies it goes through ALL Nimbus documentation? All .mdx files from top to bottom? That's... a lot.
Update: I see... it's reading a pre-built JSON file, but content is limited to the first 500 chars. If that's the case, I don't think this would be an issue... unless we have a super long-winded guide that should be broken down.
The search looks through data/docs/search-index.json instead of trying to process everything with an .mdx extension, which, you're right, would be a lot.
@valoriecarli This is a good point you're bringing up:
The search index is truncated to 500 characters per doc entry. This means search_docs can only match against the first ~500 chars of each page's content. Anything deeper in a doc page — further sections, API details, examples — is invisible to the search.
The 500-char limit applies at index build time in nimbus-docs-build, not in the MCP tool itself. The tool searches faithfully over what it's given, but it's given incomplete content.
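To make the truncation point concrete, here is a minimal sketch of what happens when content is cut at index build time. The `DocPage` and `IndexEntry` shapes, the helper names, and the sample path are assumptions for illustration; the real schema in nimbus-docs-build may look different.

```typescript
// Hypothetical shapes; the real index schema may differ.
interface DocPage {
  path: string;
  title: string;
  content: string;
}

interface IndexEntry {
  path: string;
  title: string;
  content: string; // truncated at build time
}

const CONTENT_LIMIT = 500; // per-entry limit described in this thread

// Build step: only the first CONTENT_LIMIT characters survive.
function buildIndexEntry(page: DocPage): IndexEntry {
  return { ...page, content: page.content.slice(0, CONTENT_LIMIT) };
}

// Case-insensitive substring check, standing in for the tool's search.
function matches(entry: { content: string }, query: string): boolean {
  return entry.content.toLowerCase().includes(query.toLowerCase());
}

// A page whose interesting term sits past the 500-character mark.
const page: DocPage = {
  path: "components/button",
  title: "Button",
  content: "Overview text. ".repeat(40) + "ButtonProps are documented here.",
};

const entry = buildIndexEntry(page);
const foundInFullPage = matches(page, "ButtonProps");
const foundInIndex = matches(entry, "ButtonProps");
```

The search itself behaves correctly in both cases; the second lookup misses only because the term was cut off before the tool ever saw it.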
This resulted in a huge improvement for user experience 🏆
see slack discussion
valoriecarli
left a comment
I left some comments/questions regarding some logic, but nothing to hold this up at this stage. 🙌🏻
Implements a search_docs MCP tool that performs two-pass search (exact substring then Fuse.js fuzzy fallback) across all Nimbus documentation. Returns top 10 matches with title, description, path, and content snippet.
- Add search-docs tool with registerSearchDocs entry point
- Add DocSearchResult shared type to types.ts
- Register tool in server.ts
- Add behavioral tests for search and relevance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The search index only indexed the first 500 chars of the overview tab, missing 87% of content (dev, guidelines, a11y views). This replaces the single-pass index search with a two-phase approach:
- Phase 1: lightweight search against the in-memory index for candidates
- Phase 2: lazy-load full route files for candidates and search all views
Props, import paths, accessibility details, and guidelines content are now fully searchable. Results include a matchedView field and snippets from actual match locations. Includes a design doc with problem analysis and options considered.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
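The two phases in this commit message could be sketched roughly as below. Everything here is illustrative: the type shapes, the synchronous `loadRoute` callback, the sample data, and the choice to pad phase 1 with non-matching entries are assumptions, not the PR's actual code. Only `PHASE2_CANDIDATE_LIMIT` and `matchedView` are names taken from the PR.

```typescript
// Illustrative shapes; the PR's real types live in src/types.ts and may differ.
interface IndexEntry { path: string; title: string; excerpt: string }
interface RouteFile { views: Record<string, string> } // view name -> full text
interface Match { path: string; matchedView: string; snippet: string }

const PHASE2_CANDIDATE_LIMIT = 20;

// Phase 1: cheap scan of the in-memory index. Index hits go first; the
// remaining budget is filled with other entries so that pages matching
// only in their deeper views stay reachable (an assumption of this sketch).
function findCandidates(index: IndexEntry[], query: string): IndexEntry[] {
  const q = query.toLowerCase();
  const hit = (e: IndexEntry) =>
    e.title.toLowerCase().includes(q) || e.excerpt.toLowerCase().includes(q);
  const matched = index.filter(hit);
  const rest = index.filter((e) => !hit(e));
  return [...matched, ...rest].slice(0, PHASE2_CANDIDATE_LIMIT);
}

// Phase 2: load the full route file for each candidate and scan every view.
function searchViews(
  candidates: IndexEntry[],
  query: string,
  loadRoute: (path: string) => RouteFile | undefined,
): Match[] {
  const q = query.toLowerCase();
  const results: Match[] = [];
  for (const candidate of candidates) {
    const route = loadRoute(candidate.path); // loaded only for candidates
    if (!route) continue;
    for (const [view, text] of Object.entries(route.views)) {
      const at = text.toLowerCase().indexOf(q);
      if (at === -1) continue;
      results.push({
        path: candidate.path,
        matchedView: view,
        snippet: text.slice(Math.max(0, at - 20), at + q.length + 20),
      });
      break; // keep the first matching view per page
    }
  }
  return results;
}

// Made-up sample data: the term appears only in the "dev" view.
const index: IndexEntry[] = [
  { path: "components/button", title: "Button", excerpt: "A clickable control." },
];
const routes: Record<string, RouteFile> = {
  "components/button": {
    views: {
      overview: "A clickable control.",
      dev: "import { Button } from '@example/ui';",
    },
  },
};
const results = searchViews(findCandidates(index, "import"), "import", (p) => routes[p]);
```

Because full route files are read only for the capped candidate list, the cost of phase 2 stays bounded no matter how large the documentation set grows.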
…plementation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…each query, remove pretty-printing from json returns to reduce token length, add mcp claude.md for remembering package-specific directives
…ntext bloat and perf issues
- Move all interfaces from data-loader, flatten-tokens, and tool files into src/types.ts
- Add category field to DocSearchResult so LLM can route follow-up tool calls
- Update tool description with category-based follow-up guidance and prop lookup clarification
- Remove pretty-printing from get-component props response and list-components
- Cache toLowerCase() in searchRouteViews to avoid redundant work across search passes
- Extract PHASE2_CANDIDATE_LIMIT named constant (was magic number 20)
- Update CLAUDE.md with type placement rule and build script instructions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
eefcc6d to a95c9db
3cb6e23 to 757f626
… test cleanup
- Extract filterAndRankByRelevance single-pass util; wire into search-docs and list-components
- Fix resolveFuseInstance cache key (was new array ref each call, now keyed to manifest.routes)
- Fix topLevelNames rebuilt on every aggregateSubComponentProps call (now cached)
- Fix stripMarkdown: extract code blocks before stripping so JSX in examples is not mangled
- Fix stripMarkdown: remove /m flag from frontmatter regex (mid-doc --- no longer eaten)
- Fix stripMarkdown: strip lowercase HTML tags and fenced code fence markers
- Fix search-docs phase-2 budget: matched entries consume their share before expanded fills remainder
- Fix category field: omit from DocSearchResult when empty (sparse response)
- Add SNIPPET_LEAD named constant; anchor extractSnippet to earliest token position
- Tokenise query once in main handler; pass pre-parsed tokens to findCandidates and searchRouteViews
- Fuse invalidation in search-docs now checks index object identity
- Combine two-pass view scan into single loop in searchRouteViews
- Compact JSON everywhere (remove null, 2 from metadata response)
- Strip markdown from section content before returning to LLM
- Shared module-level MCP client in search-docs.spec and get-component.spec
- Add markdown.spec with full coverage including new edge cases
- Document multi-field weight accumulation in scoreRelevance JSDoc
- Update CLAUDE.md: add MDX stripping rule and Fuse caching performance guideline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
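The /m flag fix is easy to miss, so here is a small reproduction of the failure mode. The two patterns are illustrative stand-ins, not the actual regex from src/utils/markdown.ts, and the sample documents are made up.

```typescript
// Illustrative patterns; the PR's real frontmatter regex may differ.
// With /m, ^ matches at the start of every line, not only the string start.
const WITH_M_FLAG = /^---\r?\n[\s\S]*?\r?\n---\r?\n/m;
const WITHOUT_M_FLAG = /^---\r?\n[\s\S]*?\r?\n---\r?\n/;

const stripEager = (s: string): string => s.replace(WITH_M_FLAG, "");
const stripAnchored = (s: string): string => s.replace(WITHOUT_M_FLAG, "");

// No frontmatter, but two horizontal rules in the body.
const body = [
  "# Guide",
  "",
  "Intro paragraph.",
  "",
  "---",
  "",
  "Section A",
  "",
  "---",
  "",
  "Section B",
  "",
].join("\n");

// Real frontmatter at the very start of the file.
const withFrontmatter = "---\ntitle: Button\n---\nBody text\n";

const eagerResult = stripEager(body); // "Section A" is swallowed
const anchoredResult = stripAnchored(body); // body is left untouched
const frontmatterResult = stripAnchored(withFrontmatter); // only the header goes
```

Without the flag, `^` can match only at the true start of the string, so a pair of `---` rules in the middle of a page is no longer mistaken for frontmatter.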
757f626 to 5cfb9ff
ByronDWall
left a comment
I approve this message
Summary
Adds a search_docs tool to the Nimbus MCP server that searches across all documentation views, not just the truncated overview captured in the search index.

New tool: search_docs
- … get_component, get_tokens, or search_icons based on the category field of each result.

Problem solved
The search index truncated content to 500 chars of the overview tab, missing ~87% of documentation. Props, import paths, accessibility details, and design guidelines were invisible to search. This implementation searches the full content of all views without requiring build pipeline changes.

Relevance utility (src/utils/relevance.ts)
- … scoreRelevance, rankByRelevance, and filterAndRankByRelevance into a shared utility used by both search_docs and list_components.
- filterAndRankByRelevance is a single-pass filter+rank: it lowercases each field once and uses the same values for both the token-presence predicate and the relevance score, which is more efficient than rankByRelevance(items.filter(...)).

get_component improvements
- … null, 2).
- … manifest.routes so it is never rebuilt when the reference is stable, fixing a bug where the cache was invalidated on every call because catalog was a new .filter() array reference each time.
- … topLevelNames set (used for sub-component filtering) is similarly cached.

list_components improvements
- … filterAndRankByRelevance instead of an unordered .filter(), so results are ranked by relevance.

stripMarkdown utility (src/utils/markdown.ts)
- … search_docs and get_component.
- … <br />, <div>, etc.) in addition to uppercase JSX components.
- … /m flag: ^ anchors to the true start of the string, preventing mid-document --- separators from being eaten.

Type consolidation (src/types.ts)
- … data-loader.ts are now centralised in src/types.ts, including the new RelevanceFields, CandidateResult, ViewMatch, DocSearchResult (with optional category), and ComponentSummary types.

CLAUDE.md
- …

Design doc
See packages/nimbus-mcp/docs/search-index-truncation.md for the full problem analysis and options considered.

Jira
CRAFT-2137

Test plan
- … (pnpm vitest run in packages/nimbus-mcp)
- search_docs: broad queries (e.g. "button") return ≤10 relevant results
- search_docs: deep content queries ("ButtonProps", "import Button from") find results from dev/guidelines views with correct matchedView
- search_docs: "Colors" ranks in top 2 for "color tokens"
- search_docs: nonsense queries return empty results
- list_components: title-match component ranks first for exact-name queries
- get_component: section content contains no code fence markers
- stripMarkdown: JSX inside code blocks is preserved; mid-doc --- not eaten

🤖 Generated with Claude Code
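As a rough illustration of the single-pass idea behind filterAndRankByRelevance, the sketch below lowercases each field once and reuses the result for both filtering and scoring. The field names, the weights, the `getFields` callback, and the rule that every token must appear somewhere are assumptions; the real RelevanceFields type and scoring in src/utils/relevance.ts may differ.

```typescript
// Hypothetical fields and weights, chosen only for this sketch.
interface RelevanceFields { title: string; description: string; content: string }
const WEIGHTS = { title: 3, description: 2, content: 1 } as const;

// Tokens are expected to be lowercased by the caller.
function filterAndRankByRelevance<T>(
  items: T[],
  tokens: string[],
  getFields: (item: T) => RelevanceFields,
): T[] {
  const scored: { item: T; score: number }[] = [];
  for (const item of items) {
    const fields = getFields(item);
    // Lowercase each field once; reuse for both the predicate and the score.
    const title = fields.title.toLowerCase();
    const description = fields.description.toLowerCase();
    const content = fields.content.toLowerCase();

    let score = 0;
    let allTokensPresent = tokens.length > 0;
    for (const token of tokens) {
      // Weights accumulate across every field the token appears in.
      let tokenScore = 0;
      if (title.includes(token)) tokenScore += WEIGHTS.title;
      if (description.includes(token)) tokenScore += WEIGHTS.description;
      if (content.includes(token)) tokenScore += WEIGHTS.content;
      if (tokenScore === 0) {
        allTokensPresent = false; // token missing everywhere: filter out
        break;
      }
      score += tokenScore;
    }
    if (allTokensPresent) scored.push({ item, score });
  }
  return scored.sort((a, b) => b.score - a.score).map((s) => s.item);
}

// Made-up sample data.
const docs: RelevanceFields[] = [
  { title: "Forms", description: "Form guidance", content: "Pair inputs with a submit button." },
  { title: "Colors", description: "Color tokens", content: "Palette values." },
  { title: "Button", description: "Clickable control", content: "Use a button for actions." },
];
const ranked = filterAndRankByRelevance(docs, ["button"], (d) => d);
```

Compared with filtering first and ranking afterwards, this form touches each item once and never recomputes the lowercased strings.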