fix(search): add direct name/slug matching for exact queries #126
leo3linbeck wants to merge 1 commit into openclaw:main
When searching for a skill by its exact name (e.g., 'guardian angel'), the vector search may not return it if the semantic similarity is low (e.g., the skill content is about 'moral evaluation' which has low cosine similarity to 'guardian angel').

This change adds a parallel direct match query that:
1. Checks for exact slug matches (e.g., 'guardian-angel')
2. Converts space-separated queries to hyphenated slugs
3. Searches displayName tokens for partial matches
4. Merges direct matches with vector search results
5. Gives exact name/slug matches a boosted score (1.0)

This ensures skills are discoverable by their name even when the semantic content differs significantly from the name itself.

Fixes: Skills with common-phrase names (like 'guardian-angel') not appearing in search results when queried by name.
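The conversion of a space-separated query into a hyphenated slug can be sketched as below. This is only an illustration: `toSlugCandidate` is a hypothetical helper name, and the normalization rules (lowercasing, collapsing punctuation) are assumptions rather than the PR's actual code.

```typescript
// Hypothetical helper: normalize a free-text query into a slug candidate.
// The exact rules are assumed for illustration.
function toSlugCandidate(query: string): string {
  return query
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, '') // strip leading and trailing hyphens
}

console.log(toSlugCandidate('guardian angel')) // "guardian-angel"
console.log(toSlugCandidate('  Guardian   Angel ')) // "guardian-angel"
```

The resulting candidate could then be used for an exact slug lookup before any token-based matching is attempted.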
    // Run direct name/slug match in parallel with vector search
    // This ensures exact name matches are found even with low semantic similarity
    const directMatchPromise = ctx.runQuery(internal.search.findDirectMatches, {
      query,
      queryTokens,
      limit,
    }) as Promise<DirectMatchEntry[]>
[P0] `directMatchPromise` will reject if `findDirectMatches` throws, causing the whole action to error (including the embedding-error fallback). Since this is started before `generateEmbedding`, any direct-match query failure will take down the search path even though vector search could have succeeded. Consider wrapping `directMatchPromise` with `.catch(() => [])` (or starting it after embedding generation) so vector search remains available when direct matching fails.
Path: convex/search.ts, Line: 44:50
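The error isolation suggested in this comment might look like the following sketch. `isolateDirectMatches` is a hypothetical wrapper, and the `DirectMatchEntry` shape here is a simplified stand-in for the PR's type, assumed only for illustration.

```typescript
// Hypothetical stand-in for the PR's DirectMatchEntry type.
type DirectMatchEntry = { skillId: string; score: number }

// Convert a rejected direct-match promise into an empty list so the
// caller can still proceed with vector search results.
function isolateDirectMatches(
  pending: Promise<DirectMatchEntry[]>,
): Promise<DirectMatchEntry[]> {
  return pending.catch((): DirectMatchEntry[] => [])
}

// Usage: a failing lookup resolves to [] instead of rejecting.
const failingLookup: Promise<DirectMatchEntry[]> = Promise.reject(
  new Error('direct match failed'),
)
isolateDirectMatches(failingLookup).then((matches) => {
  console.log(matches.length) // 0
})
```

Because the `.catch` handler is attached as soon as the promise is created, a later failure cannot surface as an unhandled rejection while embedding generation is still in flight.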
    if (matchedSkills.length < limit) {
      const recentSkills = await ctx.db
        .query('skills')
        .withIndex('by_active_updated', (q) => q.eq('softDeletedAt', undefined))
        .order('desc')
        .take(500) // Check recent skills for name matches
[P1] The direct-match fallback scans `take(500)` recent skills and tokenizes each skill in a loop (`tokenize(skill.displayName/slug)`), which is O(500 * tokens) per search and may become a noticeable cost at scale. If this endpoint is hot, consider reducing the scan size, adding a more selective index/query for name tokens, or caching/precomputing tokenized forms.
Path: convex/search.ts, Line: 195:201
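One way to act on the precomputation suggestion is to derive the name tokens once, when a skill is created or renamed, and store them with the skill so the search path only reads them. The sketch below assumes a simple tokenizer (the real `tokenize` in `convex/search.ts` may differ), and `buildNameTokens` and `matchesStoredTokens` are hypothetical names.

```typescript
interface SkillNameFields {
  displayName: string
  slug: string
}

// Assumed tokenizer: lowercase, split on non-alphanumerics, drop empties.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0)
}

// Hypothetical write-time step: compute de-duplicated name tokens once.
function buildNameTokens(skill: SkillNameFields): string[] {
  const tokens = [...tokenize(skill.displayName), ...tokenize(skill.slug)]
  return tokens.filter((token, index) => tokens.indexOf(token) === index)
}

// Search-time step: match against stored tokens without re-tokenizing.
function matchesStoredTokens(queryTokens: string[], storedTokens: string[]): boolean {
  return queryTokens.every((qt) => storedTokens.some((t) => t.includes(qt)))
}

const stored = buildNameTokens({ displayName: 'Guardian Angel', slug: 'guardian-angel' })
console.log(stored) // ["guardian", "angel"]
console.log(matchesStoredTokens(['guardian', 'angel'], stored)) // true
```

This trades a small amount of storage and write-time work for a cheaper read path; it does not by itself reduce the number of skills scanned.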
    let ownerHandle = ownerCache.get(skill.ownerUserId)
    if (ownerHandle === undefined) {
      const owner = await ctx.db.get(skill.ownerUserId)
      ownerHandle = owner?.handle ?? owner?.name ?? null
      ownerCache.set(skill.ownerUserId, ownerHandle)
    }
[P2] `ownerHandle` derivation differs between hydration paths: `findDirectMatches` uses `owner?.handle ?? owner?.name`, while `hydrateResults` uses `owner?.handle ?? owner?._id`. This can produce inconsistent display of the same owner depending on whether the result came from direct match vs vector search. Aligning the fallback behavior would avoid surprising UI differences.
Path: convex/search.ts, Line: 228:233
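A shared helper is one way to keep both hydration paths aligned. In this sketch, `resolveOwnerHandle` and the `OwnerLike` shape are hypothetical, and falling back to `name` rather than `_id` is an arbitrary choice made for illustration; the comment above does not say which fallback is preferred.

```typescript
// Hypothetical minimal owner shape; the real user document may have more fields.
interface OwnerLike {
  _id: string
  handle?: string
  name?: string
}

// Single source of truth for the display handle, so direct-match and
// vector-search results render the same owner the same way.
function resolveOwnerHandle(owner: OwnerLike | null): string | null {
  return owner?.handle ?? owner?.name ?? null
}

console.log(resolveOwnerHandle({ _id: 'u1', handle: 'leo' })) // "leo"
console.log(resolveOwnerHandle({ _id: 'u2', name: 'Leo' })) // "Leo"
console.log(resolveOwnerHandle(null)) // null
```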
    const slugTokens = tokenize(skill.slug)
    const allTokens = [...nameTokens, ...slugTokens]

    const isMatch = queryTokens.every((qt) => allTokens.some((t) => t.includes(qt)))
Suggested change:
    - const isMatch = queryTokens.every((qt) => allTokens.some((t) => t.includes(qt)))
    + const isMatch = queryTokens.some((qt) => allTokens.some((t) => t.includes(qt)))
Inconsistent token-matching logic between `findDirectMatches` (`.every`) and `matchesExactTokens` (`.some`) causes direct name matches to have stricter requirements than vector search results.
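The practical difference between the two predicates can be seen in a small standalone example. The function names `matchesAll` and `matchesAny` are illustrative and do not appear in the PR.

```typescript
// Every query token must appear in some name token (the `.every` variant).
const matchesAll = (queryTokens: string[], nameTokens: string[]): boolean =>
  queryTokens.every((qt) => nameTokens.some((t) => t.includes(qt)))

// At least one query token must appear in some name token (the `.some` variant).
const matchesAny = (queryTokens: string[], nameTokens: string[]): boolean =>
  queryTokens.some((qt) => nameTokens.some((t) => t.includes(qt)))

const nameTokens = ['guardian', 'angel']
console.log(matchesAll(['guardian', 'angel'], nameTokens)) // true
console.log(matchesAll(['guardian', 'spirit'], nameTokens)) // false
console.log(matchesAny(['guardian', 'spirit'], nameTokens)) // true
```

Note that the two also disagree on an empty query: `every` on an empty array returns `true`, while `some` returns `false`.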
    return []
    // Fall back to direct matches only
    const directMatches = await directMatchPromise
    return directMatches.map((entry) => ({
Suggested change:
    - return directMatches.map((entry) => ({
    + // Enrich with badges and apply highlightedOnly filter if needed
    + const badgeMapEntries = (await ctx.runQuery(internal.search.getSkillBadgeMapsInternal, {
    +   skillIds: directMatches.map((entry) => entry.skill._id),
    + })) as Array<[Id<'skills'>, SkillBadgeMap]>
    + const badgeMapBySkillId = new Map(badgeMapEntries)
    + const enrichedMatches = directMatches.map((entry) => ({
    +   ...entry,
    +   skill: {
    +     ...entry.skill,
    +     badges: badgeMapBySkillId.get(entry.skill._id) ?? {},
    +   },
    + }))
    + const filtered = args.highlightedOnly
    +   ? enrichedMatches.filter((entry) => isSkillHighlighted(entry.skill))
    +   : enrichedMatches
    + return filtered.map((entry) => ({
Fallback path returns direct matches without applying the `highlightedOnly` filter when embedding generation fails.
    // Add direct matches first with boosted score (exact name matches are highly relevant)
    for (const entry of directMatches) {
Suggested change:
    - // Add direct matches first with boosted score (exact name matches are highly relevant)
    - for (const entry of directMatches) {
    + // Enrich direct matches with badges from skillBadges table for consistent filtering
    + const directMatchBadgeEntries = (await ctx.runQuery(internal.search.getSkillBadgeMapsInternal, {
    +   skillIds: directMatches.map((entry) => entry.skill._id),
    + })) as Array<[Id<'skills'>, SkillBadgeMap]>
    + const directMatchBadgeMapBySkillId = new Map(directMatchBadgeEntries)
    + const directMatchesWithBadges = directMatches.map((entry) => ({
    +   ...entry,
    +   skill: {
    +     ...entry.skill,
    +     badges: directMatchBadgeMapBySkillId.get(entry.skill._id) ?? {},
    +   },
    + }))
    + // Add direct matches first with boosted score (exact name matches are highly relevant)
    + for (const entry of directMatchesWithBadges) {
Direct match skills are not enriched with badges from the `skillBadges` table before the `highlightedOnly` filter is applied, causing incorrect filtering behavior.
Problem
When searching for a skill by its exact name (e.g., "guardian angel"), the search may not find it if the semantic similarity between the query and the skill's content is low.
For example, the skill `guardian-angel` (about moral evaluation using Thomistic principles) doesn't appear when searching "guardian angel" because:
However, searching "moral ethics thomistic" does find it, proving the skill is properly indexed.
Solution
Add a parallel direct-match query that runs alongside vector search:
- `by_slug` index for the normalized query
Changes
- `findDirectMatches` internal query for direct name/slug lookups
- `searchSkills` to run direct matching in parallel with vector search
Testing
Before:
After:
Greptile Overview
Greptile Summary
This PR improves skill search recall for exact queries by adding an internal direct-match query (slug/name token matching) that runs in parallel with vector embedding search. Results from direct matching are boosted (score 1.0), merged ahead of vector-derived exact token matches, and de-duplicated by skill ID; embedding-generation failure now falls back to returning direct matches only.
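The merge step described in this summary (direct matches first with score 1.0, then vector results, de-duplicated by skill ID) could be sketched roughly as follows. `SearchEntry` and `mergeResults` are hypothetical names with a simplified shape; the PR's actual entries carry full skill documents.

```typescript
// Hypothetical simplified result shape.
interface SearchEntry {
  skillId: string
  score: number
}

// Direct matches go first with a boosted score of 1.0; vector results
// follow, skipping any skill ID that has already been added.
function mergeResults(
  direct: SearchEntry[],
  vector: SearchEntry[],
  limit: number,
): SearchEntry[] {
  const seen = new Set<string>()
  const merged: SearchEntry[] = []
  for (const entry of direct) {
    if (seen.has(entry.skillId)) continue
    seen.add(entry.skillId)
    merged.push({ ...entry, score: 1.0 })
  }
  for (const entry of vector) {
    if (seen.has(entry.skillId)) continue
    seen.add(entry.skillId)
    merged.push(entry)
  }
  return merged.slice(0, limit)
}

const merged = mergeResults(
  [{ skillId: 'guardian-angel', score: 0.2 }],
  [
    { skillId: 'guardian-angel', score: 0.9 },
    { skillId: 'moral-compass', score: 0.5 },
  ],
  10,
)
console.log(merged.map((e) => `${e.skillId}:${e.score}`)) // ["guardian-angel:1", "moral-compass:0.5"]
```

Adding direct matches before vector results means that when a skill appears in both lists, the boosted entry is the one that survives de-duplication.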
The change fits into the existing `convex/search.ts` flow by keeping the existing vectorSearch → hydrate → exact-token-filter pipeline intact while adding a separate `findDirectMatches` internalQuery to ensure exact name/slug lookups can bypass the embedding similarity gate.
Confidence Score: 3/5
`convex/search.ts`, but `directMatchPromise` is awaited without error isolation, so a failure in `findDirectMatches` can reject the entire action (even when vector search would otherwise work). There's also a potential performance cost from scanning 500 recent skills per query. (2/5)
Context used:
dashboard- AGENTS.md (source)