fix(search): add direct name/slug matching for exact queries #126

Open
leo3linbeck wants to merge 1 commit into openclaw:main from leo3linbeck:fix/search-exact-name-match
@leo3linbeck leo3linbeck commented Feb 4, 2026

Problem

When searching for a skill by its exact name (e.g., "guardian angel"), the search may not find it if the semantic similarity between the query and the skill's content is low.

For example, the skill guardian-angel (about moral evaluation using Thomistic principles) doesn't appear when searching "guardian angel" because:

  1. Vector search uses embeddings based on the skill's content/description
  2. "guardian angel" (a protective spirit concept) has low cosine similarity to "moral evaluation"
  3. The skill never makes it past the vector search phase to the exact-match filter

However, searching "moral ethics thomistic" does find it, proving the skill is properly indexed.
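The gating effect described above can be illustrated with a small two-stage sketch. It assumes the vector phase returns only its top-k candidates by similarity; the `Candidate` type and the `topK` and `exactFilter` helpers are hypothetical names for illustration, not functions from `convex/search.ts`.

```typescript
// Hypothetical candidate shape: a slug plus its similarity to the query.
type Candidate = { slug: string; similarity: number };

// Stage 1: keep only the k most similar candidates (stand-in for vector search).
function topK(candidates: Candidate[], k: number): Candidate[] {
  return [...candidates].sort((a, b) => b.similarity - a.similarity).slice(0, k);
}

// Stage 2: exact-match filter. It only ever sees what stage 1 returned,
// so a skill dropped by stage 1 cannot be recovered here.
function exactFilter(candidates: Candidate[], slug: string): Candidate[] {
  return candidates.filter((c) => c.slug === slug);
}
```

If the skill whose slug matches the query ranks below the cutoff in stage 1, `exactFilter(topK(pool, k), slug)` comes back empty even though the slug exists in the pool.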

Solution

Add a parallel direct-match query that runs alongside vector search:

  1. Exact slug match: Query by_slug index for the normalized query
  2. Hyphenated conversion: Convert "guardian angel" → "guardian-angel" and check
  3. Token matching: Search recent skills where all query tokens appear in displayName/slug
  4. Merge results: Combine direct matches with vector results, deduplicating by skill ID
  5. Boost scoring: Give exact name/slug matches a score of 1.0 (highest relevance)
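The steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: the `Skill` shape, `tokenize`, `toSlug`, and `mergeResults` are hypothetical stand-ins, an array scan replaces the `by_slug` index lookup, and giving every direct match a score of 1.0 is an assumption.

```typescript
// Hypothetical, simplified shapes; the real documents carry more fields.
type Skill = { id: string; slug: string; displayName: string };
type Scored = { skill: Skill; score: number };

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Step 2: "guardian angel" becomes "guardian-angel".
function toSlug(query: string): string {
  return tokenize(query).join("-");
}

function findDirectMatches(skills: Skill[], query: string, limit: number): Scored[] {
  const queryTokens = tokenize(query);
  if (queryTokens.length === 0) return []; // every() on an empty list would match everything
  const slug = toSlug(query);
  const matches: Scored[] = [];
  const seen = new Set<string>();

  // Steps 1-2: exact slug match on the normalized, hyphenated query.
  for (const skill of skills) {
    if (skill.slug === slug) {
      matches.push({ skill, score: 1.0 });
      seen.add(skill.id);
    }
  }

  // Step 3: all query tokens must appear in displayName or slug tokens.
  for (const skill of skills) {
    if (matches.length >= limit) break;
    if (seen.has(skill.id)) continue;
    const tokens = [...tokenize(skill.displayName), ...tokenize(skill.slug)];
    if (queryTokens.every((qt) => tokens.some((t) => t.includes(qt)))) {
      matches.push({ skill, score: 1.0 });
      seen.add(skill.id);
    }
  }
  return matches.slice(0, limit);
}

// Steps 4-5: direct matches go first; duplicates from vector search are dropped by ID.
function mergeResults(direct: Scored[], vector: Scored[], limit: number): Scored[] {
  const byId = new Map<string, Scored>();
  for (const entry of [...direct, ...vector]) {
    if (!byId.has(entry.skill.id)) byId.set(entry.skill.id, entry);
  }
  return Array.from(byId.values()).slice(0, limit);
}
```

Because the `Map` keeps insertion order, placing direct matches first means a skill found by both paths keeps its boosted score rather than its lower vector score.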

Changes

  • Added findDirectMatches internal query for direct name/slug lookups
  • Modified searchSkills to run direct matching in parallel with vector search
  • Direct matches appear first in results with boosted score
  • No changes to soul search (could be added in follow-up if needed)

Testing

Before:

clawhub search "guardian angel" → (no results)
clawhub search "moral ethics thomistic" → guardian-angel v1.0.2

After:

clawhub search "guardian angel" → guardian-angel v1.0.2 (score: 1.0)
clawhub search "moral ethics thomistic" → guardian-angel v1.0.2

Greptile Overview

Greptile Summary

This PR improves skill search recall for exact queries by adding an internal direct-match query (slug/name token matching) that runs in parallel with vector embedding search. Results from direct matching are boosted (score 1.0), merged ahead of vector-derived exact token matches, and de-duplicated by skill ID; embedding-generation failure now falls back to returning direct matches only.

The change fits into the existing convex/search.ts flow by keeping the existing vectorSearch → hydrate → exact-token-filter pipeline intact while adding a separate findDirectMatches internalQuery to ensure exact name/slug lookups can bypass the embedding similarity gate.

Confidence Score: 3/5

  • Reasonably safe to merge, but there is one failure mode that can break search if the new direct-match query errors.
  • Core logic is straightforward and localized to convex/search.ts, but directMatchPromise is awaited without error isolation, so a failure in findDirectMatches can reject the entire action (even when vector search would otherwise work). There’s also a potential performance cost from scanning 500 recent skills per query.
  • convex/search.ts

Context used:

  • Context from dashboard - AGENTS.md (source)

When searching for a skill by its exact name (e.g., 'guardian angel'),
the vector search may not return it if the semantic similarity is low
(e.g., the skill content is about 'moral evaluation' which has low
cosine similarity to 'guardian angel').

This change adds a parallel direct match query that:
1. Checks for exact slug matches (e.g., 'guardian-angel')
2. Converts space-separated queries to hyphenated slugs
3. Searches displayName tokens for partial matches
4. Merges direct matches with vector search results
5. Gives exact name/slug matches a boosted score (1.0)

This ensures skills are discoverable by their name even when the
semantic content differs significantly from the name itself.

Fixes: Skills with common-phrase names (like 'guardian-angel') not
appearing in search results when queried by name.
@vercel
Contributor

vercel bot commented Feb 4, 2026

Someone is attempting to deploy a commit to the Amantus Machina Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 3 comments

Comment on lines +44 to +50
// Run direct name/slug match in parallel with vector search
// This ensures exact name matches are found even with low semantic similarity
const directMatchPromise = ctx.runQuery(internal.search.findDirectMatches, {
  query,
  queryTokens,
  limit,
}) as Promise<DirectMatchEntry[]>
[P0] directMatchPromise will reject if findDirectMatches throws, causing the whole action to error (including the embedding-error fallback). Since this is started before generateEmbedding, any direct-match query failure will take down the search path even though vector search could have succeeded. Consider wrapping directMatchPromise with .catch(() => []) (or starting it after embedding generation) so vector search remains available when direct matching fails.
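A minimal sketch of the suggested error isolation, assuming the direct-match and vector lookups are both exposed as async functions. The `isolate` helper, the `search` wrapper, and the simplified `DirectMatchEntry` shape are hypothetical and only illustrate the `.catch(() => [])` pattern named in the comment.

```typescript
// Hypothetical, simplified entry shape for illustration only.
type DirectMatchEntry = { slug: string; score: number };

// Convert a rejection into an empty result so awaiting the promise never throws.
function isolate<T>(promise: Promise<T[]>): Promise<T[]> {
  return promise.catch((): T[] => []);
}

async function search(
  runDirect: () => Promise<DirectMatchEntry[]>,
  runVector: () => Promise<DirectMatchEntry[]>,
): Promise<DirectMatchEntry[]> {
  // Started first so it runs in parallel, but wrapped so a failure stays local.
  const directMatchPromise = isolate(runDirect());
  const vectorResults = await runVector();
  const directMatches = await directMatchPromise;
  return [...directMatches, ...vectorResults];
}
```

With this wrapper, a failing direct-match lookup degrades to vector-only results instead of rejecting the whole search. Swallowing the error silently is a trade-off; logging inside the `catch` callback would keep the failure visible.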

Comment on lines +195 to +201
if (matchedSkills.length < limit) {
  const recentSkills = await ctx.db
    .query('skills')
    .withIndex('by_active_updated', (q) => q.eq('softDeletedAt', undefined))
    .order('desc')
    .take(500) // Check recent skills for name matches

[P1] The direct-match fallback scans take(500) recent skills and tokenizes each skill in a loop (tokenize(skill.displayName/slug)), which is O(500 * tokens) per search and may become a noticeable cost at scale. If this endpoint is hot, consider reducing the scan size, adding a more selective index/query for name tokens, or caching/precomputing tokenized forms.
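One of the options mentioned above, precomputing tokenized forms, could look roughly like this. It is a sketch under assumptions: the `nameTokens` field, `withNameTokens`, and `matchesName` are hypothetical and do not exist in the PR, and the tokens are assumed to be stored on the skill document whenever it is created or renamed.

```typescript
// Hypothetical document shape; `nameTokens` is an assumed precomputed field.
type SkillDoc = { slug: string; displayName: string; nameTokens?: string[] };

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Write path: compute the de-duplicated name and slug tokens once per change.
function withNameTokens(skill: SkillDoc): SkillDoc {
  const tokens = new Set([...tokenize(skill.displayName), ...tokenize(skill.slug)]);
  return { ...skill, nameTokens: Array.from(tokens) };
}

// Read path: no tokenization per search, only a comparison against stored tokens.
function matchesName(skill: SkillDoc, queryTokens: string[]): boolean {
  const tokens = skill.nameTokens ?? [];
  return (
    queryTokens.length > 0 &&
    queryTokens.every((qt) => tokens.some((t) => t.includes(qt)))
  );
}
```

This moves the tokenization cost from every search to the less frequent write path. It does not by itself reduce the number of documents scanned, so it complements rather than replaces a more selective index.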

Comment on lines +228 to +233
let ownerHandle = ownerCache.get(skill.ownerUserId)
if (ownerHandle === undefined) {
  const owner = await ctx.db.get(skill.ownerUserId)
  ownerHandle = owner?.handle ?? owner?.name ?? null
  ownerCache.set(skill.ownerUserId, ownerHandle)
}
[P2] ownerHandle derivation differs between hydration paths: findDirectMatches uses owner?.handle ?? owner?.name, while hydrateResults uses owner?.handle ?? owner?._id. This can produce inconsistent display of the same owner depending on whether the result came from direct match vs vector search. Aligning the fallback behavior would avoid surprising UI differences.
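One way to keep the two hydration paths aligned is to route both through a single helper. The sketch below is illustrative only: `resolveOwnerHandle`, `getOwnerHandle`, and the `Owner` shape are hypothetical, and the fallback order shown (handle, then name, then null) is an assumption, since the comment only asks that both paths agree.

```typescript
// Hypothetical owner shape for illustration.
type Owner = { _id: string; handle?: string; name?: string };

// Single definition of the fallback order, shared by every hydration path.
function resolveOwnerHandle(owner: Owner | null): string | null {
  return owner?.handle ?? owner?.name ?? null;
}

// Cached lookup: null is a valid cached value, so only undefined means "not cached".
async function getOwnerHandle(
  ownerId: string,
  cache: Map<string, string | null>,
  loadOwner: (id: string) => Promise<Owner | null>,
): Promise<string | null> {
  const cached = cache.get(ownerId);
  if (cached !== undefined) return cached;
  const handle = resolveOwnerHandle(await loadOwner(ownerId));
  cache.set(ownerId, handle);
  return handle;
}
```

If both `findDirectMatches` and `hydrateResults` called the same helper, the displayed owner would no longer depend on which path produced the result.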

const slugTokens = tokenize(skill.slug)
const allTokens = [...nameTokens, ...slugTokens]

const isMatch = queryTokens.every((qt) => allTokens.some((t) => t.includes(qt)))

Suggested change
- const isMatch = queryTokens.every((qt) => allTokens.some((t) => t.includes(qt)))
+ const isMatch = queryTokens.some((qt) => allTokens.some((t) => t.includes(qt)))

Inconsistent token matching logic between findDirectMatches (.every) and matchesExactTokens (.some) causes direct name matches to have stricter requirements than vector search results
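The practical difference between the two predicates can be seen in a small standalone sketch. The function names `matchesAll` and `matchesAny` are hypothetical labels for the `.every` and `.some` variants; they are not identifiers from the PR.

```typescript
// Strict variant: every query token must appear inside some name token.
function matchesAll(queryTokens: string[], nameTokens: string[]): boolean {
  return queryTokens.every((qt) => nameTokens.some((t) => t.includes(qt)));
}

// Loose variant: a single matching query token is enough.
function matchesAny(queryTokens: string[], nameTokens: string[]): boolean {
  return queryTokens.some((qt) => nameTokens.some((t) => t.includes(qt)));
}

// Edge case worth noting: with an empty query, every() is true and some() is false.
const emptyQueryAll = matchesAll([], ["guardian", "angel"]); // true
const emptyQueryAny = matchesAny([], ["guardian", "angel"]); // false
```

For the query tokens `guardian` and `angel` against a skill whose name tokens are `angel` and `investor`, the strict variant rejects the skill and the loose variant accepts it. Switching to `.some` therefore trades precision for recall, which matters here because direct matches receive the boosted score.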

return []
// Fall back to direct matches only
const directMatches = await directMatchPromise
return directMatches.map((entry) => ({

Suggested change
- return directMatches.map((entry) => ({
+ // Enrich with badges and apply highlightedOnly filter if needed
+ const badgeMapEntries = (await ctx.runQuery(internal.search.getSkillBadgeMapsInternal, {
+   skillIds: directMatches.map((entry) => entry.skill._id),
+ })) as Array<[Id<'skills'>, SkillBadgeMap]>
+ const badgeMapBySkillId = new Map(badgeMapEntries)
+ const enrichedMatches = directMatches.map((entry) => ({
+   ...entry,
+   skill: {
+     ...entry.skill,
+     badges: badgeMapBySkillId.get(entry.skill._id) ?? {},
+   },
+ }))
+ const filtered = args.highlightedOnly
+   ? enrichedMatches.filter((entry) => isSkillHighlighted(entry.skill))
+   : enrichedMatches
+ return filtered.map((entry) => ({

Fallback path returns direct matches without applying highlightedOnly filter when embedding generation fails

Comment on lines +126 to +127
// Add direct matches first with boosted score (exact name matches are highly relevant)
for (const entry of directMatches) {

Suggested change
- // Add direct matches first with boosted score (exact name matches are highly relevant)
- for (const entry of directMatches) {
+ // Enrich direct matches with badges from skillBadges table for consistent filtering
+ const directMatchBadgeEntries = (await ctx.runQuery(internal.search.getSkillBadgeMapsInternal, {
+   skillIds: directMatches.map((entry) => entry.skill._id),
+ })) as Array<[Id<'skills'>, SkillBadgeMap]>
+ const directMatchBadgeMapBySkillId = new Map(directMatchBadgeEntries)
+ const directMatchesWithBadges = directMatches.map((entry) => ({
+   ...entry,
+   skill: {
+     ...entry.skill,
+     badges: directMatchBadgeMapBySkillId.get(entry.skill._id) ?? {},
+   },
+ }))
+ // Add direct matches first with boosted score (exact name matches are highly relevant)
+ for (const entry of directMatchesWithBadges) {

Direct match skills are not enriched with badges from skillBadges table before highlightedOnly filter is applied, causing incorrect filtering behavior
