feat(vector-search): make secondary sort stable within distance buckets #2833

Open
nkneupper wants to merge 1 commit into typesense:v31 from nkneupper:feat/vector-distance-bucket-stable-sort

Conversation

@nkneupper

When sorting with _vector_distance(buckets: N) or _vector_distance(bucket_size: N), results in the same bucket whose secondary sort values were equal were reordered by document key as a tiebreaker, which discarded their original vector distance ordering. This PR adds is_greater_kv_group_stable, a comparator without the key tiebreaker, so that std::stable_sort preserves the pre-bucket ordering when scores are tied.
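To illustrate the difference, here is a minimal standalone sketch. It is not the code from this PR: the Hit struct, both comparator names, and sorted_keys are hypothetical stand-ins, and the direction of the key tiebreaker (higher key first) is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a search hit; not Typesense's actual KV struct.
struct Hit {
    uint64_t key;    // document key
    int64_t bucket;  // bucketed vector distance (primary sort, ascending)
    int64_t boost;   // secondary sort value (ascending)
};

// With a key tiebreaker: hits tied on bucket and boost are ordered by key
// (assumed here: higher key first), so their incoming order is lost.
bool less_with_key_tiebreak(const Hit& a, const Hit& b) {
    if (a.bucket != b.bucket) return a.bucket < b.bucket;
    if (a.boost != b.boost) return a.boost < b.boost;
    return a.key > b.key;
}

// Without the key tiebreaker: fully tied hits compare as equivalent, so
// std::stable_sort keeps them in their incoming (vector distance) order.
bool less_group_stable(const Hit& a, const Hit& b) {
    if (a.bucket != b.bucket) return a.bucket < b.bucket;
    return a.boost < b.boost;
}

// Sorts a copy of the hits with the given comparator and returns the keys
// in their resulting order.
std::vector<uint64_t> sorted_keys(std::vector<Hit> hits,
                                  bool (*less)(const Hit&, const Hit&)) {
    std::stable_sort(hits.begin(), hits.end(), less);
    std::vector<uint64_t> keys;
    for (const Hit& h : hits) keys.push_back(h.key);
    return keys;
}
```

Given hits that arrive in vector distance order as {3, 0, 1}, {7, 0, 1}, {9, 0, 0}, {5, 1, 0}, less_group_stable yields keys 9, 3, 7, 5 (the tied pair 3, 7 keeps its incoming order), while less_with_key_tiebreak yields 9, 7, 3, 5.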

My product use case for this stable sort is to use the secondary sort as a kind of "search boost". The vector distance is the primary sort, and within each bucket a secondary sort can locally boost certain results.

I am open to feedback and would be willing to explore an interface change that makes the stable secondary sort opt-in rather than the default (e.g. sort_by: _vector_distance(bucket_size: 25):asc, boost_score:asc:stable).

Change Summary

  • Added KV::is_greater_kv_group_stable for stable sorting
  • Added stable sort test to collection_sorting_test.cpp

