
feat: Optional Boundary Inclusiveness for Within Queries #294

Merged
sdd merged 10 commits into sdd:v5.x.x from cbueth:feat/optional-boundary-inclusiveness
Mar 18, 2026


@cbueth commented Feb 26, 2026

TL;DR: I have implemented the ability to toggle boundary inclusiveness for within queries across all KD-tree variants: Mutable, Immutable, Hybrid, and Fixed.

Description

This PR introduces the option for non-inclusive (strict) boundary searches in within queries. The change is motivated by certain information-theoretic estimators that require a distance strictly less than the radius: of the two KSG estimator variants (Type I and Type II, https://doi.org/10.1103/PhysRevE.69.066138), one requires distance < radius. By generalizing the internal logic, we avoid code duplication while maintaining Kiddo's high performance, especially in the leaf-level SIMD loops.
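To make the boundary distinction concrete, here is a minimal brute-force sketch (not Kiddo code) of the neighbour count a KSG-style estimator performs. `chebyshev` and `count_within` are hypothetical helpers; the max-norm is used because KSG works with it in the joint space. A point lying exactly on the radius is counted only when `inclusive` is true.

```rust
/// Chebyshev (max-norm) distance between two points of equal dimension.
fn chebyshev(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f64::max)
}

/// Hypothetical brute-force neighbour count around `query`.
/// `inclusive == true` counts `dist <= radius`; `false` counts `dist < radius`.
fn count_within(points: &[Vec<f64>], query: &[f64], radius: f64, inclusive: bool) -> usize {
    points
        .iter()
        .filter(|p| {
            let d = chebyshev(p.as_slice(), query);
            if inclusive { d <= radius } else { d < radius }
        })
        .count()
}
```

With points at max-norm distances 0.5, 1.0 and 1.5 from the query and a radius of 1.0, the inclusive count is 2 and the strict count is 1; that single boundary point is exactly the edge case the new flag controls.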

The implementation propagates an `inclusive` flag through the search macros for Mutable, Immutable, Hybrid, and Fixed trees. This is consistent with the findings about pruning and the changes in PRs #290 and #291. New `*_with_condition` methods provide this flexibility while keeping the standard API backward compatible. Since #290 changes behavior in edge cases, this PR lets the user choose whether or not to include points at exactly the maximum distance. Suggestions for an alternative name or API are welcome.

A full suite of tests has been added to verify boundary behavior across all tree types and distance metrics. One open question is whether we should eventually move towards a more structured `QueryOptions` approach if more search parameters are added in the future.
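As a rough illustration of what such a `QueryOptions` approach might look like, the sketch below bundles the boundary flag (and, as an assumed second example, sortedness) into one struct. Neither `QueryOptions` nor its methods exist in Kiddo; they are hypothetical.

```rust
/// Hypothetical options struct; Kiddo does not provide this type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QueryOptions {
    /// Include points whose distance equals the radius exactly.
    pub inclusive: bool,
    /// Return results sorted by distance.
    pub sorted: bool,
}

impl Default for QueryOptions {
    /// Defaults mirror the existing `within` query: inclusive boundary, sorted output.
    fn default() -> Self {
        Self { inclusive: true, sorted: true }
    }
}

impl QueryOptions {
    /// Builder-style switch to a strict (`<`) boundary.
    pub fn exclusive(mut self) -> Self {
        self.inclusive = false;
        self
    }

    /// Builder-style switch to unsorted output.
    pub fn unsorted(mut self) -> Self {
        self.sorted = false;
        self
    }

    /// Boundary test a query would apply to each candidate distance.
    pub fn accepts(&self, dist: f64, radius: f64) -> bool {
        if self.inclusive { dist <= radius } else { dist < radius }
    }
}
```

The trade-off is one extra parameter on every query method in exchange for not multiplying method names each time a new option is added.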

Changes:

  • I've updated the internal search macros and leaf-level distance checks to support an `inclusive: bool` flag. This flag is propagated through the query stack.
  • Introduced `*_with_condition` variants:
    • `within` -> `within_with_condition`
    • `within_unsorted` -> `within_unsorted_with_condition`
    • `within_unsorted_iter` -> `within_unsorted_iter_with_condition`
    • `nearest_n_within` -> `nearest_n_within_with_condition`
  • This is fully backward compatible.
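A simplified sketch of how a leaf-level scan might honour the `inclusive` flag. `leaf_within` and `squared_euclidean` are hypothetical names, and the real leaf loops are SIMD-based and macro-generated; this scalar version only shows where the comparison changes. As with squared-Euclidean queries generally, `radius` is given in squared units.

```rust
/// Squared Euclidean distance between two K-dimensional points.
fn squared_euclidean<const K: usize>(a: &[f64; K], b: &[f64; K]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical leaf scan: push (distance, item) pairs that fall within `radius`.
/// `inclusive == true` keeps the existing `<=` behaviour; `false` uses strict `<`.
fn leaf_within<const K: usize>(
    points: &[[f64; K]],
    items: &[u64],
    query: &[f64; K],
    radius: f64,
    inclusive: bool,
    results: &mut Vec<(f64, u64)>,
) {
    for (p, &item) in points.iter().zip(items) {
        let d = squared_euclidean(p, query);
        // `inclusive` is loop-invariant, so the compiler may hoist this branch
        // out of the loop, and otherwise it is trivially predictable.
        let hit = if inclusive { d <= radius } else { d < radius };
        if hit {
            results.push((d, item));
        }
    }
}
```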

@sdd (Owner) commented Feb 27, 2026

I like the idea of this. I'm leaning towards keeping the existing names for inclusive boundary operations, and suffixing with _exclusive otherwise. I'll have a think about this over the weekend.

I expect that the branch would optimise out in many cases but I'd like to confirm that by looking at the asm output before accepting this as we're adding a branch in an inner loop.

@cbueth cbueth force-pushed the feat/optional-boundary-inclusiveness branch from 7b947a6 to 86015d5 February 27, 2026 09:10
@cbueth (Author) commented Feb 27, 2026

> I like the idea of this. I'm leaning towards keeping the existing names for inclusive boundary operations, and suffixing with _exclusive otherwise. I'll have a think about this over the weekend.

This is also a good choice. My first idea was `_strict`.

> I expect that the branch would optimise out in many cases but I'd like to confirm that by looking at the asm output before accepting this as we're adding a branch in an inner loop.

OK, let me know the outcome.

The alternative approach I had in mind was adding `within_strict` to the macros, the query files, and the `KdTree`. This would not duplicate much, but still more code than the current approach, and it might be more annoying to maintain; on the other hand, it might help compiler optimisation. So I hope the branch introduced by the added flag will be optimised out.
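For comparison, one way to get per-variant specialisation without duplicating the macro bodies is a const generic parameter, sketched below. This is not what the PR does (it passes a runtime `bool`), and the function names are hypothetical. Each instantiation is monomorphised separately, so inside it the `if INCLUSIVE` condition is a compile-time constant that the optimiser can fold away.

```rust
/// Boundary check selected at compile time via a const generic parameter.
#[inline(always)]
fn within_radius<const INCLUSIVE: bool>(dist: f64, radius: f64) -> bool {
    if INCLUSIVE { dist <= radius } else { dist < radius }
}

/// Generic inner loop; one copy is generated for each value of `INCLUSIVE`.
fn count_hits<const INCLUSIVE: bool>(dists: &[f64], radius: f64) -> usize {
    dists
        .iter()
        .filter(|&&d| within_radius::<INCLUSIVE>(d, radius))
        .count()
}

/// Public entry points pick the instantiation; no boundary flag is
/// inspected inside the loop at run time.
pub fn count_in_radius(dists: &[f64], radius: f64) -> usize {
    count_hits::<true>(dists, radius)
}

pub fn count_in_radius_exclusive(dists: &[f64], radius: f64) -> usize {
    count_hits::<false>(dists, radius)
}
```

The cost is that the const parameter has to be threaded through every generic layer instead of a plain `bool` argument, which in a macro-heavy codebase may be no simpler than the runtime flag.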

Have a nice weekend!

@codecov bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 92.92035% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.02%. Comparing base (2c12af3) to head (3c19d7c).
⚠️ Report is 10 commits behind head on v5.x.x.

Files with missing lines | Patch % | Lines
src/float/distance.rs | 94.52% | 3 Missing and 1 partial ⚠️
src/common/generate_nearest_n_within_unsorted.rs | 80.00% | 2 Missing and 1 partial ⚠️
src/common/generate_within_unsorted_iter.rs | 87.50% | 0 Missing and 2 partials ⚠️
src/float_leaf_slice/leaf_slice.rs | 93.75% | 2 Missing ⚠️
...able/common/generate_immutable_nearest_n_within.rs | 88.23% | 1 Missing and 1 partial ⚠️
src/common/generate_within_unsorted.rs | 90.90% | 0 Missing and 1 partial ⚠️
src/float_leaf_slice/fallback.rs | 94.73% | 1 Missing ⚠️
...mutable/common/generate_immutable_best_n_within.rs | 95.23% | 0 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##           v5.x.x     #294      +/-   ##
==========================================
- Coverage   95.06%   95.02%   -0.04%     
==========================================
  Files          54       54              
  Lines        6301     6452     +151     
  Branches     6301     6452     +151     
==========================================
+ Hits         5990     6131     +141     
- Misses        287      291       +4     
- Partials       24       30       +6     


@cbueth (Author) commented Mar 2, 2026

I have merged upstream/v5.x.x and #291 and linted the files. All checks pass.
Regarding code coverage, are there specific lines you would like me to add tests for?

@cbueth (Author) commented Mar 11, 2026

Would you consider these additions for v5.3.0 as well? If so, I will rebase after #291 is merged.

cbueth added 8 commits March 14, 2026 00:52
- deprecate `rd_update` with `D::accumulate` for consistent sum-based and max-based metrics
- conditional logic for SIMD (L1/L2) and general L∞
- differentiate distance accumulation behaviour
- integration `nearest_n` tests (Chebyshev, Manhattan, SquaredEuclidean).
- improve `DistanceMetric` doc
- add Gaussian scenario to tests
- add flag into query logic
- add test `test_within_squared_euclidean` for all metrics
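The `D::accumulate` commit distinguishes sum-based metrics (Manhattan, squared Euclidean) from the max-based Chebyshev metric. A minimal sketch of the idea follows; the `Metric` trait, its `component`/`accumulate` methods, and the unit structs are assumptions for illustration and do not reproduce Kiddo's actual `DistanceMetric` signatures.

```rust
/// Hypothetical metric trait (not Kiddo's `DistanceMetric`).
trait Metric {
    /// Per-axis contribution, e.g. |a - b| or (a - b)^2.
    fn component(a: f64, b: f64) -> f64;
    /// Combine an accumulated distance with one more axis contribution.
    fn accumulate(acc: f64, component: f64) -> f64;

    /// Full distance built by folding per-axis contributions with `accumulate`.
    fn dist(a: &[f64], b: &[f64]) -> f64 {
        a.iter()
            .zip(b)
            .fold(0.0, |acc, (x, y)| Self::accumulate(acc, Self::component(*x, *y)))
    }
}

struct Manhattan;
struct SquaredEuclidean;
struct Chebyshev;

impl Metric for Manhattan {
    fn component(a: f64, b: f64) -> f64 { (a - b).abs() }
    fn accumulate(acc: f64, c: f64) -> f64 { acc + c } // sum-based
}

impl Metric for SquaredEuclidean {
    fn component(a: f64, b: f64) -> f64 { (a - b) * (a - b) }
    fn accumulate(acc: f64, c: f64) -> f64 { acc + c } // sum-based
}

impl Metric for Chebyshev {
    fn component(a: f64, b: f64) -> f64 { (a - b).abs() }
    fn accumulate(acc: f64, c: f64) -> f64 { acc.max(c) } // max-based (L∞)
}
```

Folding with a single `accumulate` means Chebyshev takes a running maximum instead of a sum. Presumably the same combinator is what makes the pruning bound consistent across metrics: summing per-axis offsets would overestimate the L∞ lower bound and could prune subtrees that still contain matches.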
@cbueth cbueth force-pushed the feat/optional-boundary-inclusiveness branch from 482e554 to ae548a9 March 14, 2026 00:02
@cbueth (Author) commented Mar 14, 2026

This PR has been rebased onto the merged #291.

@sdd (Owner) commented Mar 14, 2026

Thanks for the rebase - I'll give this another read today and either merge it in, or if not, release the rest as 5.3.0 👍🏼

@sdd (Owner) commented Mar 15, 2026

I've run some performance benchmarks, and the added conditional has not made any difference: if the compiler has not elided it completely, then the branch predictor is doing its job and predicting it perfectly.

I think I'd still prefer the naming scheme `_exclusive` rather than `_with_condition`, though. Once that change is made, I'll be happy to merge and get this out as part of 5.3.0 👍🏼

cbueth added 2 commits March 16, 2026 07:28
- rename `*_with_condition` to `*_exclusive` across query methods
- update method calls and tests
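A toy sketch of the final naming scheme: the existing name keeps the inclusive behaviour and the `_exclusive` variant applies a strict comparison, both delegating to one shared implementation. `ToyTree` and `within_impl` are hypothetical and scan linearly; Kiddo's real methods take different arguments and traverse the tree. Distances here are squared Euclidean, so `radius` is squared too.

```rust
/// Toy stand-in for a k-d tree: points are stored in a Vec and scanned linearly.
struct ToyTree {
    points: Vec<([f64; 2], u64)>,
}

impl ToyTree {
    /// Existing name keeps the inclusive behaviour: dist <= radius.
    fn within(&self, query: &[f64; 2], radius: f64) -> Vec<(f64, u64)> {
        self.within_impl(query, radius, true)
    }

    /// `_exclusive` suffix: strict dist < radius.
    fn within_exclusive(&self, query: &[f64; 2], radius: f64) -> Vec<(f64, u64)> {
        self.within_impl(query, radius, false)
    }

    /// Shared implementation; the flag is passed down instead of duplicating code.
    fn within_impl(&self, query: &[f64; 2], radius: f64, inclusive: bool) -> Vec<(f64, u64)> {
        let mut out: Vec<(f64, u64)> = self
            .points
            .iter()
            .map(|(p, item)| {
                let d = (p[0] - query[0]).powi(2) + (p[1] - query[1]).powi(2);
                (d, *item)
            })
            .filter(|&(d, _)| if inclusive { d <= radius } else { d < radius })
            .collect();
        // Sort by distance, as a sorted `within` query would (unlike `within_unsorted`).
        out.sort_by(|a, b| a.0.total_cmp(&b.0));
        out
    }
}
```

Keeping the unsuffixed names inclusive is what makes the change backward compatible: existing callers see no behavioural difference, and only code that opts into `_exclusive` gets the strict boundary.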
@cbueth (Author) commented Mar 16, 2026

Great news about the asm output. I have also updated the documentation, which no earlier commit in this PR had done.

@sdd sdd merged commit 996f536 into sdd:v5.x.x Mar 18, 2026
10 checks passed
@sdd (Owner) commented Mar 18, 2026

Sorry for the delay - merged! I'll publish tonight once I get home, or tomorrow morning. Thanks!

@cbueth (Author) commented Mar 19, 2026

Thanks for the merge and the responsive support with this and the other two PRs, greatly appreciated!
