feat: Optional Boundary Inclusiveness for Within Queries#294
feat: Optional Boundary Inclusiveness for Within Queries#294sdd merged 10 commits intosdd:v5.x.xfrom
Conversation
|
I like the idea of this. I'm leaning towards keeping the existing names for inclusive boundary operations, and suffixing with I expect that the branch would optimise out in many cases but I'd like to confirm that by looking at the asm output before accepting this as we're adding a branch in an inner loop. |
7b947a6 to
86015d5
Compare
This is also a good choice. My first Idea was
Ok, let me know of that outcome. The alternative approach I had in mind was adding Have a nice weekend! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## v5.x.x #294 +/- ##
==========================================
- Coverage 95.06% 95.02% -0.04%
==========================================
Files 54 54
Lines 6301 6452 +151
Branches 6301 6452 +151
==========================================
+ Hits 5990 6131 +141
- Misses 287 291 +4
- Partials 24 30 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I have merged |
|
Would you consider these additions for |
- deprecate `rd_update` with `D::accumulate` for consistent sum-based and max-based metrics - conditional logic for SIMD (L1/L2) and general L∞ - differentiate distance accumulation behaviour
- integration `nearest_n` tests (Chebyshev, Manhattan, SquaredEuclidean).
- improve `DistanceMetric` doc - add Gaussian scenario to tests
… trait - improve test coverage
- `A::dist` - `saturating_mul`
- add flag into query logic - add test `test_within_squared_euclidean` for all metrics
482e554 to
ae548a9
Compare
|
This PR has been rebased onto the merged #291. |
|
Thanks for the rebase - I'll give this another read today and either merge it in, or if not, release the rest as 5.3.0 👍🏼 |
|
I've ran some performance benchmarks and the added conditional has not made any difference - if the compiler has not elided it completely, then the branch predictor is doing it's job and predicting it perfectly. I think I'd still prefer if the naming scheme was |
- rename `*_with_condition` to `*_exclusive` across query methods - update method calls and tests
|
Great news about the asm output. I have additionally added to the documentation, which was not done in any commit of this PR before. |
|
Sorry for delay - merged! Will publish tonight once I get home, or tomorrow morning. Thanks! |
|
Thanks for the merge and responsive support with this and the other two PRs, greatly apprechiated! |
TLDR; I have implemented the ability to toggle boundary inclusiveness for
withinqueries across all KD-tree variants:Mutable,Immutable,Hybrid, andFixed.Description
This PR introduces the option for non-inclusive boundary searches in within queries. This change is motivated by requirements in certain information-theoretic estimators where a strict distance less than the radius is necessary (KSG: Type I vs Type 2 https://doi.org/10.1103/PhysRevE.69.066138,
distance < radiusis required for one of them). By generalizing the internal logic, we avoid code duplication while maintaining the high performance of Kiddo, especially in the leaf-level SIMD loops.The implementation propagates an inclusive flag through the search macros for Mutable, Immutable, Hybrid, and Fixed trees. This is consistent with the findings about prunings and the changes of PR #290 & #291. New
*_with_conditionmethods provide this flexibility while keeping the standard API backward compatible. So as #290 would change edge cases, with this PRs changes the user can choose to include or not to include points at exactly the maximal distance. If you want, please suggest an alternative name or API.A full suite of tests has been added to verify boundary behavior across all tree types and distance metrics. One thing to keep in mind is whether we should eventually move towards a more structured
QueryOptionsapproach if more search parameters are added in the future.Changes:
inclusive: boolflag. This flag is propagated through the query stack.*_with_conditionvariants:within->within_with_conditionwithin_unsorted->within_unsorted_with_conditionwithin_unsorted_iter->within_unsorted_iter_with_conditionnearest_n_within->nearest_n_within_with_condition