perf: use sets for O(1) membership testing in local search query path #2255

Open
dubin555 wants to merge 1 commit into microsoft:main from dubin555:oss-scout/verify-perf-local-search-set-lookups

dubin555 commented Mar 1, 2026

Description

The local search context-building code uses Python lists for entity, relationship, and text-unit membership testing (the in operator) at 10 call sites across 5 files. Since a membership test on a list is O(n), this creates O(n·m) complexity in several hot paths during query execution. At scale (1000+ entities, 10000+ relationships), this adds over 1.7 seconds of unnecessary latency to the local search query path.

This PR converts list comprehensions to set comprehensions for all membership-test collections and builds defaultdict index dicts for two O(n·m) inner loops, reducing those loops from O(n·m) to roughly O(n + m), where n is the number of entities and m the number of relationships or covariates.
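The list-to-set swap described above can be sketched as follows. This is a minimal illustration, not code from the PR diff: the `filter_in_network` helper, the dict-shaped relationships, and the entity names are all hypothetical.

```python
# Hypothetical data: names and shapes are illustrative, not taken from the PR.
selected_names_list = [f"entity-{i}" for i in range(200)]    # before: list
selected_names_set = {f"entity-{i}" for i in range(200)}     # after: set

relationships = [
    {"source": f"entity-{i % 300}", "target": f"entity-{(i * 7) % 300}"}
    for i in range(2_000)
]


def filter_in_network(rels, names):
    """Keep relationships whose endpoints are both in `names`.

    With a list, each `in` test scans up to len(names) items.
    With a set, each `in` test is an average O(1) hash lookup.
    """
    return [r for r in rels if r["source"] in names and r["target"] in names]


# Both containers give the same filtered result; only the lookup cost differs.
from_list = filter_in_network(relationships, selected_names_list)
from_set = filter_in_network(relationships, selected_names_set)
assert from_list == from_set
```

The swap is only safe when the collection is used purely for membership tests, since a set drops duplicates and does not preserve order.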

Related Issues

Related: #2250 (performance regression)

Proposed Changes

  • 10 list→set conversions: Replace [...] list comprehensions with {...} set comprehensions wherever the resulting collection is only used for in membership testing:

    • local_context.py: selected_entity_names, entity name sets for relationship/covariate filtering
    • relationships.py: get_in_network_relationships, get_out_network_relationships, get_candidate_relationships, get_entities_from_relationships
    • covariates.py: get_candidate_covariates
    • community_reports.py: get_candidate_communities
    • text_units.py: get_candidate_text_units
  • 2 algorithmic improvements using defaultdict(list) index dicts:

    • _filter_relationships() in local_context.py: relationship link counting now uses source/target index dicts instead of scanning all relationships per entity
    • build_covariates_context() in local_context.py: covariate filtering now uses a subject_id index dict instead of scanning all covariates per entity
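The index-dict idea behind these two refactors can be sketched as below. This is an illustration under assumptions, not the PR's implementation: the function names, the dict-shaped records, and the `source`, `target`, and `subject_id` keys are hypothetical stand-ins for the real data model.

```python
from collections import defaultdict


def count_links_scan(entity_names, relationships):
    """Before: rescan every relationship for every entity, O(n*m)."""
    return {
        name: sum(
            1 for r in relationships
            if r["source"] == name or r["target"] == name
        )
        for name in entity_names
    }


def count_links_indexed(entity_names, relationships):
    """After: build source/target indexes once, then look up per entity."""
    by_source = defaultdict(list)
    by_target = defaultdict(list)
    for r in relationships:
        by_source[r["source"]].append(r)
        by_target[r["target"]].append(r)

    counts = {}
    for name in entity_names:
        outgoing = by_source.get(name, [])
        # Skip self-loops here so they are not counted twice.
        incoming = [r for r in by_target.get(name, []) if r["source"] != name]
        counts[name] = len(outgoing) + len(incoming)
    return counts


def covariates_for_entities(entity_names, covariates):
    """Group covariates by subject_id once instead of scanning per entity."""
    by_subject = defaultdict(list)
    for cov in covariates:
        by_subject[cov["subject_id"]].append(cov)
    return {name: by_subject.get(name, []) for name in entity_names}


entities = ["a", "b", "c", "d"]
rels = [
    {"source": "a", "target": "b"},
    {"source": "b", "target": "c"},
    {"source": "a", "target": "a"},  # self-loop
    {"source": "c", "target": "a"},
]
assert count_links_scan(entities, rels) == count_links_indexed(entities, rels)
```

Using `.get(name, [])` rather than indexing the defaultdict directly avoids inserting empty entries for entities that have no relationships or covariates.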

Benchmark results

Standalone benchmark with isolated before/after implementations, all correctness assertions pass (results are identical between old and new code at every scale):

| Scale | Function | List (ms) | Set (ms) | Speedup |
| --- | --- | --- | --- | --- |
| 100 entities, 1K rels | _filter_relationships | 8.1 | 0.26 | 30x |
| 100 entities, 500 covs | build_covariates_context | 3.1 | 0.07 | 45x |
| 500 entities, 5K rels | _filter_relationships | 236 | 1.7 | 141x |
| 500 entities, 2K covs | build_covariates_context | 46 | 0.32 | 144x |
| 1K entities, 10K rels | _filter_relationships | 1048 | 3.6 | 291x |
| 1K entities, 5K covs | build_covariates_context | 246 | 0.86 | 286x |

At 1000 entities / 10000 relationships, the combined hot path goes from ~1.7s to ~8.4ms.
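For readers who want to reproduce a comparison of this kind, a before/after timing harness might look like the sketch below. It is not the benchmark used for the table above: `best_of_ms`, `make_workload`, and `filter_rels` are hypothetical helpers, the synthetic workload is an assumption, and absolute timings will vary by machine.

```python
import time


def best_of_ms(fn, *args, repeat=5):
    """Return the fastest wall-clock time of fn(*args) in milliseconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


def make_workload(n_entities, n_rels):
    """Build synthetic entity names and (source, target) pairs (illustrative)."""
    names = [f"entity-{i}" for i in range(n_entities)]
    span = 2 * n_entities  # roughly half of the endpoints fall outside `names`
    rels = [
        (f"entity-{i % span}", f"entity-{(i * 7) % span}")
        for i in range(n_rels)
    ]
    return names, rels


def filter_rels(rels, names):
    """Membership-test filter; cost depends on the container type of `names`."""
    return [r for r in rels if r[0] in names and r[1] in names]


if __name__ == "__main__":
    for n_entities, n_rels in [(100, 1_000), (500, 5_000)]:
        names, rels = make_workload(n_entities, n_rels)
        as_set = set(names)
        # Correctness first: both containers must yield identical results.
        assert filter_rels(rels, names) == filter_rels(rels, as_set)
        list_ms = best_of_ms(filter_rels, rels, names)
        set_ms = best_of_ms(filter_rels, rels, as_set)
        print(f"{n_entities} entities, {n_rels} rels: "
              f"list {list_ms:.2f} ms, set {set_ms:.2f} ms")
```

Taking the best of several runs reduces noise from other processes; the equality assertion mirrors the correctness checks the description mentions.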

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

All changes are mechanical: for hashable elements such as entity IDs and names, the in operator returns the same result on a set as on a list, so there is no behavioral change where the collection is only used for membership testing. The two index-dict refactors produce identical results, validated by key-by-key comparison assertions in the benchmark at three different scales. No new dependencies are introduced.

Convert list-based membership testing to set-based across 10 call sites
in 5 files within the local search context building code. Additionally,
replace two O(n*m) inner loops with defaultdict index lookups:
- _filter_relationships(): relationship link counting via source/target dicts
- build_covariates_context(): covariate filtering via subject_id dict

At 1000 entities / 10000 relationships, the combined hot path improves
from ~1.7s to ~8.4ms (200x+ speedup). All results are identical —
validated by benchmark assertions at three scales.

Related: microsoft#2250
@dubin555 dubin555 requested a review from a team as a code owner March 1, 2026 12:43

dubin555 commented Mar 1, 2026

@microsoft-github-policy-service agree
