perf: use sets for O(1) membership testing in local search query path #2255
Open
dubin555 wants to merge 1 commit into microsoft:main
Convert list-based membership testing to set-based across 10 call sites in 5 files within the local search context building code.

Additionally, replace two O(n*m) inner loops with `defaultdict` index lookups:

- `_filter_relationships()`: relationship link counting via source/target dicts
- `build_covariates_context()`: covariate filtering via `subject_id` dict

At 1000 entities / 10000 relationships, the combined hot path improves from ~1.7s to ~8.4ms (200x+ speedup). All results are identical, as validated by benchmark assertions at three scales.

Related: microsoft#2250
Author
@microsoft-github-policy-service agree
Description
The local search context-building code uses Python lists for entity/relationship/text-unit membership testing (the `in` operator) across 10 call sites in 5 files. Since `in` on a list is O(n), this creates O(n·m) complexity in several hot paths during query execution. At scale (1000+ entities, 10000+ relationships), this adds roughly 1.7 seconds of unnecessary latency to the local search query path.

This PR converts list comprehensions to set comprehensions for all membership-test collections and builds `defaultdict` index dicts for two O(n·m) inner loops, reducing the combined hot-path complexity from O(n·m) to roughly O(n + m).

Related Issues
Related: #2250 (performance regression)
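To illustrate the list-to-set change described above, here is a minimal sketch. The `Relationship` type and both function names are hypothetical stand-ins for illustration; they are not the GraphRAG data model or the functions touched by this PR.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    """Hypothetical stand-in for a relationship record (not the GraphRAG model)."""

    source: str
    target: str


def in_network_with_list(selected_names, relationships):
    # Before: `names` is a list, so each `in` test scans it linearly.
    names = [name for name in selected_names]
    return [r for r in relationships if r.source in names and r.target in names]


def in_network_with_set(selected_names, relationships):
    # After: `names` is a set, so each `in` test is a hash lookup (O(1) on average).
    names = {name for name in selected_names}
    return [r for r in relationships if r.source in names and r.target in names]


relationships = [
    Relationship("A", "B"),
    Relationship("B", "C"),
    Relationship("C", "D"),
]
selected = ["A", "B", "C"]

# Both variants keep the same relationships in the same order.
assert in_network_with_list(selected, relationships) == in_network_with_set(
    selected, relationships
)
```

Only the collection used on the right-hand side of `in` changes type; the filtered output is still a list built in the original iteration order.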
Proposed Changes
10 list→set conversions: Replace `[...]` list comprehensions with `{...}` set comprehensions wherever the resulting collection is only used for `in` membership testing:

- `local_context.py`: `selected_entity_names`, entity name sets for relationship/covariate filtering
- `relationships.py`: `get_in_network_relationships`, `get_out_network_relationships`, `get_candidate_relationships`, `get_entities_from_relationships`
- `covariates.py`: `get_candidate_covariates`
- `community_reports.py`: `get_candidate_communities`
- `text_units.py`: `get_candidate_text_units`

2 algorithmic improvements using `defaultdict(list)` index dicts:

- `_filter_relationships()` in `local_context.py`: relationship link counting now uses source/target index dicts instead of scanning all relationships per entity
- `build_covariates_context()` in `local_context.py`: covariate filtering now uses a `subject_id` index dict instead of scanning all covariates per entity

Benchmark results
Standalone benchmark with isolated before/after implementations. All correctness assertions pass (results are identical between old and new code at every scale):
Functions benchmarked at each of the three scales: `_filter_relationships` and `build_covariates_context` (per-scale timings are not preserved here).

At 1000 entities / 10000 relationships, the combined hot path goes from ~1.7s to ~8.4ms.
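A before/after comparison of this kind could be structured roughly as follows. This is a sketch under stated assumptions, not the benchmark used for the numbers above: the link-counting functions, the synthetic data generator, and the scales in the `__main__` block are all hypothetical and chosen only to run quickly.

```python
import random
import time
from collections import defaultdict


def make_data(n_entities, n_relationships, seed=0):
    """Generate synthetic entity names and (source, target) pairs."""
    rng = random.Random(seed)
    entities = [f"e{i}" for i in range(n_entities)]
    relationships = [
        (rng.choice(entities), rng.choice(entities)) for _ in range(n_relationships)
    ]
    return entities, relationships


def count_links_scan(entities, relationships):
    # Before: full passes over the relationships for every entity, O(n*m) overall.
    counts = {}
    for entity in entities:
        as_source = sum(1 for s, _ in relationships if s == entity)
        as_target = sum(1 for _, t in relationships if t == entity)
        counts[entity] = as_source + as_target
    return counts


def count_links_indexed(entities, relationships):
    # After: build source/target indexes once, then look each entity up.
    by_source = defaultdict(list)
    by_target = defaultdict(list)
    for rel in relationships:
        by_source[rel[0]].append(rel)
        by_target[rel[1]].append(rel)
    # .get() avoids inserting empty lists for entities with no links.
    return {
        entity: len(by_source.get(entity, [])) + len(by_target.get(entity, []))
        for entity in entities
    }


def compare(n_entities, n_relationships):
    """Time both variants on the same data and check that they agree."""
    entities, relationships = make_data(n_entities, n_relationships)
    t0 = time.perf_counter()
    old = count_links_scan(entities, relationships)
    t1 = time.perf_counter()
    new = count_links_indexed(entities, relationships)
    t2 = time.perf_counter()
    assert old == new  # correctness check before trusting any timing
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    # Illustrative scales only; these are not the scales used in the PR.
    for n, m in [(10, 100), (100, 1000), (200, 2000)]:
        scan_s, indexed_s = compare(n, m)
        print(f"{n} entities / {m} relationships: scan {scan_s:.4f}s, indexed {indexed_s:.4f}s")
```

Single `perf_counter` measurements are noisy; for numbers worth reporting, repeating each variant with the standard-library `timeit` module would be a more robust choice.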
Checklist
Additional Notes
The list-to-set conversions are mechanical: the `in` operator returns the same result on a set as on a list of the same (hashable) elements, so there is no behavioral change. The two index-dict refactors produce identical results, validated by key-by-key comparison assertions in the benchmark at three different scales. No new dependencies are introduced.
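A key-by-key comparison for the covariate refactor might look like the sketch below. It assumes covariates are plain dicts with a `subject_id` key; the function names and sample data are hypothetical and do not reproduce `build_covariates_context()` itself.

```python
from collections import defaultdict


def covariates_by_entity_scan(entity_names, covariates):
    # Before: filter the full covariate list once per entity.
    return {
        name: [c for c in covariates if c["subject_id"] == name]
        for name in entity_names
    }


def covariates_by_entity_indexed(entity_names, covariates):
    # After: group covariates by subject_id once, then look up each entity.
    by_subject = defaultdict(list)
    for cov in covariates:
        by_subject[cov["subject_id"]].append(cov)
    return {name: list(by_subject.get(name, [])) for name in entity_names}


def assert_same(old, new):
    # Key-by-key comparison, so a failure points at the offending entity.
    assert old.keys() == new.keys()
    for key in old:
        assert old[key] == new[key], f"mismatch for {key!r}"


covariates = [
    {"subject_id": "A", "claim": "c1"},
    {"subject_id": "B", "claim": "c2"},
    {"subject_id": "A", "claim": "c3"},
]
names = ["A", "B", "C"]

assert_same(
    covariates_by_entity_scan(names, covariates),
    covariates_by_entity_indexed(names, covariates),
)
```

Because the index appends covariates in input order, each per-entity list matches the order produced by the original per-entity scan, which is what lets the comparison use plain equality.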