Add incremental invalidation/resolution by st0012 · Pull Request #589 · Shopify/rubydex

st0012 · 2026-02-19T23:27:22Z

Note: The declaration_to_names reverse index, unresolve_name, unresolve_reference, and related primitives have been extracted into #627, which is the base for this PR.

Summary

Replace the full re-resolution strategy (clear_declarations + resolve everything from scratch) with incremental invalidation. Graph mutations (update/delete_document) now compute the minimal set of definitions, references, and ancestor chains that need re-resolution, and the resolver processes only that subset.

How it works

Graph mutations follow a three-step pipeline:

1. invalidate: Detaches old document data from declarations. Identifies which namespace declarations are affected (definition removed, new definition added, new mixin reference added). Runs invalidate_ancestor_chains → unresolve_affected_references → cascade_name_invalidation on the combined set. Collects empty declarations for tree removal.
2. remove_document_data: Removes raw refs/defs/names/strings from maps.
3. extend: Merges new LocalGraph data and queues new definitions/references for resolution.
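The three steps can be sketched on a toy graph as follows. This is only an illustration under simplifying assumptions: the `Graph` fields, the string-keyed maps, and the `update_document` signature here are hypothetical and much simpler than the real rubydex types, and the cascade helpers are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-in for the graph; the real types differ.
#[derive(Default)]
struct Graph {
    /// Document URI -> names of the definitions it contributed.
    documents: HashMap<String, Vec<String>>,
    /// Declaration name -> URIs of the documents that define it.
    declarations: HashMap<String, HashSet<String>>,
    /// Definitions queued for resolution.
    pending: Vec<String>,
}

impl Graph {
    /// Step 1: detach the old document from its declarations, report which
    /// declarations were touched, and drop declarations left with no definitions.
    fn invalidate(&mut self, uri: &str) -> HashSet<String> {
        let mut affected = HashSet::new();
        if let Some(defs) = self.documents.get(uri) {
            for name in defs {
                if let Some(owners) = self.declarations.get_mut(name) {
                    owners.remove(uri);
                    affected.insert(name.clone());
                }
            }
        }
        self.declarations.retain(|_, owners| !owners.is_empty());
        affected
    }

    /// Step 2: remove the raw per-document data.
    fn remove_document_data(&mut self, uri: &str) {
        self.documents.remove(uri);
    }

    /// Step 3: merge the new data and queue the new definitions for resolution.
    fn extend(&mut self, uri: &str, defs: Vec<String>) {
        for name in &defs {
            self.declarations
                .entry(name.clone())
                .or_default()
                .insert(uri.to_string());
            self.pending.push(name.clone());
        }
        self.documents.insert(uri.to_string(), defs);
    }

    /// Runs the three steps in order and returns the affected declarations.
    fn update_document(&mut self, uri: &str, defs: Vec<String>) -> HashSet<String> {
        let affected = self.invalidate(uri);
        self.remove_document_data(uri);
        self.extend(uri, defs);
        affected
    }
}
```

In this sketch, re-indexing a file that used to define `Foo` and `Bar` but now defines only `Foo` marks both as affected, drops `Bar`, and queues only `Foo` for resolution.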

Work items are accumulated as PendingWork (definitions, references, ancestors) and drained internally by the resolver. For the initial full index, it contains everything; for incremental updates, it contains only the invalidated subset.
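A minimal sketch of the accumulate-then-drain pattern, assuming integer IDs and a hypothetical `resolve_pending` function (neither is taken from the PR):

```rust
/// Hypothetical stand-in for PendingWork; IDs are plain integers here.
#[derive(Debug, Default, PartialEq)]
struct PendingWork {
    definitions: Vec<u32>,
    references: Vec<u32>,
    ancestors: Vec<u32>,
}

/// Toy resolver: takes ownership of the queued work, leaving the queue empty,
/// and returns how many items it processed.
fn resolve_pending(pending: &mut PendingWork) -> usize {
    let work = std::mem::take(pending);
    work.definitions.len() + work.references.len() + work.ancestors.len()
}
```

Because the queue is emptied on every run, a second call with no intervening mutation has nothing to do, which is what makes an incremental update cheap in this model.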

Reverse indices

Four reverse indices are added to Graph to avoid O(N) scans during invalidation:

- declaration_to_names: which names resolve to a given declaration (extracted to #627, "Add unresolve functions and declaration_to_names reverse index")
- name_to_references: which constant references use a given name
- name_dependents: which names depend on another name (via nesting/parent_scope)
- name_to_definitions: which definitions use a given name

These enable targeted BFS walks from affected declarations instead of scanning all names/references.
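To illustrate why the reverse indices avoid full scans, here is a rough sketch of a BFS that starts from a few affected names and collects only the references reachable through `name_dependents`. The `ReverseIndices` struct, the integer ID aliases, and `affected_references` are assumptions made for this example, not the PR's actual definitions.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type NameId = u32;
type ReferenceId = u32;

/// Hypothetical subset of the reverse indices, keyed by integer IDs.
#[derive(Default)]
struct ReverseIndices {
    /// Name -> names that depend on it (via nesting/parent scope).
    name_dependents: HashMap<NameId, Vec<NameId>>,
    /// Name -> constant references that use it.
    name_to_references: HashMap<NameId, Vec<ReferenceId>>,
}

impl ReverseIndices {
    /// Walks breadth-first from `roots` through `name_dependents` and collects
    /// every reference attached to a visited name. Unrelated names are never touched.
    fn affected_references(&self, roots: &[NameId]) -> HashSet<ReferenceId> {
        let mut seen: HashSet<NameId> = roots.iter().copied().collect();
        let mut queue: VecDeque<NameId> = roots.iter().copied().collect();
        let mut refs = HashSet::new();
        while let Some(name) = queue.pop_front() {
            if let Some(rs) = self.name_to_references.get(&name) {
                refs.extend(rs.iter().copied());
            }
            if let Some(deps) = self.name_dependents.get(&name) {
                for &dep in deps {
                    // `insert` returns false for names already visited.
                    if seen.insert(dep) {
                        queue.push_back(dep);
                    }
                }
            }
        }
        refs
    }
}
```

The cost of the walk is proportional to the number of names and references actually reachable from the roots, rather than to the size of the whole graph.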

Cascade invalidation

When a declaration's ancestors change, the invalidation cascades:

1. Ancestor chains are cleared (BFS through descendants)
2. Constant references scoped to affected declarations are unresolved (BFS through name dependents)
3. Names that depended on unresolved names are themselves unresolved (BFS through name dependents)
4. Declarations left with no definitions are removed along with their entire member/singleton sub-tree
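The first and last steps of the cascade can be sketched on a toy declaration map. Everything here is hypothetical: the `Declaration` fields, the use of `None` to mark a cleared ancestor chain, and the helper signatures are assumptions for illustration, and the two name-level steps are omitted.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type DeclId = u32;

/// Hypothetical, simplified declaration record.
#[derive(Default)]
struct Declaration {
    definitions: Vec<u32>,
    members: Vec<DeclId>,
    descendants: Vec<DeclId>,
    /// `None` means the ancestor chain must be recomputed.
    ancestors: Option<Vec<DeclId>>,
}

/// Clears ancestor chains for `roots` and, breadth-first, for all of their
/// descendants. Returns the cleared declarations in visit order.
fn invalidate_ancestor_chains(
    decls: &mut HashMap<DeclId, Declaration>,
    roots: &[DeclId],
) -> Vec<DeclId> {
    let mut seen = HashSet::new();
    let mut queue: VecDeque<DeclId> = roots.iter().copied().collect();
    let mut cleared = Vec::new();
    while let Some(id) = queue.pop_front() {
        if !seen.insert(id) {
            continue; // already visited
        }
        if let Some(decl) = decls.get_mut(&id) {
            decl.ancestors = None;
            cleared.push(id);
            queue.extend(decl.descendants.iter().copied());
        }
    }
    cleared
}

/// Removes `root` and its whole member sub-tree from the map.
fn remove_subtree(decls: &mut HashMap<DeclId, Declaration>, root: DeclId) -> Vec<DeclId> {
    let mut stack = vec![root];
    let mut removed = Vec::new();
    while let Some(id) = stack.pop() {
        if let Some(decl) = decls.remove(&id) {
            stack.extend(decl.members);
            removed.push(id);
        }
    }
    removed
}

/// Removes every declaration that has no definitions, together with its
/// members. Returns the removed IDs in ascending order.
fn remove_empty_declarations(decls: &mut HashMap<DeclId, Declaration>) -> Vec<DeclId> {
    let empty: Vec<DeclId> = decls
        .iter()
        .filter(|(_, d)| d.definitions.is_empty())
        .map(|(&id, _)| id)
        .collect();
    let mut removed = Vec::new();
    for id in empty {
        removed.extend(remove_subtree(decls, id));
    }
    removed.sort_unstable();
    removed
}
```

Note that in this sketch a member is removed with its empty parent even if the member still has definitions of its own, mirroring the "entire member/singleton sub-tree" wording above.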

Compared to `main`

Correctness: identical — all declaration counts, definition counts, orphan rates, and linked/orphan breakdowns match exactly between main and the branch.

Performance (initial full index on 94,036 files):

| Stage | main | branch | Delta |
|---|---|---|---|
| Listing | 0.645s | 0.701s | +0.056s (+8.7%) |
| Indexing | 10.309s | 10.261s | -0.048s (-0.5%) |
| Resolution | 41.570s | 25.275s | -16.295s (-39.2%) |
| Querying | 0.647s | 0.696s | +0.049s (+7.6%) |
| Total | 53.171s | 36.933s | -16.238s (-30.5%) |

Memory:

| Metric | main | branch | Delta |
|---|---|---|---|
| Max RSS | 3,909 MB | 4,437 MB | +528 MB (+13.5%) |

Resolution is 39% faster at the cost of 13.5% more memory, which comes from the four reverse indices (declaration_to_names, name_to_references, name_dependents, name_to_definitions) and the pending_work queue.

vinistock

I'm still trying to reason about the algorithm, so not done reviewing yet.

rust/rubydex/src/test_utils/graph_test.rs