Add incremental invalidation/resolution #589

Open
st0012 wants to merge 4 commits into unresolve-primitives from add-incremental-invalidation

Member

@st0012 st0012 commented Feb 19, 2026

Note: The declaration_to_names reverse index, unresolve_name, unresolve_reference, and related primitives have been extracted into #627, which is the base for this PR.

Summary

Replace the full re-resolution strategy (clear_declarations + resolve everything from scratch) with incremental invalidation. Graph mutations (update/delete_document) now compute the minimal set of definitions, references, and ancestor chains that need re-resolution, and the resolver processes only that subset.

How it works

Graph mutations follow a three-step pipeline:

  1. invalidate: Detaches old document data from declarations. Identifies which namespace declarations are affected (definition removed, new definition added, new mixin reference added). Runs invalidate_ancestor_chains, then unresolve_affected_references, then cascade_name_invalidation on the combined set. Collects empty declarations for tree removal.
  2. remove_document_data: Removes raw refs/defs/names/strings from maps.
  3. extend: Merges new LocalGraph data and queues new definitions/references for resolution.

Work items are accumulated as PendingWork (definitions, references, ancestors) and drained internally by the resolver. For the initial full index, this contains everything. For incremental updates, only the invalidated subset.
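
As a rough illustration of the pipeline above, here is a minimal sketch. It is not the PR's code: the `Graph` fields, the `DocId`/`DefId` aliases, and the `update_document`/`resolve` signatures are simplified assumptions, and resolution is reduced to marking a definition as resolved.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real id types.
type DocId = u32;
type DefId = u32;

/// Work queued by graph mutations and drained by the resolver.
#[derive(Default, Debug)]
struct PendingWork {
    definitions: Vec<DefId>,
}

#[derive(Default)]
struct Graph {
    /// Raw per-document data: the definitions each document contributed.
    documents: HashMap<DocId, Vec<DefId>>,
    /// Definitions that are currently resolved.
    resolved: HashSet<DefId>,
    pending: PendingWork,
}

impl Graph {
    /// Step 1: detach the old document's definitions from resolved state.
    fn invalidate(&mut self, doc: DocId) {
        if let Some(defs) = self.documents.get(&doc) {
            for def in defs {
                self.resolved.remove(def);
            }
        }
    }

    /// Step 2: drop the raw data stored for the document.
    fn remove_document_data(&mut self, doc: DocId) {
        self.documents.remove(&doc);
    }

    /// Step 3: merge the new data and queue it for resolution.
    fn extend(&mut self, doc: DocId, defs: Vec<DefId>) {
        self.pending.definitions.extend(defs.iter().copied());
        self.documents.insert(doc, defs);
    }

    fn update_document(&mut self, doc: DocId, defs: Vec<DefId>) {
        self.invalidate(doc);
        self.remove_document_data(doc);
        self.extend(doc, defs);
    }

    /// Drains only the queued work and returns how many items were processed.
    fn resolve(&mut self) -> usize {
        let work = std::mem::take(&mut self.pending);
        let processed = work.definitions.len();
        self.resolved.extend(work.definitions);
        processed
    }
}
```

In this toy model, re-indexing one document after a full resolve queues only that document's definitions, while definitions from untouched documents stay resolved.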

Reverse indices

Four reverse indices are added to Graph to avoid O(N) scans during invalidation:

  • declaration_to_names — which names resolve to a given declaration (extracted to #627, "Add unresolve functions and declaration_to_names reverse index")
  • name_to_references — which constant references use a given name
  • name_dependents — which names depend on another name (via nesting/parent_scope)
  • name_to_definitions — which definitions use a given name

These enable targeted BFS walks from affected declarations instead of scanning all names/references.
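
To make the trade-off concrete, the sketch below pairs a forward map with a reverse index and keeps the two in sync on every mutation. The `Index` struct and the `bind`/`unbind`/`names_for` helpers are hypothetical names for illustration; only `declaration_to_names` comes from the PR, and plain `HashMap`/`HashSet` stand in for the project's identity collections.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real id types.
type NameId = u32;
type DeclarationId = u32;

#[derive(Default)]
struct Index {
    /// Forward map: the declaration each name resolves to.
    name_to_declaration: HashMap<NameId, DeclarationId>,
    /// Reverse index: the names that resolve to each declaration.
    declaration_to_names: HashMap<DeclarationId, HashSet<NameId>>,
}

impl Index {
    /// Records that `name` resolves to `decl`, replacing any earlier binding.
    fn bind(&mut self, name: NameId, decl: DeclarationId) {
        self.unbind(name);
        self.name_to_declaration.insert(name, decl);
        self.declaration_to_names.entry(decl).or_default().insert(name);
    }

    /// Removes the binding for `name` from both maps, dropping empty sets.
    fn unbind(&mut self, name: NameId) {
        if let Some(decl) = self.name_to_declaration.remove(&name) {
            if let Some(names) = self.declaration_to_names.get_mut(&decl) {
                names.remove(&name);
                if names.is_empty() {
                    self.declaration_to_names.remove(&decl);
                }
            }
        }
    }

    /// Looks up the affected names directly instead of scanning every name.
    fn names_for(&self, decl: DeclarationId) -> Vec<NameId> {
        let mut names: Vec<NameId> = self
            .declaration_to_names
            .get(&decl)
            .map(|set| set.iter().copied().collect())
            .unwrap_or_default();
        names.sort_unstable();
        names
    }
}
```

The lookup is a single hash access, but every write path has to update both maps, which is the consistency cost raised later in the review.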

Cascade invalidation

When a declaration's ancestors change, the invalidation cascades:

  1. Ancestor chains are cleared (BFS through descendants)
  2. Constant references scoped to affected declarations are unresolved (BFS through name dependents)
  3. Names that depended on unresolved names are themselves unresolved (BFS through name dependents)
  4. Declarations left with no definitions are removed along with their entire member/singleton sub-tree
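
The BFS walks in the steps above share one shape: start from the affected ids and follow a dependents map until nothing new is reached. A minimal sketch of that shape follows, assuming a simplified `name_dependents` map from a name to the names that depend on it; the `cascade` function is a hypothetical helper, not one from the PR.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical stand-in for the real id type.
type NameId = u32;

/// Returns every name reachable from `roots` through `name_dependents`,
/// including the roots themselves. The `seen` set makes cycles terminate.
fn cascade(
    roots: &[NameId],
    name_dependents: &HashMap<NameId, Vec<NameId>>,
) -> HashSet<NameId> {
    let mut seen: HashSet<NameId> = roots.iter().copied().collect();
    let mut queue: VecDeque<NameId> = roots.iter().copied().collect();

    while let Some(name) = queue.pop_front() {
        // A name with no entry simply has no dependents.
        for &dependent in name_dependents.get(&name).into_iter().flatten() {
            if seen.insert(dependent) {
                queue.push_back(dependent);
            }
        }
    }
    seen
}
```

Names that are not reachable from the roots are never visited, which is what keeps the cost proportional to the affected subset rather than to the whole graph.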

Compared to main

Correctness: identical — all declaration counts, definition counts, orphan rates, and linked/orphan breakdowns match exactly between main and the branch.

Performance (initial full index on 94,036 files):

| Stage | main | branch | Delta |
|---|---|---|---|
| Listing | 0.645s | 0.701s | +0.056s (+8.7%) |
| Indexing | 10.309s | 10.261s | -0.048s (-0.5%) |
| Resolution | 41.570s | 25.275s | -16.295s (-39.2%) |
| Querying | 0.647s | 0.696s | +0.049s (+7.6%) |
| Total | 53.171s | 36.933s | -16.238s (-30.5%) |

Memory:

| | main | branch | Delta |
|---|---|---|---|
| Max RSS | 3,909 MB | 4,437 MB | +528 MB (+13.5%) |

Resolution is 39% faster at the cost of 13.5% more memory, which comes from the four reverse indices (declaration_to_names, name_to_references, name_dependents, name_to_definitions) plus the pending_work queue.

@st0012 st0012 requested a review from a team as a code owner February 19, 2026 23:27
@st0012 st0012 self-assigned this Feb 19, 2026
@st0012 st0012 force-pushed the add-incremental-invalidation branch from 365da40 to 774d476 Compare February 19, 2026 23:34
Member

@vinistock vinistock left a comment

I'm still trying to reason about the algorithm, so not done reviewing yet.

@st0012 st0012 force-pushed the add-incremental-invalidation branch from 7c0e491 to e8cdc32 Compare February 25, 2026 15:54
Member

@vinistock vinistock left a comment

This PR is highly complex. I think the mechanism works, but it's hard to ship so many changes in one go. Here's what I think we should do to minimize the changes and ship with confidence:

  • Move the rename of resolve_all to resolve to a separate PR
  • Create a PR implementing unresolve_name and unresolve_reference that can be shipped separately from the algorithm
  • Add the name_dependents hashmap of IdentityHashMap<NameId, IdentityHashSet<NameDependent>> using a NameDependent enum. Populate this map during indexing when creating names, so that the global graph only has to merge the work at the end

With this foundation, it will be significantly easier for reviewers to focus on the algorithm. What do you think?

Comment on lines +463 to +468
if let Some(name_set) = self.declaration_to_names.get_mut(&declaration_id) {
    name_set.remove(&name_id);
    if name_set.is_empty() {
        self.declaration_to_names.remove(&declaration_id);
    }
}
Member

This is one of the reasons why I'm not a fan of the multiple hashmaps. We need to ensure that the data on the auxiliary maps is also consistent and the benefit is just avoiding tracing the graph from declaration -> definitions -> names. I'm not convinced tbh.

Also, to make sure we're making progress, I think you can probably ship a separate PR with just the unresolve_name and unresolve_reference methods.

Member Author

I'll address all the feedback first and then see how we can split this PR.

Member Author

First one: #627

Comment on lines +75 to +81
/// Reverse index: for each `NameId`, which definitions and constant references use it.
/// Eliminates O(D+R) scans during invalidation.
name_users: IdentityHashMap<NameId, Vec<NameUser>>,

/// Reverse index: for each `NameId`, which other names depend on it
/// (via nesting or `parent_scope`). Used for cascade invalidation.
name_dependents: IdentityHashMap<NameId, IdentityHashSet<NameId>>,
Member

Why do we need these two maps? The idea of having an enum for name dependents is so that you can go NameId -> ReferenceId | DefinitionId -> NameId.

Member Author

I merged the maps but I think the value should also include NameId. So it'd be NameId -> ReferenceId | DefinitionId | NameId.

I prototyped using reference/definition to look up name but it doesn't work well with parent_scope/nesting cases. For example:

class Baz
  include Foo
  CONST
end

The reference CONST creates a name with nesting=baz_name, but it doesn't create a member declaration under Baz. So Baz.members() is empty — there's no path from baz_name to the reference's name through the declaration/definitions. When Baz's ancestors change (e.g., a new prepend Bar is added), we need to re-evaluate CONST, but without the explicit Name(NameId) entry under baz_name, we can't discover it.

Member

> there's no path from baz_name to the reference's name through the declaration/definitions

This is the only reason why we need name_dependents. In your example, this is what I would expect the hashmap to look like:

# name_dependents

{
  NameId(Baz) => Set[ReferenceId(Foo), ReferenceId(CONST)]
}

This allows the graph to remember which references and definitions will be potentially impacted by a name change. We then trace name_dependents and the rest of the graph to unresolve names.

In this case, if we had to unresolve all names due to a change to Baz, I would expect the algorithm to do something like this:

  1. Baz changed. Loop through name dependents
  2. Regardless of whether the dependent is a reference or a definition, get its name_id, pull the name from the graph and unresolve it
  3. Now, unresolving the definition and reference may invalidate other things. Go back to 1. and invalidate the name_id for the reference/definition

This also involves invalidating ancestors, but you get the idea.
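
For readers following the thread, the loop described above could be sketched roughly as follows. This is an illustration under assumptions, not code from the PR: the `Graph` fields, `name_of`, and `invalidate_dependents` are hypothetical, the `NameDependent` variants follow the Definition/Reference/Name split from the later consolidation commit, and ancestor invalidation is left out.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real id types.
type NameId = u32;
type ReferenceId = u32;
type DefinitionId = u32;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum NameDependent {
    Definition(DefinitionId),
    Reference(ReferenceId),
    Name(NameId),
}

struct Graph {
    name_dependents: HashMap<NameId, HashSet<NameDependent>>,
    reference_names: HashMap<ReferenceId, NameId>,
    definition_names: HashMap<DefinitionId, NameId>,
    resolved_names: HashSet<NameId>,
}

impl Graph {
    /// Step 2: whatever the dependent is, find the name it uses.
    fn name_of(&self, dependent: NameDependent) -> Option<NameId> {
        match dependent {
            NameDependent::Definition(id) => self.definition_names.get(&id).copied(),
            NameDependent::Reference(id) => self.reference_names.get(&id).copied(),
            NameDependent::Name(id) => Some(id),
        }
    }

    /// Unresolves every name that transitively depends on `changed` and
    /// returns those names in sorted order. `changed` itself is left alone.
    fn invalidate_dependents(&mut self, changed: NameId) -> Vec<NameId> {
        let mut visited = HashSet::from([changed]);
        let mut worklist = vec![changed];
        let mut unresolved = Vec::new();

        // Step 1: loop through the dependents of each changed name.
        while let Some(name) = worklist.pop() {
            let dependents: Vec<NameDependent> = self
                .name_dependents
                .get(&name)
                .into_iter()
                .flatten()
                .copied()
                .collect();
            for dependent in dependents {
                if let Some(dependent_name) = self.name_of(dependent) {
                    if visited.insert(dependent_name) {
                        if self.resolved_names.remove(&dependent_name) {
                            unresolved.push(dependent_name);
                        }
                        // Step 3: the unresolved name may invalidate more.
                        worklist.push(dependent_name);
                    }
                }
            }
        }
        unresolved.sort_unstable();
        unresolved
    }
}
```

With the Baz example, registering the Foo and CONST references under Baz's name is enough for a change to Baz to reach CONST, even though CONST is not a member of Baz.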


/// Accumulated work items from update/delete operations.
/// Drained by `take_pending_work()` before resolution.
pending_work: Vec<Unit>,
Member

We still need an answer for the ever-growing memory if users don't call resolve.

st0012 added 4 commits March 2, 2026 21:56
Consolidate two reverse index maps (name_users and name_dependents) into
a single name_dependents map using a unified NameDependent enum with
Definition, Reference, and Name variants. This reduces the Graph struct
from 3 auxiliary maps to 2 and eliminates a class of consistency bugs
where two maps keyed on NameId had to be cleaned up in sync.

Addresses PR review feedback about multiple hashmaps requiring
consistency maintenance.
@st0012 st0012 force-pushed the add-incremental-invalidation branch from a421b3c to 9ee49c6 Compare March 2, 2026 22:04
@st0012 st0012 changed the base branch from main to unresolve-primitives March 2, 2026 22:04