Prototype: Name-based invalidation for incremental updates #571
Draft
thomasmarshall wants to merge 12 commits into main from
This PR prototypes incremental updates via name-based invalidation.
The basic premise is that when a document is added, changed, or removed, we invalidate the graph based on all names in either the old or new version of the document. We use those seed names to find dependent names and remove/invalidate everything that could possibly require re-resolution or re-linearization.
Tracking dependent names
We are already able to traverse names upwards via nesting and parent names, but invalidation requires traversing downwards too. During indexing we can add names as dependents of one another via a `dependents` field on the `Name`. This document has 5 names:
- `Foo` with no parent or nesting
- `Bar` with `Foo` parent and no nesting
- `Baz` with no parent and `Bar` nesting
- `Qux` with no parent and `Bar` nesting
- `ABC` with no parent and `Baz` nesting

We would add the following dependencies:
- `Foo` → `[Bar]`
- `Bar` → `[Baz, Qux]`
- `Baz` → `[ABC]`
- `Qux` → `[]`
- `ABC` → `[]`

Currently, this prototype tracks a dependency for both parent and nesting names, but I'm not sure if that's necessary. We might only need to track the nesting name because we can find resolved parent/child names via the declaration members. For example, we shouldn't need to track that `Bar` is a dependent of `Foo` because there will be a `Bar` member declaration in the `Foo` declaration, and we can get all the names that point to it from there. I also experimented with tracking aliases:
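For instance, a minimal document where `A` is declared as an alias whose target is `B` (an illustrative example of the kind of alias in question):

```ruby
class B; end

# A is an alias; its target is B.
A = B
```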
Here we'd add `A` as a dependent of `B` and `B` as a dependent of `A`. Again, we already model the relationship in one direction via alias targets, but tracking both directions here made the prototype straightforward. I wonder if the target-to-aliases dependency should be tracked on the declaration, but again, this was easy. Some form of relationship tracking is necessary here so that we can propagate invalidation in either direction when necessary.
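The dependent-name tracking described in this section could be sketched roughly like this (the `Name` struct and `add_dependent` method are illustrative stand-ins, not the prototype's actual API):

```ruby
# Illustrative sketch: a Name with a `dependents` field, populated
# during indexing from each name's parent and nesting names.
Name = Struct.new(:id, :parent, :nesting, :dependents) do
  def add_dependent(name)
    dependents << name.id
  end
end

foo = Name.new("Foo", nil, nil, [])
bar = Name.new("Bar", "Foo", nil, [])
baz = Name.new("Baz", nil, "Bar", [])
qux = Name.new("Qux", nil, "Bar", [])
abc = Name.new("ABC", nil, "Baz", [])

names = { "Foo" => foo, "Bar" => bar, "Baz" => baz, "Qux" => qux, "ABC" => abc }

# Record each name as a dependent of both its parent and its nesting
# name (the prototype currently tracks both, as discussed above).
names.each_value do |name|
  [name.parent, name.nesting].compact.each do |dep|
    names[dep].add_dependent(name)
  end
end

foo.dependents  # => ["Bar"]
bar.dependents  # => ["Baz", "Qux"]
baz.dependents  # => ["ABC"]
```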
Invalidation
The basic algorithm for invalidating based on names is as follows:
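Roughly, the traversal might look like the following (a simplified sketch under my own naming, not the prototype's actual implementation): seed the queue with every name in the old and new versions of the document, then walk `dependents` edges transitively.

```ruby
require "set"

# Simplified sketch: given seed names from the old and new versions of
# a document, walk the `dependents` edges transitively and collect
# every name that could require re-resolution or re-linearization.
def names_to_invalidate(seed_names, dependents_by_name)
  queue = seed_names.dup
  invalidated = Set.new

  until queue.empty?
    name = queue.shift
    next if invalidated.include?(name)

    invalidated << name
    queue.concat(dependents_by_name.fetch(name, []))
  end

  invalidated
end

dependents = {
  "Foo" => ["Bar"],
  "Bar" => ["Baz", "Qux"],
  "Baz" => ["ABC"],
}

names_to_invalidate(["Foo"], dependents).to_a
# => ["Foo", "Bar", "Baz", "Qux", "ABC"]
```

Everything in the resulting set would then have its declarations removed and its references marked unresolved.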
This is probably quite simplistic and missing nuance. The prototype undoubtedly has bugs, even in this simplified invalidation mechanism. However, it works for simple cases and even when testing switching branches on some smaller projects.
It definitely over-invalidates, but the idea is that it should invalidate less than the full graph, and we can incrementally work towards a solution that invalidates less and less.
Re-resolution
The collected definition IDs and reference IDs are handed back to the resolver via the incremental updates infrastructure from #559. This list of IDs should theoretically include everything connected to an invalidated name, plus the new ones. The declarations have been deleted, so the resolver will recreate them and re-resolve any unresolved names for these entities.
Full vs incremental updates
The prototype currently has a flag to skip invalidation for the initial build. We start with `resolved: false` and then, after the initial resolution phase, we set `resolved: true`. This is because the invalidation logic incurs a cost that we shouldn't need to pay the first time.

Verification
The prototype includes a small tool set for verifying that incremental updates are working as expected:
- The `assert_incremental_graph_integrity!` macro performs incremental resolution and then generates a fresh graph from the same docs. It uses the `diff` tool from "Add mechanism for diffing graphs" (#557) to compare the state of the graph: references, definitions, declarations, members, ancestry, names, resolution, etc.
- The `incremental_verify` example allows us to perform the same kind of graph comparison using two references from a git project. It checks out the first reference, builds the initial graph, then switches to the second reference to perform an incremental build, then performs a fresh full build, and finally compares the state of the incremental and fresh graphs.

For example:
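The overall flow can be sketched with toy stand-ins (every function here is illustrative; the real graphs and diff tool are far richer):

```ruby
# Toy stand-in for a full graph build: map each file to the constant
# names it mentions.
def build_graph(docs)
  docs.transform_values { |src| src.scan(/[A-Z]\w*/) }
end

# Toy stand-in for an incremental update: reuse the old graph and
# recompute only the changed documents.
def incremental_update(graph, changed_docs)
  graph.merge(build_graph(changed_docs))
end

# Documents as they exist at two git references.
ref1_docs = { "a.rb" => "class Foo; end" }
ref2_docs = { "a.rb" => "class Foo; end", "b.rb" => "class Bar < Foo; end" }

# Check out ref1 and build the initial graph...
graph = build_graph(ref1_docs)
# ...switch to ref2 and update incrementally...
incremental = incremental_update(graph, ref2_docs)
# ...then build fresh from ref2 and compare the two graphs.
fresh = build_graph(ref2_docs)

incremental == fresh  # => true when incremental updates are correct
```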
This one looks fine, but on a bigger and more complex project there are still many differences in the graph.
Remaining challenges
There are many edge cases (and not-so-edge cases) that are not yet covered. For an example of a not-so-edge case: this prototype doesn't resolve previously unresolved names that should now be resolvable.
For example, a file containing only the reference `Foo`: this reference should be unresolved. If we add a second file that defines `Foo`, the reference should become resolved. It doesn't at the moment, because we don't have a mapping of unresolved names to references: we can only get to them via declarations (resolved) or by searching through the whole of `graph.constant_references`.

There are of course also many nuances to how singletons and aliases should be handled, and the prototype is, in some cases, unable to linearize complete ancestors that the fresh graph handles just fine. I wasn't yet able to track down the cause.
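One plausible direction for the first problem (a sketch of an idea, not something the prototype implements) is an index from unresolved names to reference IDs, so that adding a definition for `Foo` can look up exactly the references to revisit:

```ruby
require "set"

# Hypothetical index: unresolved name => IDs of references using it.
unresolved_refs = Hash.new { |h, k| h[k] = Set.new }

# Indexing the first file records the dangling reference to Foo
# (reference ID 1 is arbitrary here).
unresolved_refs["Foo"] << 1

# When a later update defines Foo, the affected references can be
# found directly instead of scanning graph.constant_references.
to_re_resolve = unresolved_refs.delete("Foo") || Set.new
to_re_resolve.to_a  # => [1]
```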
We also need to think about performance due to over-invalidation. This approach blows away everything that could possibly be affected and then builds it up again. Depending on what changes, that could be a sizable chunk of the graph. Invalidation is not free, and so there will be a point at which it's quicker to just short-cut to a full rebuild.
Another challenge will be how to index the necessary data such that we can efficiently traverse the graph for things—without massively inflating the size of the data.
I think there are some small flow/architectural things to consider here too. For one, it seems inefficient to perform invalidation on a per-document basis as we are currently doing—especially since many files will have overlapping names. Of course, once the names have been unresolved and declarations removed there will be less work to do each time, but still I think it would be preferable to collect the full set of names from all documents and process them in a single queue.
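Collecting seed names across all changed documents before invalidating could look something like this (a sketch of the suggested flow, with made-up data shapes):

```ruby
require "set"

# Instead of invalidating per document, gather seed names from every
# changed document first, then run a single invalidation pass over the
# deduplicated set.
changed_docs = {
  "a.rb" => { old_names: ["Foo", "Bar"], new_names: ["Foo"] },
  "b.rb" => { old_names: ["Bar"],        new_names: ["Bar", "Baz"] },
}

seeds = Set.new
changed_docs.each_value do |doc|
  seeds.merge(doc[:old_names])
  seeds.merge(doc[:new_names])
end

# Overlapping names like Bar are only processed once.
seeds.to_a  # => ["Foo", "Bar", "Baz"]
```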
We could still implement a fast/slow path decision for incremental updates (like this) on top of this approach. If 90% of key strokes don't change the set/shape of namespaces in the document then we can take a shortcut to get very fast incremental updates.
Notes
Included in this PR are a handful of temporary commits that are unrelated to the actual prototype: