Prototype: Name-based invalidation for incremental updates #571
Draft
thomasmarshall wants to merge 12 commits into main from
This PR prototypes incremental updates via name-based invalidation.
The basic premise is that when a document is added, changed, or removed, we invalidate the graph based on all names in either the old or new version of the document. We use those seed names to find dependent names and remove/invalidate everything that could possibly require re-resolution or re-linearization.
Tracking dependent names
We are already able to traverse names upwards via nesting and parent names, but invalidation requires traversing downwards too. During indexing we can add names as dependents of one another via a `dependents` field on the `Name`. This document has 5 names:
- `Foo` with no parent or nesting
- `Bar` with `Foo` parent and no nesting
- `Baz` with no parent and `Bar` nesting
- `Qux` with no parent and `Bar` nesting
- `ABC` with no parent and `Baz` nesting

We would add the following dependencies:
- `Foo` → `[Bar]`
- `Bar` → `[Baz, Qux]`
- `Baz` → `[ABC]`
- `Qux` → `[]`
- `ABC` → `[]`

Currently, this prototype tracks a dependency for both parent and nesting names, but I'm not sure if that's necessary. We might only need to track the nesting name because we can find resolved parent/child names via the declaration members. For example, we shouldn't need to track that `Bar` is a dependent of `Foo` because there will be a `Bar` member declaration in the `Foo` declaration, and we can get all the names that point to it from there. I also experimented with tracking aliases:
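For instance, a minimal document where `A` is declared as an alias whose target is `B` (an illustrative example of the kind of alias in question):

```ruby
class B; end

# A is an alias; its target is B.
A = B
```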
Here we'd add `A` as a dependent of `B` and `B` as a dependent of `A`. Again, we already model the relationship in one direction via alias targets, but tracking both directions here made the prototype straightforward. I wonder if the target-to-aliases dependency should be tracked on the declaration, but again, this was easy. Some form of relationship tracking is necessary here so that we can propagate invalidation in either direction when necessary.
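The dependent-name tracking described in this section could be sketched roughly like this (the `Name` struct and `add_dependent` method are illustrative stand-ins, not the prototype's actual API):

```ruby
# Illustrative sketch: a Name with a `dependents` field, populated
# during indexing from each name's parent and nesting names.
Name = Struct.new(:id, :parent, :nesting, :dependents) do
  def add_dependent(name)
    dependents << name.id
  end
end

foo = Name.new("Foo", nil, nil, [])
bar = Name.new("Bar", "Foo", nil, [])
baz = Name.new("Baz", nil, "Bar", [])
qux = Name.new("Qux", nil, "Bar", [])
abc = Name.new("ABC", nil, "Baz", [])

names = { "Foo" => foo, "Bar" => bar, "Baz" => baz, "Qux" => qux, "ABC" => abc }

# Record each name as a dependent of both its parent and its nesting
# name (the prototype currently tracks both, as discussed above).
names.each_value do |name|
  [name.parent, name.nesting].compact.each do |dep|
    names[dep].add_dependent(name)
  end
end

foo.dependents  # => ["Bar"]
bar.dependents  # => ["Baz", "Qux"]
baz.dependents  # => ["ABC"]
```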
Invalidation
The basic algorithm for invalidating based on names is as follows:
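Roughly, the traversal might look like the following (a simplified sketch under my own naming, not the prototype's actual implementation): seed the queue with every name in the old and new versions of the document, then walk `dependents` edges transitively.

```ruby
require "set"

# Simplified sketch: given seed names from the old and new versions of
# a document, walk the `dependents` edges transitively and collect
# every name that could require re-resolution or re-linearization.
def names_to_invalidate(seed_names, dependents_by_name)
  queue = seed_names.dup
  invalidated = Set.new

  until queue.empty?
    name = queue.shift
    next if invalidated.include?(name)

    invalidated << name
    queue.concat(dependents_by_name.fetch(name, []))
  end

  invalidated
end

dependents = {
  "Foo" => ["Bar"],
  "Bar" => ["Baz", "Qux"],
  "Baz" => ["ABC"],
}

names_to_invalidate(["Foo"], dependents).to_a
# => ["Foo", "Bar", "Baz", "Qux", "ABC"]
```

Everything in the resulting set would then have its declarations removed and its references marked unresolved.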
This is probably quite simplistic and missing nuance. The prototype undoubtedly has bugs, even in this simplified invalidation mechanism. However, it works for simple cases and even when testing switching branches on some smaller projects.
It definitely over-invalidates, but the idea is that it should invalidate less than the full graph, and we can incrementally work towards a solution that invalidates less and less.
Re-resolution
The collected definition IDs and reference IDs are handed back to the resolver via the incremental updates infrastructure from #559. This list of IDs should theoretically include everything connected to an invalidated name, plus the new ones. The declarations have been deleted, so the resolver will recreate them and re-resolve any unresolved names for these entities.
Full vs incremental updates
The prototype currently has a flag to skip invalidation for the initial build. We start with `resolved: false` and then, after the initial resolution phase, we set `resolved: true`. This is because the invalidation logic incurs a cost that we shouldn't need to pay the first time.

Verification
The prototype includes a small tool set for verifying that incremental updates are working as expected:
- The `assert_incremental_graph_integrity!` macro performs incremental resolution and then generates a fresh graph from the same docs. It uses the `diff` tool from "Add mechanism for diffing graphs" (#557) to compare the state of the graph: references, definitions, declarations, members, ancestry, names, resolution, etc.
- The `incremental_verify` example allows us to perform the same kind of graph comparison using two references from a git project. It checks out the first reference, builds the initial graph, then switches to the second reference to perform an incremental build, then performs a fresh full build, and finally compares the state of the incremental and fresh graphs.

For example:
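The overall flow can be sketched with toy stand-ins (every function here is illustrative; the real graphs and diff tool are far richer):

```ruby
# Toy stand-in for a full graph build: map each file to the constant
# names it mentions.
def build_graph(docs)
  docs.transform_values { |src| src.scan(/[A-Z]\w*/) }
end

# Toy stand-in for an incremental update: reuse the old graph and
# recompute only the changed documents.
def incremental_update(graph, changed_docs)
  graph.merge(build_graph(changed_docs))
end

# Documents as they exist at two git references.
ref1_docs = { "a.rb" => "class Foo; end" }
ref2_docs = { "a.rb" => "class Foo; end", "b.rb" => "class Bar < Foo; end" }

# Check out ref1 and build the initial graph...
graph = build_graph(ref1_docs)
# ...switch to ref2 and update incrementally...
incremental = incremental_update(graph, ref2_docs)
# ...then build fresh from ref2 and compare the two graphs.
fresh = build_graph(ref2_docs)

incremental == fresh  # => true when incremental updates are correct
```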
This one looks fine, but on a bigger and more complex project there are still many differences in the graph.
Remaining challenges
There are many edge cases (and not-so-edge cases) that are not yet covered. For an example of a not-so-edge case: this prototype doesn't resolve previously unresolved names that should now be resolvable.
For example, a file containing only the reference `Foo`: this reference should be unresolved. If we add a second file that defines `Foo`, the reference should become resolved. It doesn't at the moment, because we don't have a mapping of unresolved names to references: we can only get to them via declarations (resolved) or by searching through the whole of `graph.constant_references`.

There are of course also many nuances to how singletons and aliases should be handled, and the prototype is, in some cases, unable to linearize complete ancestors that the fresh graph handles just fine. I wasn't yet able to track down the cause.
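One plausible direction for the first problem (a sketch of an idea, not something the prototype implements) is an index from unresolved names to reference IDs, so that adding a definition for `Foo` can look up exactly the references to revisit:

```ruby
require "set"

# Hypothetical index: unresolved name => IDs of references using it.
unresolved_refs = Hash.new { |h, k| h[k] = Set.new }

# Indexing the first file records the dangling reference to Foo
# (reference ID 1 is arbitrary here).
unresolved_refs["Foo"] << 1

# When a later update defines Foo, the affected references can be
# found directly instead of scanning graph.constant_references.
to_re_resolve = unresolved_refs.delete("Foo") || Set.new
to_re_resolve.to_a  # => [1]
```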
We also need to think about performance due to over-invalidation. This approach blows away everything that could possibly be affected and then builds it up again. Depending on what changes, that could be a sizable chunk of the graph. Invalidation is not free, and so there will be a point at which it's quicker to just short-cut to a full rebuild.
Another challenge will be how to index the necessary data such that we can efficiently traverse the graph for things—without massively inflating the size of the data.
I think there are some small flow/architectural things to consider here too. For one, it seems inefficient to perform invalidation on a per-document basis as we are currently doing—especially since many files will have overlapping names. Of course, once the names have been unresolved and declarations removed there will be less work to do each time, but still I think it would be preferable to collect the full set of names from all documents and process them in a single queue.
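Collecting seed names across all changed documents before invalidating could look something like this (a sketch of the suggested flow, with made-up data shapes):

```ruby
require "set"

# Instead of invalidating per document, gather seed names from every
# changed document first, then run a single invalidation pass over the
# deduplicated set.
changed_docs = {
  "a.rb" => { old_names: ["Foo", "Bar"], new_names: ["Foo"] },
  "b.rb" => { old_names: ["Bar"],        new_names: ["Bar", "Baz"] },
}

seeds = Set.new
changed_docs.each_value do |doc|
  seeds.merge(doc[:old_names])
  seeds.merge(doc[:new_names])
end

# Overlapping names like Bar are only processed once.
seeds.to_a  # => ["Foo", "Bar", "Baz"]
```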
We could still implement a fast/slow path decision for incremental updates (like this) on top of this approach. If 90% of key strokes don't change the set/shape of namespaces in the document then we can take a shortcut to get very fast incremental updates.
Notes
Included in this PR are a handful of temporary commits that are unrelated to the actual prototype: