[WIP] analyze: refactor mir_op to explicitly track per-subloc info #1191
This is a WIP refactor of
`mir_op`. I don't have time to finish it at the moment, but I'm posting this PR and including some notes here so it doesn't get lost. Currently it works on trivial examples like `offset1`, but fails on more interesting ones like `algo_md5`. It mostly seems to be failing while trying to produce nonsensical casts, but there are also a lot of unimplemented `Callee` cases in the new `mir_op` that will surely cause other problems later on.

This branch refactors the MIR rewrite generation pass (
`rewrite::expr::mir_op`) to separate `LTy`/`TypeDesc` handling from the actual generation of casts and other rewrites. It divides `mir_op` into three separate passes: the first collects type and other metadata for each MIR node, the second determines which casts are needed to produce a well-typed program after type rewriting, and the third inserts the casts and any other necessary rewrites.

These passes work on a representation called
`SublocInfo`. A "subloc" or "node" is a piece of MIR at finer granularity than a `Location`. For example, given the statement `_2 = Use(move _1)`, a `SubLoc` path can refer to the whole statement, the destination place `_2`, the rvalue `Use(move _1)`, the operand `move _1`, or the place `_1`. Each of these can have its own `SublocInfo` that describes its type and other information about the surrounding context or how it can be used.

The three new passes in more detail:

- `SublocInfo` collection: This pass computes the "new type" of each node, which is the type it would have after the types of all defs and locals are rewritten to match their `LTy`s. This can produce inconsistent results, such as giving the LHS and RHS of an assignment different types. This pass also records other metadata, such as the access mode (imm or mut) for `Place`s.
- `SublocInfo` typechecking: This pass checks for inconsistencies and computes the "expected type" of each node, which is the type it should have in order to make it usable in the surrounding context. By default, the node's expected type is identical to its new type, but it may be changed to resolve a type error. For example, in an inconsistent assignment (where the LHS and RHS have different new types), the expected type of the RHS will be set to match the new (and expected) type of the LHS. There are also some cases, mostly around special functions like `offset`, where this pass will adjust a node's new type instead of its expected type.
- […] `offset`. This is similar to the behavior of the existing `mir_op` pass, but it's driven entirely by `SublocInfo` entries, rather than directly consulting `LTy`s.

Advantages of the new design:

- […] `SublocInfo`s to determine whether it's an issue with the rewrite itself or with `SublocInfo` generation.
- […] `SublocInfo` collection phase interacts directly with analysis results (`LTy`s). This means we could implement an alternate version of that pass with a different strategy for determining new types, while reusing all the rest of the rewriting machinery.

Limitations:

- […] `Callee` they encounter.
- […] `SublocInfo` passes to only request casts that the rewriting pass can handle.
- […] `offset` to a subslice operation) and which should be left alone. Currently this is handled in a roundabout way: some of the inputs and/or outputs of the function are marked `FIXED` in the analysis, so their new types are left as raw pointers, and the rewriting pass knows to skip the normal rewrite if it sees raw pointers there.
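
To make the subloc idea concrete, here is a rough, self-contained sketch that enumerates the sublocs of `_2 = Use(move _1)`. It is a toy model, not code from this branch: the `Assign`, `Rvalue`, and `Operand` types are simplified stand-ins for the real MIR types, and the `SubLoc` variant names are illustrative assumptions that may not match the actual enum.

```rust
// Hypothetical path components; variant names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SubLoc {
    Dest,                 // destination place of an assignment
    Rvalue,               // rvalue of an assignment
    RvalueOperand(usize), // i-th operand of an rvalue
    OperandPlace,         // place referenced by an operand
}

// Toy stand-ins for MIR; the real rustc types are much richer.
enum Operand {
    Move(u32), // `move _N`
}

enum Rvalue {
    Use(Operand),
}

struct Assign {
    dest: u32, // local index of the destination place
    rvalue: Rvalue,
}

fn show_operand(op: &Operand) -> String {
    match op {
        Operand::Move(local) => format!("move _{}", local),
    }
}

fn show_rvalue(rv: &Rvalue) -> String {
    match rv {
        Rvalue::Use(op) => format!("Use({})", show_operand(op)),
    }
}

// List every subloc of the statement as (path from the statement, MIR text).
fn sublocs(stmt: &Assign) -> Vec<(Vec<SubLoc>, String)> {
    let Rvalue::Use(op) = &stmt.rvalue;
    let Operand::Move(local) = op;
    vec![
        // The empty path refers to the whole statement.
        (vec![], format!("_{} = {}", stmt.dest, show_rvalue(&stmt.rvalue))),
        (vec![SubLoc::Dest], format!("_{}", stmt.dest)),
        (vec![SubLoc::Rvalue], show_rvalue(&stmt.rvalue)),
        (
            vec![SubLoc::Rvalue, SubLoc::RvalueOperand(0)],
            show_operand(op),
        ),
        (
            vec![SubLoc::Rvalue, SubLoc::RvalueOperand(0), SubLoc::OperandPlace],
            format!("_{}", local),
        ),
    ]
}
```

Calling `sublocs` on `Assign { dest: 2, rvalue: Rvalue::Use(Operand::Move(1)) }` yields five entries, one for each of the nodes listed in the example above, and each entry is a node that could carry its own `SublocInfo`.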
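
The split between "new type" and "expected type" can be sketched in the same spirit. The snippet below is a minimal illustration under assumptions, not the implementation in this branch: `Ty` is a two-variant stand-in for rewritten pointer types, and the field and function names (`new_ty`, `expect_ty`, `collect`, `typecheck_assign`, `cast_needed`) are hypothetical.

```rust
// Toy stand-in for rewritten pointer types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    RawPtr,
    Ref,
}

// Hypothetical, simplified per-node record.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SublocInfo {
    new_ty: Ty,    // type after defs and locals are rewritten
    expect_ty: Ty, // type the surrounding context requires
}

// Pass 1 (collection): record the new type; the expected type
// defaults to the same value.
fn collect(new_ty: Ty) -> SublocInfo {
    SublocInfo { new_ty, expect_ty: new_ty }
}

// Pass 2 (typechecking) for an assignment `lhs = rhs`: the RHS is
// expected to have the new type of the LHS.
fn typecheck_assign(lhs: &SublocInfo, rhs: &mut SublocInfo) {
    rhs.expect_ty = lhs.new_ty;
}

// Pass 3 (rewriting): a cast is needed only where the new and
// expected types disagree. Returns (from, to) for the cast.
fn cast_needed(info: &SublocInfo) -> Option<(Ty, Ty)> {
    if info.new_ty != info.expect_ty {
        Some((info.new_ty, info.expect_ty))
    } else {
        None
    }
}
```

In this model only `collect` would ever look at analysis results; `typecheck_assign` and `cast_needed` read nothing but `SublocInfo` values, which is the separation the design is aiming for.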