@@ -2043,6 +2043,177 @@ parts::
20432043 return %1 : $Klass
20442044 }
20452045
2046+ Borrowed Object based Safe Interior Pointers
2047+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2048+
2049+ What is an "Unsafe Interior Pointer"
2050+ ````````````````````````````````````
2051+
2052+ An unsafe interior pointer is a bare pointer into the innards of an object. A
2053+ simple example of this in C++ would be using the method std::vector::data() to
2054+ get to the innards of a std::vector. In general, interior pointers are unsafe to
2055+ use since languages do not provide any guarantees that the interior pointer will
2056+ not be used after the underlying object has been deallocated. To see this,
2057+ consider the following C++ example::
2058+
2059+     int unfortunateFunction() {
2060+       int *unsafeInteriorPointer = nullptr;
2061+       {
2062+         std::vector<int> vector;
2063+         vector.push_back(5);
2064+         unsafeInteriorPointer = vector.data();
2065+         printf("%d\n", *unsafeInteriorPointer); // Prints "5".
2066+       } // vector deallocated here
2067+       return *unsafeInteriorPointer; // Kaboom
2068+     }
2069+
2070+ In words, C++ allows us to get the interior pointer into the vector, but
2071+ then lets us do whatever we want with the pointer, including using it after the
2072+ underlying memory has been invalidated.
2073+
2074+ From a user's perspective, interior pointers are really useful since one can use
2075+ them to pass data to other APIs that only expect a pointer and since
2076+ one can sometimes use them to get better performance. But from a language designer's
2077+ perspective, this sort of API is verboten and leads to bugs, crashes, and security
2078+ vulnerabilities. That being said, users clearly have a need for such
2079+ functionality, so we, as language designers, should figure out ways to
2080+ express these sorts of patterns in our various languages in a safe way that
2081+ prevents users from foot-gunning themselves. In SIL, we have solved this
2082+ problem via the direct modeling of interior pointer instructions as a high level
2083+ concept in our IR.
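
To make the idea of a "safe way" concrete on the C++ side, here is a minimal sketch of a scoped-access pattern. The helper ``withInteriorPointer`` and the function ``sumElements`` are hypothetical names invented for this illustration; they are not part of the C++ standard library or of SIL. The helper only lends out the pointer while the vector is known to be alive. Unlike SIL's verifier, C++ does not stop the callback from letting the pointer escape, so this is a convention rather than an enforced guarantee:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for this sketch): lends out the interior
// pointer of `vec` only for the duration of the call to `body`.
template <typename T, typename Body>
auto withInteriorPointer(const std::vector<T> &vec, Body body) {
  // `vec` is borrowed by reference, so it stays alive until `body` returns.
  // The pointer handed to `body` is only meant to be used inside `body`.
  return body(vec.data(), vec.size());
}

// Sums the elements through the interior pointer while the vector is alive.
int sumElements(const std::vector<int> &vec) {
  return withInteriorPointer(vec, [](const int *ptr, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
      sum += ptr[i];
    return sum;
  });
}
```

The borrow of ``vec`` plays roughly the role that a guaranteed operand plays in SIL: every use of the pointer happens inside a scope in which the base object cannot be destroyed.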
2084+
2085+ Safe Interior Pointers in SIL
2086+ `````````````````````````````
2087+
2088+ In contrast to LLVM-IR, SIL provides mechanisms that language designers can use
2089+ to express concepts like the above in a manner that allows the language to
2090+ define away compiler generated unsafe interior pointer usage using "Safe
2091+ Interior Pointers". This is implemented in SIL by:
2092+
2093+ 1. Classifying a set of instructions as being "interior pointer" instructions.
2094+ 2. Enforcing in the SILVerifier that all "interior pointer" instructions can
2095+    only have operands with `Guaranteed`_ ownership.
2096+ 3. Enforcing in the SILVerifier that any transitive address use of the interior
2097+    pointer is a liveness requirement of the "interior pointer"'s
2098+    operand.
2099+
2100+ Note that the transitive address use verifier from (3) does not attempt to
2101+ classify uses directly. Instead the verifier:
2102+
2103+ 1. Has an explicit list of instructions that it understands as requiring
2104+    liveness of the base object.
2105+
2106+ 2. Has a second list of instructions that require liveness and produce an address
2107+    whose transitive uses need to be recursively processed.
2108+
2109+ 3. Asserts on any instructions that are not known to the verifier. This ensures
2110+    that the verifier is kept up to date with new instructions.
2111+
2112+ Note that typically instructions in category (1) are instructions whose uses do
2113+ not propagate the pointer value, so they are safe. In contrast, some other
2114+ instructions in category (1) are escaping uses of the address such as
2115+ `pointer_to_address`_. Those uses are unsafe: the user is responsible for
2116+ managing unsafe pointer lifetimes and the compiler must not extend those pointer
2117+ lifetimes.
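
The three-way classification described above can be sketched as a small worklist algorithm. This is a toy model under stated assumptions: ``UseKind``, ``Inst``, and ``gatherTransitiveUses`` are names invented for this illustration and do not mirror the real SILVerifier data structures, and the sketch assumes that the use graph has no cycles:

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: these kinds are illustrative assumptions.
enum class UseKind {
  Leaf,       // requires liveness of the base, produces no address to follow
  Projection, // requires liveness and produces an address to follow
  Unknown     // not on either list
};

struct Inst {
  std::string name;
  UseKind kind;
  std::vector<const Inst *> users; // instructions using this one's result
};

// Collects every transitive address use reachable from `interiorPointer`.
// Throws on an instruction that is on neither list, which models the
// verifier asserting on instructions it does not know about.
std::vector<const Inst *> gatherTransitiveUses(const Inst &interiorPointer) {
  std::vector<const Inst *> liveUses;
  std::vector<const Inst *> worklist(interiorPointer.users.begin(),
                                     interiorPointer.users.end());
  while (!worklist.empty()) {
    const Inst *use = worklist.back();
    worklist.pop_back();
    switch (use->kind) {
    case UseKind::Leaf:
      liveUses.push_back(use);
      break;
    case UseKind::Projection:
      // The projection itself is a use, and so are the uses of its result.
      liveUses.push_back(use);
      worklist.insert(worklist.end(), use->users.begin(), use->users.end());
      break;
    case UseKind::Unknown:
      throw std::logic_error("unhandled address use: " + use->name);
    }
  }
  return liveUses;
}
```

Failing loudly on the ``Unknown`` case is the design choice that keeps such a classifier honest: a newly added instruction cannot silently bypass the liveness check.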
2118+
2119+ These rules ensure statically that any uses of the address that are not escaped
2120+ explicitly by an instruction like `pointer_to_address`_ are within the
2121+ guaranteed pointer's scope where the guaranteed value is statically known to be
2122+ live. As a result, in SIL it is impossible to express such a bug in compiler
2123+ generated code. As an example, consider the following unsafe interior pointer
2124+ SIL::
2125+
2126+     class Klass { var k: KlassWrapper }
2127+     struct KlassWrapper { var k: Klass }
2128+
2129+     // ...
2130+
2131+     // Today SIL restricts interior pointer instructions to only have operands
2132+     // with guaranteed ownership.
2133+     %1 = begin_borrow %0 : $Klass
2134+
2135+     // %2 is an interior pointer into %1. Since %2 is an address, its uses are
2136+     // not treated as uses of the underlying borrowed object %1 in the ownership
2137+     // system. This is because at the ownership level objects with None
2138+     // ownership are not verified and do not have any constraints on how they
2139+     // are used from the ownership system.
2140+     //
2141+     // Instead the ownership verifier gathers up all such uses and treats them
2142+     // transitively as uses of the object from which the interior pointer was
2143+     // projected. This means that this is a constraint on the guaranteed
2144+     // object's uses, not on the trivial values.
2145+     %2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper
2146+     %3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k // %3 is a $*Klass
2147+
2148+     // Suppose we end the borrow %1 at this point, invalidating the addresses
2149+     // ``%2`` and ``%3``.
2150+     end_borrow %1 : $Klass
2151+
2152+     // We would here be loading from an invalidated address. This would cause a
2153+     // verifier error since %3's use here is a regular use that is treated as a
2154+     // use of %1.
2155+     %4 = load [copy] %3 : $*Klass
2156+
2157+     // ...
2158+
2159+ Notice how, due to a possible bug in the compiler, the load ``%4`` reads from
2160+ the potentially invalidated address ``%3``. This would cause a verifier error
2161+ stating that ``%4`` is an interior pointer based use-after-free of ``%1``,
2162+ implying that this is malformed SIL.
2163+
2164+ NOTE: This is a constraint on the base object, not on the addresses themselves,
2165+ which are viewed as outside of the ownership system since they have `None`_
2166+ ownership.
2167+
2168+ In contrast to the previous example, the following example follows ownership
2169+ invariants and is valid SIL::
2170+
2171+     class Klass { var k: KlassWrapper }
2172+     struct KlassWrapper { var k: Klass }
2173+
2174+     // ...
2175+
2176+     %1 = begin_borrow %0 : $Klass
2177+     // %2 is an interior pointer into the Klass k. Since %2 is an address and
2178+     // addresses have None ownership, its uses are not treated as uses of the
2179+     // underlying object %1.
2180+     %2 = ref_element_addr %1 : $Klass, #Klass.k // %2 is a $*KlassWrapper
2181+
2182+     // Destroying %1 at this location would result in a verifier error since
2183+     // %2's uses are considered to be uses of %1.
2184+     //
2185+     // end_lifetime %1 : $Klass
2186+
2187+     // We are statically not loading from an invalidated address here since we
2188+     // are within the lifetime of ``%1``.
2189+     %3 = struct_element_addr %2 : $*KlassWrapper, #KlassWrapper.k
2190+     %4 = load [copy] %3 : $*Klass // %1 must be live here transitively
2191+
2192+     // ``%1``'s lifetime ends. Importantly we know that within the lifetime of
2193+     // ``%1``, ``%0``'s lifetime cannot shrink past this point, implying
2194+     // transitive static safety.
2195+     end_borrow %1 : $Klass
2196+
2197+ In the second example, we show a well-formed SIL program showing off SIL's Safe
2198+ Interior Pointers. All of the uses of ``%2``, the interior pointer, are
2199+ transitively uses of the base underlying object, ``%0``.
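
The difference between the two SIL examples can be sketched as a position check within a single basic block. This is a deliberately simplified model: ``BorrowScope`` and ``usesAreWithinScope`` are hypothetical names invented here, and the sketch handles only straight-line code, not a control flow graph:

```cpp
#include <cstddef>
#include <vector>

// Toy model: a borrow scope described by instruction positions inside one
// basic block. These names are assumptions made for this sketch.
struct BorrowScope {
  std::size_t beginBorrow; // position of begin_borrow
  std::size_t endBorrow;   // position of end_borrow
};

// Returns true if every transitive address use of the interior pointer lies
// strictly between begin_borrow and end_borrow.
bool usesAreWithinScope(const BorrowScope &scope,
                        const std::vector<std::size_t> &usePositions) {
  for (std::size_t pos : usePositions) {
    if (pos <= scope.beginBorrow || pos >= scope.endBorrow)
      return false; // use-after-free (or use-before-borrow) of the base
  }
  return true;
}
```

Numbering the instructions of each example from 0 at ``begin_borrow``, the first example has its ``end_borrow`` at position 3 and its ``load`` at position 4, so the check fails. The second example has its ``load`` at position 3 and its ``end_borrow`` at position 4, so the check succeeds.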
2200+
2201+ The current list of interior pointer SIL instructions is:
2202+
2203+ * `project_box`_ - projects a pointer out of a reference counted box. (*)
2204+ * `ref_element_addr`_ - projects a field out of a reference counted class.
2205+ * `ref_tail_addr`_ - projects out a pointer to a class’s tail allocated array
2206+   memory (assuming the class was initialized to have such an array).
2207+ * `open_existential_box`_ - projects the address of the value out of a boxed
2208+   existential container using the current function context/protocol conformance
2209+   to create an "opened archetype".
2210+ * `project_existential_box`_ - projects a pointer to the value inside a boxed
2211+   existential container. Must be the type for which the box was initially
2212+   allocated and not an "opened" archetype.
2213+
2214+ (*) We still need to finish adding support for project_box, but all other
2215+ interior pointers are guarded already.
2216+
20462217Runtime Failure
20472218---------------
20482219