VNT: Handling arrays generally

I think one possible general solution to this is going to be something that's similar to what ForwardDiff does (at least for `AbstractArray`). Technically, it's even closer to [a sparsity tracer](https://adrianhill.de/SparseConnectivityTracer.jl/stable/internals/how_it_works/).

When `x` is initialised to an `OffsetArray{T}`, we want the shadow of `x` (let's call it `dx`) to be initialised to the same `OffsetArray{Dual{T}}`. Each of these 'duals' needs to carry the value, along with the boolean mask indicating whether it was set or not. This is of course exactly the same as how `PartialArray` has one array of values and one array of masks.

```julia
struct Dual{T}
   val::T
   has_been_set::Bool
end
Dual(val) = Dual(val, false)
```

Then when `x[idxs...]` is set to whatever `rand(Normal())` returns, we set `dx[idxs...] = Dual(x[idxs...], true)`. Of course, the block elements will need the same workaround as before (complete with all the `typejoin` shenanigans).

```julia
struct BlockDual{T}
    val::T
    original_indices::whatever
end
```

Then, reading from the shadow memory becomes way more consistent, because indexing into `dx` carries the same semantics as indexing into `x`.

For another example, let's say that `x` and `dx` are regular Base.Matrix. Then when you index into `dx[1, 1]`, it will be the same as indexing into `dx[1]`. You could even make `dx[1] = ...` error when you try to modify a value that was already set. For `BlockDual`s, you could further 'resolve' or 'normalise' the indices into a standard, 1-based scheme because you now know the type of the container (I think `Base.to_indices` does this, but not sure; in any case this should be easy to figure out). That allows you to use different indexing schemes to read the data, as long as they refer to the same elements.

A scheme like this would also enable 'native support' for not only `OffsetArray`, but also things like `DimensionalData.DimArray` which probably has [way more legitimate uses than `OffsetArray`](https://github.com/sethaxen/DimensionalDistributions.jl).

This looks like a lot of work for VNT to do, but we are already duplicating the entire array in the VNT*, so in principle I don't think performance should be any worse. Indexing into this would still be type stable (in the absence of blocks, but that's the same as currently) and you would still be able to do all the checks you need on whether the value was set or not.

In particular, I think that this should be fairly easy to implement for all `AbstractArray`s. (In fact, we know for a fact that it *must* be possible, because other packages do this successfully.) Extending this to dictionaries, or generally any type that supports `getindex`, would be annoying (for every new type you implement, you have to define the equivalent of a 'tangent type'†); it's doable, but it could be left for a later stage. I think the `AbstractArray` support would already solve 99% of pain points with this, **and** the path forwards to supporting other types would be clear, **and** when some type isn't supported, you can error with a very clear error message saying that "indexing into this type on LHS of tilde is not supported", whereas now our battle lines are quite arbitrarily drawn.

----

\* OK, there is one case where this will do a lot more work than VNT currently: if the user has a very large array, like a length-100 vector, but only `x[1] ~ ...` and never the other elements. I don't really care about optimising performance for such cases. It is impossible to detect these without static analysis (and note that AD has a similar characteristic: if you differentiate `f(x) = x[1]` and pass in a length-100 vector, the gradient calculation will be equally unoptimised).

† Notice that the way VNT normalises all property access into NamedTuples is **exactly** what Mooncake's `tangent_type` does for structs.

_Originally posted by @penelopeysm in https://github.com/TuringLang/DynamicPPL.jl/pull/1181#discussion_r2637268388_
            

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

VNT: Handling arrays generally #1194

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

VNT: Handling arrays generally #1194

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions