Skip to content

Remove narr[mask] += 1 syntax #34

@shinzlet

Description

@shinzlet

Quoting from the comments for MultiIndexable#[](bool_mask : MultiIndexable(Bool)) : self

This method allows operations of the form narr[mask] += 5 - it's listed here to minimize confusion, but you should never use it.

In Crystal, expressions of the form #{op}= are not actually overridable
functions - instead, the complier expands them as such:

a += b # => expands to a = a + b

In Phase, this is generally fine, as all meaningful operators have been implemented:

narr += 1 # => narr = narr + 1

However, when boolean masks are involved, this syntax is problematic.
Consider the following example, which does not involve much complication:

narr = NArray[3, 4, 5]
mask = NArray[false, true, true]
narr[mask] = 1
puts narr # => NArray[3, 1, 1] (every location where mask was true got assigned, but other locations were not changed)

And now the final example, which shows the issue:

narr = NArray[3, 4, 5]
mask = NArray[false, true, true]
narr[mask] += 1

It is perfectly clear what narr ought to be, now - the spots where the mask
is true should get one added to them. But look at what the compiler expands that
last line to:

narr[mask] += 1 # => narr[mask] = narr[mask] + 1

So, we have a problem - until now, NArray only needed to implement the setter
MultiWritable#[]=(bool_mask : MultiIndexable(Bool), value) - but now,
it must also implement the getter #[](bool_mask : MultiIndexable(Bool)).

Because #[](bool_mask) does not have a particularly meaningful definition,
we opted to simply make it return self, which allows our problematic
example to work:

narr[mask] += 1
  # Crystal expands this to:
  # narr[mask] = narr[mask] + 1
  # MultiIndexable#[](bool_mask) just returns self:
  # narr[mask] = self + 1

As you can see, this fixes the problem. However, it is not very efficient,
because it creates a full copy of self and adds one to every element.

To recap - this method is only used internally, and you should never need
to use it explicitly.

At the time, we implemented this shim because we wanted to support the narr[mask] += 1 syntax, in the way that numpy does. However, the purpose of Phase is to try and make a more principled, intuitive model of scientific computing. I think that the syntax described is counterintuitive - we're literally trying to hack it in - and needlessly terse. Additionally, finding a performant solution to this problem (right now we do a full clone and map) would be very hacky - we'd have to implement some sort of proxy class that has its own algebraic operators.

So, I think that we might want to just stick to simpler solutions - for example, a method like def map_where(mask : MultiIndexable(Bool), &block) could be used. We could also just have the user implement this:

mask.each_with_coord do |bit, coord|
  target[coord] += 1 if bit
end

The above would be effectively impossible for python, but Crystal is fast enough.

I think I'm more in favor of adding a method to do it, though - it is a relatively common optimization, and NArray would want to override it with a buffered implementation for significant perf gains.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions