I looked into what adding BCSR support to legate.sparse would entail. I'll hold off on putting in the actual work until we have requests from users who actually want to use the BCSR format and can say which functions they would like to see implemented. Some notes I have on doing this:
- The BCSR format is just CSR, where each CSR entry is a contiguous block of non-zeros, similar to a DSDD format in TACO.
- As a result, a similar approach to what we've done for CSR/CSC should work out. An annoying thing about this is that the way DISTAL would represent a DSDD tensor is different enough from how we would want to do this effectively in a SciPy implementation that we won't be able to take DISTAL kernels directly, but they should provide a reasonable skeleton for implementation.
- In general, the implementation strategy will need to do a few things for declaring partition alignment using legate.sparse: 1) input and output dense tensors will have to be reshaped (using store reshapes) based on the blocksize to have alignments against the blocked pos array; 2) images (on both pos and crd) need to have an affine transform applied to them that scales the values by different components of the blocksize shape. Since this can't be done right now using Legion, we'll have to approximate this by making temporary copies of the stores.
- Format conversions are unfortunately not implemented in pure Python, so we'll have to hand-code some of these format conversion routines.
- cuSPARSE has pretty well-rounded support for BCSR matrices that we can exploit. It's unclear what to do for operations that cuSPARSE doesn't support, since the size of the blocks significantly affects the strategy for GPU execution (and even for CPU-parallel execution).
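To make the layout and the reshaping in the notes above concrete, here is a minimal NumPy sketch of a BCSR matrix and a reference SpMV over it. It is only an illustration of the semantics, not legate.sparse code: the helper names `bcsr_to_dense` and `bcsr_spmv`, the sample data, and the SciPy-style `pos` offsets array (length `n_block_rows + 1`) are all assumptions, and legate.sparse may store `pos` differently.

```python
import numpy as np

# Hypothetical 4x6 matrix stored as BCSR with 2x3 blocks (R=2, C=3).
R, C = 2, 3
pos = np.array([0, 2, 3])   # block row i owns blocks pos[i]:pos[i+1] (assumed layout)
crd = np.array([0, 1, 1])   # block-column index of each stored block
vals = np.arange(1.0, 19.0).reshape(3, R, C)  # one dense RxC block per entry


def bcsr_to_dense(pos, crd, vals, n_block_cols):
    """Expand the blocked arrays into a dense matrix (for checking only)."""
    n_block_rows = len(pos) - 1
    _, R, C = vals.shape
    dense = np.zeros((n_block_rows * R, n_block_cols * C))
    for i in range(n_block_rows):
        for k in range(pos[i], pos[i + 1]):
            j = crd[k]
            dense[i * R:(i + 1) * R, j * C:(j + 1) * C] = vals[k]
    return dense


def bcsr_spmv(pos, crd, vals, x):
    """Reference SpMV: x and y are viewed as (num_blocks, blocksize) arrays."""
    n_block_rows = len(pos) - 1
    _, R, C = vals.shape
    xb = x.reshape(-1, C)             # row j holds x[j*C:(j+1)*C]
    yb = np.zeros((n_block_rows, R))  # row i holds y[i*R:(i+1)*R]
    for i in range(n_block_rows):
        for k in range(pos[i], pos[i + 1]):
            yb[i] += vals[k] @ xb[crd[k]]
    return yb.reshape(-1)


x = np.arange(6.0)
y = bcsr_spmv(pos, crd, vals, x)
assert np.allclose(y, bcsr_to_dense(pos, crd, vals, n_block_cols=2) @ x)
```

Viewing `y` as one row per block row is what lets it line up with `pos`, and viewing `x` as one row per block column is what lets `crd` index it directly.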
An example dot implementation on a BSR matrix might look something like:
def dot(self, x):
    y = cn.zeros(self.shape[0]).reshape(self.R, -1)
    x = x.reshape(self.C, -1)
    task = ctx.create_task(BCSR_SPMV)
    task.add_output(y)
    task.add_input(self.pos, self.crd, self.vals, x)
    promoted_pos = self.pos.promote(y.shape[1], 1)
    promoted_vals = self.vals.reshape(self.crd.shape[0], -1)
    task.add_alignment(promoted_pos, y, 0)
    task.add_image(promoted_pos, self.crd)
    task.add_image(promoted_pos, promoted_vals)
    task.add_image(self.crd, x)
    task.execute()
The tricky part here is handling the images onto these transformed stores, which the image operation can't currently do natively. One way of getting around this is to not transform the stores at all, but instead create temporary transformed versions of the regions that produce the correct image.
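As a rough illustration of what the scaled image has to compute, the NumPy sketch below expands block-column coordinates into the element indices of `x` that a piece of the matrix would touch. This is plain array code, not the Legion or legate API; `scaled_image`, the sample `pos`/`crd` arrays, and the SciPy-style offsets layout of `pos` are hypothetical.

```python
import numpy as np


def scaled_image(crd_slice, block_cols):
    """Element indices of x touched by the given block-column coordinates.

    A plain image would yield the coordinates themselves; in the blocked
    case each coordinate j expands to the range [j*C, (j+1)*C).
    """
    blocks = np.unique(crd_slice)
    return (blocks[:, None] * block_cols + np.arange(block_cols)).reshape(-1)


# Hypothetical data: 3 block rows, blocks of width C=3.
C = 3
pos = np.array([0, 2, 3, 5])
crd = np.array([0, 2, 1, 0, 3])

# Suppose one processor owns block rows [1, 3).
lo, hi = 1, 3
owned = crd[pos[lo]:pos[hi]]        # image of pos onto crd
needed_x = scaled_image(owned, C)   # image of crd onto x, scaled by C
```

A temporary copy of `crd` holding such scaled coordinates is one way the workaround described above could feed an unmodified image computation, at the cost of the extra store.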