Normalization Kernels Conflate tile_size and Batch Size #40

@andrej

The layer_norm, RMS norm and softmax designs use a single parameter, tile_size, for two different things: the number of elements to normalize over (the batch size) and the size of the buffers on the cores (the tile size).

We should rename tile_size to matrix_columns, batch_size or something similar to avoid confusion.

Other kernels use tile_size as a parameter that affects only the data movement, not the output. For those other kernels, you can tune tile_size to maximize L1 memory usage while still performing the same calculation. For the normalization kernels, on the other hand, changing tile_size changes the output.
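To make the difference concrete, here is a minimal NumPy sketch, not the actual kernel code. The function names `tiled_scale` and `tiled_softmax` are hypothetical, and the sketch assumes the normalization kernels normalize each tile independently, as described above.

```python
import numpy as np

def tiled_scale(x, tile_size, factor=2.0):
    """Hypothetical elementwise kernel: tile_size only controls chunking."""
    out = np.empty_like(x)
    for start in range(0, x.size, tile_size):
        out[start:start + tile_size] = x[start:start + tile_size] * factor
    return out

def tiled_softmax(x, tile_size):
    """Hypothetical normalization kernel: each tile is normalized on its own,
    so tile_size also decides how many elements are normalized together."""
    out = np.empty_like(x)
    for start in range(0, x.size, tile_size):
        tile = x[start:start + tile_size]
        e = np.exp(tile - tile.max())  # subtract the max for numerical stability
        out[start:start + tile_size] = e / e.sum()
    return out

x = np.arange(8, dtype=np.float64)

# Elementwise kernel: same result for any tile_size.
scale_same = np.array_equal(tiled_scale(x, 2), tiled_scale(x, 4))

# Normalization kernel: the result depends on tile_size.
softmax_same = np.allclose(tiled_softmax(x, 2), tiled_softmax(x, 4))

print(scale_same, softmax_same)  # prints: True False
```

With tile_size = 2 the softmax is taken over pairs of elements, and with tile_size = 4 over groups of four, so the two outputs differ even though the input is identical.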

Additionally, we might want to add support for batch_size < tile_size, as this should be relatively simple: each kernel call would process N batches and maintain N means, variances, and so on. Supporting batch_size > tile_size might be harder, as it would require passing the means, variances, and so on from one kernel call to the next, so we could just raise an error in that case for now.
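A rough NumPy sketch of what the batch_size < tile_size case could look like for layer norm, under some assumptions that are not in the current code: the function name `layer_norm_call` is hypothetical, tile_size is assumed to be a multiple of batch_size, and the scale and shift parameters are left out for brevity.

```python
import numpy as np

def layer_norm_call(tile, batch_size, eps=1e-5):
    """Hypothetical single kernel call over one tile.

    Processes N = tile_size // batch_size batches and keeps one mean and
    one variance per batch.
    """
    tile_size = tile.size
    if batch_size > tile_size:
        # Would require carrying means/variances across kernel calls.
        raise NotImplementedError("batch_size > tile_size is not supported")
    if tile_size % batch_size != 0:
        # Assumption of this sketch: a tile holds a whole number of batches.
        raise ValueError("tile_size must be a multiple of batch_size")

    rows = tile.reshape(tile_size // batch_size, batch_size)
    mean = rows.mean(axis=1, keepdims=True)  # N means
    var = rows.var(axis=1, keepdims=True)    # N variances
    return ((rows - mean) / np.sqrt(var + eps)).reshape(tile_size)

tile = np.arange(8, dtype=np.float64)
out = layer_norm_call(tile, batch_size=4)  # 2 batches in one tile of 8
```

Here tile_size (8) only determines how much data one call handles, while batch_size (4) determines the normalization, so tile_size could again be tuned freely as long as it stays a multiple of batch_size.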
