[core][GPU][StaticMatrix] Introduce GPU backend for NuMojo! #276

shivasankarka · 2025-10-26T10:47:13Z

This PR introduces initial GPU support for NuMojo (#273).

It adds unified device and storage abstractions, a basic matrix representation for GPU computations, and several core GPU kernels (elementwise add/sub/mul, matmul, fill, and a block-level reduction). This work lays the foundation for using Mojo GPU features to accelerate array operations.

The design is inspired by PyTorch's Tensor while keeping NumPy-like API choices where possible.

Notes

StaticMatrix is still a very basic structure, with only a few getter and setter functions, intended as a proof of concept for a GPU backend in NuMojo. We will expand it in the future to cover all features of the Matrix type.
It is named StaticMatrix because a compile-time shape and strides would help optimize many of the loops and GPU kernels. The goal is a Matrix type that takes advantage of Mojo's compile-time capabilities as much as possible. We will modify the API to support compile-time optimisations in future updates.
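
To illustrate why a fixed shape helps, here is a minimal sketch in plain Python (not Mojo, and not NuMojo's API). It assumes a C-ordered layout; the names `row_major_strides` and `flat_offset` are hypothetical. When the shape is known up front, the strides become constants, so the per-element index arithmetic reduces to multiplications by known values. Python has no compile-time evaluation, so module-level constants stand in for what a Mojo `alias` would provide.

```python
def row_major_strides(shape):
    """Strides (in elements) for a C-ordered array of the given shape."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis back to the first.
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


def flat_offset(index, strides):
    """Flat buffer offset of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


# Computed once; a static matrix type could fix these at compile time.
SHAPE = (1024, 1024)
STRIDES = row_major_strides(SHAPE)

print(STRIDES)                       # (1024, 1)
print(flat_offset((2, 3), STRIDES))  # 2051
```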

What’s Included

Device & context abstraction

numojo/core/gpu/device.mojo — device and context primitives to target GPU

Unified storage

numojo/core/gpu/storage.mojo — unified CPU/GPU memory management for buffers

Matrix primitives

numojo/core/staticmatrix.mojo — adds a StaticMatrix struct to prototype GPU usage before extending to N-D arrays

GPU kernels

numojo/core/gpu/matrix_kernels.mojo — implements:
- Vectorized elementwise kernels: add, mul, fill (and sub)
- Tiled matmul helpers
- matrix_reduce_sum_kernel (per-block reduction)
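
As a rough picture of what a per-block reduction does, the following plain-Python sketch simulates it on the CPU. It is not the code in matrix_kernels.mojo: the function name `block_reduce_sum`, the tree-reduction scheme, and the power-of-two block size are all assumptions made for illustration. Each block sums its own slice and emits one partial result; the partials still need a final pass (on the host or in a second kernel) to produce the total.

```python
def block_reduce_sum(values, block_size):
    """Return one partial sum per block, mimicking a per-block GPU reduction."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")

    partials = []
    for start in range(0, len(values), block_size):
        # Stand-in for the block's shared memory, zero-padded to block_size.
        shared = list(values[start:start + block_size])
        shared += [0.0] * (block_size - len(shared))

        stride = block_size // 2
        while stride > 0:
            # On a GPU these additions run in parallel, followed by a barrier.
            for tid in range(stride):
                shared[tid] += shared[tid + stride]
            stride //= 2

        partials.append(shared[0])
    return partials


partials = block_reduce_sum([1.0] * 10, block_size=4)
print(partials)       # [4.0, 4.0, 2.0]
print(sum(partials))  # 10.0
```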

Other

- Launch-parameter helpers and dtype-specialization hooks for future optimizations
- Updated for latest Mojo nightly (pixi updates)
- Small dtype fixes and deprecation error fixes
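
A launch-parameter helper typically boils down to a ceiling division: enough blocks to cover every element, with the last block possibly only partly used. The sketch below is plain Python with a hypothetical name (`launch_params`) and an assumed default block size of 256; the helpers in this PR may choose their values differently.

```python
def launch_params(num_elements, block_size=256):
    """Return (grid_size, block_size) covering num_elements with 1-D blocks."""
    if num_elements < 0 or block_size <= 0:
        raise ValueError("num_elements must be >= 0 and block_size > 0")
    # Ceiling division: round up so a trailing partial block is included.
    grid_size = (num_elements + block_size - 1) // block_size
    return grid_size, block_size


print(launch_params(1024 * 1024))  # (4096, 256)
print(launch_params(1000))         # (4, 256)
```

Because the last block can overshoot, a kernel launched this way needs a bounds check on its global index before touching memory.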

Example

# Import paths are assumed from the files listed above; adjust to the actual package layout.
from numojo.core.staticmatrix import StaticMatrix
from numojo.core.gpu.device import Device

fn main() raises:
    alias SIZE: Int = 1024
    alias cpu: Device = Device.CPU
    alias mps: Device = Device.MPS

    # Matmul with no device parameter given (CPU matrices).
    var arr_cpu_1 = StaticMatrix[DType.float32](shape=(SIZE, SIZE), order="C", fill_value=1.0)
    var arr_cpu_2 = StaticMatrix[DType.float32]((SIZE, SIZE), fill_value=2.0)
    var matmul_cpu = arr_cpu_1 @ arr_cpu_2
    print(matmul_cpu)

    # Matmul on matrices created directly on the GPU (Apple MPS device).
    var arr_gpu_1 = StaticMatrix[DType.float32, device=mps](shape=(SIZE, SIZE), order="C", fill_value=1.0)
    var arr_gpu_2 = StaticMatrix[DType.float32, device=mps](shape=(SIZE, SIZE), order="C", fill_value=2.0)
    var matmul_gpu = arr_gpu_1 @ arr_gpu_2
    print(matmul_gpu)

    # Matmul on CPU matrices transferred to the GPU with to[device]().
    var arr_gpu_fromcpu_1 = arr_cpu_1.to[mps]()
    var arr_gpu_fromcpu_2 = arr_cpu_2.to[mps]()
    var matmul_gpu_fromcpu = arr_gpu_fromcpu_1 @ arr_gpu_fromcpu_2
    print(matmul_gpu_fromcpu)
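
Assuming `@` is the standard matrix product, all three results above should agree: each entry is the sum of SIZE products 1.0 * 2.0, that is 2 * 1024 = 2048, which float32 represents exactly. The plain-Python sketch below checks the same identity at a smaller size using a tiled loop order. The function `tiled_matmul` and the tile size are hypothetical and only illustrate the general idea of tiling, not the helpers in this PR.

```python
def tiled_matmul(a, b, tile=4):
    """Multiply two lists-of-lists matrices, visiting the output tile by tile."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Accumulate the contribution of one (tile x tile) pair of blocks.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


SIZE = 8  # small stand-in for the 1024 used in the example
ones = [[1.0] * SIZE for _ in range(SIZE)]
twos = [[2.0] * SIZE for _ in range(SIZE)]

product = tiled_matmul(ones, twos)
print(product[0][0])  # 16.0
print(all(v == 2.0 * SIZE for row in product for v in row))  # True
```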

…and-Algorithms-group#275)

## Pull Request Overview (From Copilot)

This PR enhances ComplexNDArray functionality by adding comparison operators, trait methods, statistical/reduction methods, and array manipulation capabilities. It also introduces temporary Int conversions for strides/shape operations and implements SIMD load/store methods for vectorized calculations.

### Key Changes

- Added trait implementations (ImplicitlyCopyable, Movable) and conversion methods (`__bool__`, `__int__`, `__float__`) for ComplexNDArray
- Implemented magnitude-based comparison operators (`__lt__`, `__le__`, `__gt__`, `__ge__`) for complex arrays
- Added statistical methods (all, any, sum, prod, mean, max, min, argmax, argmin, cumsum, cumprod) and array manipulation methods (flatten, fill, row, col, clip, round, T, diagonal, trace, tolist, resize)
- Changed internal buffer types from `UnsafePointer[Int]` to `UnsafePointer[Scalar[DType.int]]` in NDArrayShape, NDArrayStrides, and Item structs
- Added SIMD load/store methods (load, store, unsafe_load, unsafe_store) for Item, Shape, and Strides

<details>
<summary>Show a summary per file</summary>

| File | Description |
| ---- | ----------- |
| numojo/routines/indexing.mojo | Added Int conversions for stride operations in compress function |
| numojo/routines/creation.mojo | Removed duplicate import statements |
| numojo/core/ndstrides.mojo | Changed buffer type to Scalar[DType.int], updated `__setitem__` validation, added SIMD load/store methods |
| numojo/core/ndshape.mojo | Changed buffer type to Scalar[DType.int], updated `__setitem__` validation, added SIMD load/store methods, modified size_of_array calculation |
| numojo/core/ndarray.mojo | Added Int conversions for stride/shape buffer accesses throughout |
| numojo/core/item.mojo | Changed buffer type to Scalar[DType.int], removed `Item.__init__(idx, shape)` constructor and offset() method, added SIMD load/store methods |
| numojo/core/complex/complex_simd.mojo | Added ImplicitlyCopyable and Movable traits to ComplexSIMD |
| numojo/core/complex/complex_ndarray.mojo | Added comparison operators, conversion methods, power operations, statistical methods, and array manipulation methods; added Int conversions for stride operations |

</details>

---------

Co-authored-by: ZHU Yuhao 朱宇浩 <dr.yuhao.zhu@outlook.com>
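
The overview above describes the complex comparison operators as magnitude-based. As a hedged illustration in plain Python (not the ComplexNDArray code, whose handling of equal magnitudes is not stated here), comparing by magnitude means comparing absolute values element by element. The helper name `magnitude_lt` is hypothetical.

```python
def magnitude_lt(xs, ys):
    """Elementwise |x| < |y| for two equal-length sequences of complex numbers."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [abs(x) < abs(y) for x, y in zip(xs, ys)]


xs = [3 + 4j, 1 + 1j, 0 - 2j]
ys = [0 + 5j, 2 + 0j, 1 + 1j]

# |3+4j| = 5 is not less than |5j| = 5, so the first entry is False.
print(magnitude_lt(xs, ys))  # [False, True, False]
```

Python's built-in complex type does not define `<` at all, which is why the comparison has to go through `abs` here.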

shivasankarka and others added 30 commits March 10, 2025 18:04

removed some typos

e9b6db3

fix typos in example

d589b50

Merge remote-tracking branch 'upstream/pre-0.7'

4f11dc3

[release] Merge pre-0.7 into main for the release of v0.7.0 (Mojo-Num…

2b72ef4

…erics-and-Algorithms-group#251) As title.

Merge remote-tracking branch 'upstream/main'

25c7796

updated to Mojo 25.4

3c3d5a3

update to Mojo 25.4

abd4fe8

update dependancies

6d46d59

fix format

533aa02

fix tests

a08b1a0

update to pixi

d52dbb7

fix github workflow

1e1f29a

fix github workflow

74d9a92

fix workflow

5fcf486

hopefully this fix works

fe07a23

fix ndarry formatting issue

183e4fb

fix formatting workflow

8b51f18

please work - formatter

827e855

fix format workflow

d92a7e4

Fix pre-commit issues

5bdca62

Update workflow

4c3ee57

Update workflow

bfc04a8

add load and save functions to io routines; update imports accordingly

e00ba89

Merge remote-tracking branch 'upstream/pre-0.8' into prev0.8

af116e9

added error types

a959745

updated file io methods

0a265ca

resolved name clashes.

03906cb

fix format

cbb8be9

fixed io errors

f8cf4d2

fix implicity conformance

b6099b7

shivasankarka and others added 23 commits October 21, 2025 23:29

add comparision methods to complex array

9c43eb6

Update complex_ndarray.mojo

10d3208

add statistical method to complex array

223d516

add more manipulation methods to complex array

ca88498

Update complex_ndarray.mojo

0cdf8dc

Update complex_ndarray.mojo

cf0b7e4

Merge remote-tracking branch 'upstream/pre-0.8' into unify_containers

3d0d705

fix merging errors

8a1b7b1

Fix docstring in creation module and move some funcs to compile time.

fcc5737

Merge remote-tracking branch 'upstream/pre-0.8' into prev0.8

e2755ee

update pixi to use workspace

8f6a0d1

Merge remote-tracking branch 'upstream/pre-0.8' into gpu_ndarray

ccee119

Merge branch 'prev0.8' into gpu_ndarray

678d998

update pixi to nightly to make barrier() work with apple gpu properly.

43dd1d5

update numojo to work with latest nightly by fixing memcpy and other

2cbdf76

DType.index errors.

temporarily fix the where function with `` as it was causing

482927f

formatting errors.

Create device.mojo

b7894d4

create storage to handle cpu and gpu memory in unified manner.

c681ec4

create staticmatrix with some basic method

517e829

implement basic add, sub, mul, matmul kernels

adbc971

update some comments and errors, add __sub__

7d2f0aa

fix formatting error due to where function.

1be8756

shivasankarka marked this pull request as draft October 26, 2025 11:11

shivasankarka added 2 commits October 26, 2025 20:29

fix errors in tests due to wrong conversion from matrix to numpy arrays

20dc4e6

Update utility.mojo

6967f9d

shivasankarka requested review from a team, MadAlex1997 and forfudan October 26, 2025 11:30

shivasankarka mentioned this pull request Oct 31, 2025

[PROPOSAL] Creation of NDArray, Matrix views #279

Open
