
Add as_remote(rank) pointer translation API #495

Draft
mawad-amd wants to merge 3 commits into main from muhaawad/as-remote

Conversation

Collaborator

@mawad-amd mawad-amd commented Apr 1, 2026

Summary

Just an idea I'm prototyping — not planning to merge this for now.

as_remote gives a clean way to get a pre-translated pointer to another rank's symmetric heap copy:

  • Host-side: ctx.as_remote(tensor, rank) → zero-copy torch.Tensor whose data_ptr() points to the target rank's copy. Same shape/dtype/strides.
  • Device-side (Gluon): ctx.as_remote(ptr, rank) → translated pointer for direct use with gl.load/gl.store.

Useful for hoisting translation out of loops or passing pre-translated pointers to kernels without manual heap base arithmetic.
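For intuition, the translation itself is just heap-base arithmetic: a pointer's offset within the local symmetric heap is applied to the target rank's heap base. A minimal pure-Python sketch of that idea (the heap-base values and the standalone as_remote helper here are made-up illustrations, not Iris internals):

```python
# Sketch of symmetric-heap pointer translation: a pointer into the local
# heap is rebased onto the target rank's heap by swapping base addresses.
# Heap bases are illustrative example values, not real Iris internals.
HEAP_BASES = {0: 0x1000_0000, 1: 0x2000_0000, 2: 0x3000_0000, 3: 0x4000_0000}

def as_remote(local_ptr: int, local_rank: int, target_rank: int) -> int:
    """Translate a local heap pointer to the same offset in target_rank's heap."""
    offset = local_ptr - HEAP_BASES[local_rank]  # byte offset within the heap
    if offset < 0:
        raise ValueError("pointer is not inside the local symmetric heap")
    return HEAP_BASES[target_rank] + offset      # same offset, remote base

# A pointer 0x40 bytes into rank 0's heap maps 0x40 bytes into rank 3's heap.
p = as_remote(HEAP_BASES[0] + 0x40, local_rank=0, target_rank=3)
print(hex(p))  # 0x40000040
```

Because the offset is rank-invariant, this computation can be done once and the result reused, which is what makes hoisting it out of loops attractive.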

Host-side example

ctx = iris.iris(heap_size=2**30)
buf = ctx.zeros(1024, dtype=torch.float32)

# Get a view pointing to rank 3's copy of buf
remote_buf = ctx.as_remote(buf, rank=3)

# remote_buf has same shape/dtype/strides, but data_ptr()
# points into rank 3's symmetric heap
print(remote_buf.shape)       # torch.Size([1024])
print(remote_buf.data_ptr())  # address in rank 3's heap

Device-side example (Gluon)

@gluon.jit
def read_from_peer(IrisDeviceCtx: gl.constexpr, context_tensor, buf, out, numel, peer: gl.constexpr, BLOCK: gl.constexpr):
    ctx = IrisDeviceCtx.initialize(context_tensor)
    offs = gl.program_id(0) * BLOCK + gl.arange(0, BLOCK)
    mask = offs < numel  # guard the last partial block (masking against BLOCK would zero out every block after the first)

    # Translate pointer once, then use gl.load/gl.store directly
    remote_ptr = ctx.as_remote(buf + offs, peer)
    data = gl.load(remote_ptr, mask=mask)
    gl.store(out + offs, data, mask=mask)

Test plan

  • Host-side: basic views, pointer arithmetic, self-rank identity, validation errors, non-contiguous tensors, multiple dtypes
  • Device-side: read via as_remote + gl.load, write via as_remote + gl.store
  • Run on real GPU hardware with torchrun --nproc_per_node=4
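One item from the test plan that can be pinned down without GPU hardware is the self-rank case: translating a pointer to the caller's own rank should be a no-op. A pure-Python sketch of that property check (the heap bases and the translate helper are illustrative stand-ins, not Iris code):

```python
# Self-rank translation should be the identity: rebasing a pointer onto
# the caller's own heap must return the pointer unchanged.
# Heap bases are illustrative values, not real Iris internals.
HEAP_BASES = [0x1000_0000, 0x2000_0000, 0x3000_0000, 0x4000_0000]

def translate(ptr: int, src_rank: int, dst_rank: int) -> int:
    """Rebase ptr from src_rank's symmetric heap onto dst_rank's heap."""
    return HEAP_BASES[dst_rank] + (ptr - HEAP_BASES[src_rank])

for rank in range(len(HEAP_BASES)):
    p = HEAP_BASES[rank] + 0x80           # a pointer 0x80 bytes into the heap
    assert translate(p, rank, rank) == p  # self-rank is a no-op
print("self-rank identity holds")
```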

🤖 Generated with Claude Code

mawad-amd and others added 3 commits April 1, 2026 16:18
Host-side: ctx.as_remote(tensor, rank) returns a zero-copy view pointing
to the target rank's symmetric heap copy (same shape/dtype/strides).
Device-side: ctx.as_remote(ptr, rank) translates a local pointer to the
target rank's address space for direct use with gl.load/gl.store.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added in-progress We are working on it iris Iris project issue labels Apr 1, 2026
@mawad-amd mawad-amd changed the title Add `as_remote(rank)` pointer translation API Add as_remote(rank) pointer translation API Apr 1, 2026