Conversation

@Flakebi
Contributor

@Flakebi Flakebi commented Jan 2, 2026

There is an ongoing discussion in #150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like core::gpu.

Add a rustc intrinsic amdgpu_dispatch_ptr to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the llvm.amdgcn.dispatch.ptr LLVM intrinsic, which returns a ptr addrspace(4), plus an addrspacecast to addrspace(0), so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore 'static.
The return type of the intrinsic (&'static ()) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:

#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
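
To illustrate how a caller might consume the type-erased pointer, here is a minimal sketch that runs on the host. The `DispatchPacket` struct and the helpers `workgroup_size` and `grid_size` are assumptions made for this example and are not part of the PR. The field order is modeled on `hsa_kernel_dispatch_packet_t` from the HSA runtime headers and should be checked against them before real use. On the device the pointer would come from `amdgpu_dispatch_ptr()`; here it is passed in as a parameter so the code compiles on any target.

```rust
/// Hand-written mirror of the HSA kernel dispatch packet (assumed layout,
/// modeled on `hsa_kernel_dispatch_packet_t`; not defined by the PR).
/// Pointer-sized fields are stored as `u64` so the layout is the same on any host.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DispatchPacket {
    pub header: u16,
    pub setup: u16,
    pub workgroup_size_x: u16,
    pub workgroup_size_y: u16,
    pub workgroup_size_z: u16,
    pub reserved0: u16,
    pub grid_size_x: u32,
    pub grid_size_y: u32,
    pub grid_size_z: u32,
    pub private_segment_size: u32,
    pub group_segment_size: u32,
    pub kernel_object: u64,
    pub kernarg_address: u64,
    pub reserved2: u64,
    pub completion_signal: u64,
}

/// Reads the workgroup size from a type-erased packet pointer.
///
/// # Safety
/// `ptr` must point to a live, properly aligned `DispatchPacket`.
pub unsafe fn workgroup_size(ptr: *const ()) -> [u32; 3] {
    // The intrinsic returns `*const ()`, so the caller picks the struct type.
    let packet = unsafe { &*ptr.cast::<DispatchPacket>() };
    [
        u32::from(packet.workgroup_size_x),
        u32::from(packet.workgroup_size_y),
        u32::from(packet.workgroup_size_z),
    ]
}

/// Reads the launch (grid) size from a type-erased packet pointer.
///
/// # Safety
/// Same requirements as `workgroup_size`.
pub unsafe fn grid_size(ptr: *const ()) -> [u32; 3] {
    let packet = unsafe { &*ptr.cast::<DispatchPacket>() };
    [packet.grid_size_x, packet.grid_size_y, packet.grid_size_z]
}
```

Keeping the struct definition on the library side is what lets the intrinsic signature stay free of any packet type.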

Tracking issue: #135024

r? RalfJung as you are already aware of the background (feel free to re-assign)

@rustbot
Collaborator

rustbot commented Jan 2, 2026

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 2, 2026
@RalfJung
Member

RalfJung commented Jan 2, 2026

I can help with the design review but not with the implementation, sorry.
@rustbot reroll

Regarding the design, if the return type is "erased" I would suggest using a raw pointer instead of a reference.

@workingjubilee
Member

yoink.

If anyone wants to chip in on the review, please feel free, I just want to make sure I have a gander before it ships so I can keep vaguely abreast of what's happening in this space.

#[rustc_intrinsic]
#[cfg(target_arch = "amdgpu")]
#[must_use = "returns a reference that does nothing unless used"]
pub fn amdgpu_dispatch_ptr() -> &'static ();
Member

@RalfJung RalfJung Jan 2, 2026

Maybe we should have a new "amdgpu" file or so for this, to keep it separate from the typically more portable intrinsics in the rest of this file?

Or a new "gpu" file that offload also goes into? I don't know what a sensible grouping here would look like.

Member

gpu.rs seems like a good place to start, to me, that way other gpu targets are encouraged to reuse code from amdgpu by generalizing it instead of repeating it (as we have discussed before, GPU targets are like siblings: they make much of their tiny differences, while being mostly similar).

Member

Glancing at NVPTX, it seems they achieve the same things as AMDGPU here by having special registers that are read, whereas AMDGPU uses the struct pointer, so at least for this case they will differ.

Member

( given their code is JITted by the device driver anyways, for all I know they both actually use the same pattern in the actual machine code they lower to. )
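
As a rough illustration of the generalization discussed in this thread, the sketch below puts one interface over both mechanisms: sizes read from a struct in memory (AMDGPU-style) and sizes read one dimension at a time (NVPTX-style special registers). Every name here (`LaunchInfo`, `PacketSource`, `RegisterSource`, `threads_per_workgroup`) is hypothetical and chosen only for this example. The register reads are modeled as a plain function pointer so the code runs on the host.

```rust
/// Hypothetical target-independent view of the launch geometry.
pub trait LaunchInfo {
    /// Workgroup (block) size in the x, y and z dimensions.
    fn workgroup_size(&self) -> [u32; 3];
}

/// AMDGPU-style source: the sizes are fields of a packet that lives in memory.
/// Only the three size fields are modeled here.
pub struct PacketSource {
    pub workgroup_size: [u16; 3],
}

impl LaunchInfo for PacketSource {
    fn workgroup_size(&self) -> [u32; 3] {
        // Widen each 16-bit field to the common 32-bit representation.
        self.workgroup_size.map(u32::from)
    }
}

/// NVPTX-style source: each dimension comes from a separate register read,
/// modeled here as a function pointer that takes the dimension index.
pub struct RegisterSource {
    pub read_dim: fn(usize) -> u32,
}

impl LaunchInfo for RegisterSource {
    fn workgroup_size(&self) -> [u32; 3] {
        [(self.read_dim)(0), (self.read_dim)(1), (self.read_dim)(2)]
    }
}

/// Code written against the trait does not care which mechanism is used.
pub fn threads_per_workgroup(info: &dyn LaunchInfo) -> u32 {
    info.workgroup_size().iter().product()
}
```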

@Flakebi Flakebi force-pushed the dispatch-ptr-intrinsic branch from 1d98b13 to 13d7a3c Compare January 2, 2026 19:09
@Flakebi
Contributor Author

Flakebi commented Jan 2, 2026

Thanks for the quick reviews!
I changed the return type from a reference to *const () and moved the intrinsic to a new gpu.rs module.

cc @ZuseZ4 FYI for the core::intrinsics::gpu module.
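
One way to see why a raw pointer suits an erased return type: a library wrapper can decide when, and as which type, the pointer becomes a reference. The sketch below is an assumption about how such a wrapper could look. `erased_to_static` is a hypothetical helper, not something this PR adds, and it runs on the host with an ordinary `static` standing in for the dispatch packet.

```rust
use core::mem::align_of;

/// Hypothetical helper: turn a type-erased pointer into a typed `'static`
/// reference, rejecting null or misaligned pointers.
///
/// # Safety
/// If `ptr` is non-null and aligned for `T`, it must point to a valid `T`
/// that stays alive and unmodified for the rest of the program.
pub unsafe fn erased_to_static<T>(ptr: *const ()) -> Option<&'static T> {
    // A reference must never be null or misaligned, so check before converting.
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: validity and lifetime are guaranteed by the caller (see above).
    Some(unsafe { &*ptr.cast::<T>() })
}
```

With `&'static ()` as the return type, the reference validity rules would already apply to a value whose real type the compiler never sees; with `*const ()` they only apply once the caller has named a type.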

@Flakebi Flakebi mentioned this pull request Jan 2, 2026
26 tasks
Member

@workingjubilee workingjubilee left a comment

This seems to be conformant to the idea that @nikic recommended, of keeping the address space out of the public API, and the code is otherwise fine, so it should be good.

I recommend commenting on the load-bearing implication, though, before we send this in.

@workingjubilee workingjubilee added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review labels Jan 6, 2026
@Flakebi Flakebi force-pushed the dispatch-ptr-intrinsic branch from 13d7a3c to fe4a9b4 Compare January 7, 2026 00:24
@Flakebi
Contributor Author

Flakebi commented Jan 7, 2026

I added the suggested comment.

(For context, as far as I know, pointercast – in the age of opaque pointers – does only one of these two things:

  1. ptrtoint or
  2. addrspacecast)

FWIW, I understood nikic’s comment on the addrspace PR as saying that we probably don’t need to expose address spaces publicly when they are convertible to the generic address space.
We could still use address spaces internally to implement a public API without address spaces ;)

@workingjubilee
Member

Indeed, I'm just noting it makes it very hidden behind the API in this case by not even letting the addrspace leave the compiler! If we want to maybe put a type in core's internals we can still do so. Later. For now,

@bors r+ rollup

@bors
Collaborator

bors commented Jan 7, 2026

📌 Commit fe4a9b4 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author labels Jan 7, 2026
Kobzol added a commit to Kobzol/rust that referenced this pull request Jan 7, 2026
…ubilee

Add amdgpu_dispatch_ptr intrinsic

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 8, 2026
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jan 8, 2026
rust-bors bot added a commit that referenced this pull request Jan 8, 2026
Rollup of 11 pull requests

Successful merges:

 - #149976 (Add waker_fn and local_waker_fn to std::task)
 - #150074 (Update provider API docs)
 - #150094 (`c_variadic`: provide our own `va_arg` implementation for more targets)
 - #150164 (rustc: Fix `-Zexport-executable-symbols` on wasm)
 - #150569 (Ensure that static initializers are acyclic for NVPTX)
 - #150607 (Add amdgpu_dispatch_ptr intrinsic)
 - #150694 (./x check miri: enable check_only feature)
 - #150717 (Thread `--jobs` from `bootstrap` -> `compiletest` -> `run-make-support`)
 - #150736 (Add AtomicPtr::null)
 - #150787 (Add myself as co-maintainer for s390x-unknown-linux-musl)
 - #150789 (Fix copy-n-paste error in `vtable_for` docs)

r? @ghost
@matthiaskrgr
Member

@bors r-
#150801 (comment)

@rust-bors rust-bors bot added S-waiting-on-author and removed S-waiting-on-bors labels Jan 8, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 8, 2026

Commit fe4a9b4 has been unapproved.

Member

@workingjubilee workingjubilee left a comment

Alternatively we can enforce the optimization level for the test.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
@Flakebi Flakebi force-pushed the dispatch-ptr-intrinsic branch from fe4a9b4 to 91d4e40 Compare January 9, 2026 09:42
@Flakebi
Contributor Author

Flakebi commented Jan 9, 2026

I fixed it to pass with -Copt-level=0. I also needed to remove the ret check, because at opt-level=0 the result is first stored to an alloca and loaded back before it is returned.

Labels

A-LLVM, S-waiting-on-author, T-compiler, T-libs

7 participants