FP4/MXFP4 inference acceleration on GPU: design questions for GSoC 2026 Project 14 #3972
Unanswered · CodersAcademy006 asked this question in Ideas
Replies: 1 comment
@Saad-Mallebhari, please share your opinions on this as well, and correct me if I am wrong. Thank you.
Hi NNCF team,
I'm working on a proposal for the Triton kernel acceleration project (GSoC 2026)
and want to validate some design assumptions before writing the full proposal.
What I've found in the current codebase:
The docs state that NF4, MXFP4, MXFP8_E4M3 are "experimental on GPU and NPU"
and models compressed to these formats "should not be faster than 8-bit
integer." This accurately describes the current state: there's no GPU-optimized
dequantization kernel for these formats. The dequant path falls back to
unoptimized PyTorch ops at inference time.
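For concreteness, the eager fallback amounts to a LUT gather plus a per-group scale multiply in plain PyTorch ops. A minimal sketch of what I mean (the E2M1 value table follows the OCP MX layout; `dequant_reference` and its signature are my own illustration, not an NNCF API):

```python
import torch

# E2M1 values for 4-bit codes 0..15 (bit 3 is the sign bit).
E2M1_LUT = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def dequant_reference(codes: torch.Tensor, scales: torch.Tensor,
                      group_size: int) -> torch.Tensor:
    # codes: int tensor of 4-bit values in [0, 15]; scales: one per group.
    vals = E2M1_LUT[codes.long()]                          # LUT gather
    vals = vals.view(-1, group_size) * scales.view(-1, 1)  # per-group scale
    return vals.view(codes.shape)
```

Each of these ops launches its own kernel and materializes intermediates, which is why the fallback path is slow compared to a fused Triton kernel.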
Three design questions before I write the proposal:
Format priority: Should the Triton kernel target MXFP4 (E2M1 + E8M0
group scale, the OpenVINO IR native format) or FP4 (E2M1 + FP16 group
scale, the PyTorch-native variant) first? They have different dequant
arithmetic despite the same weight format.
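To make that arithmetic difference concrete, here is a pure-Python sketch of the two scale decodings, assuming the OCP MX bit layouts (the function names are hypothetical, not NNCF identifiers):

```python
# Magnitudes for E2M1 codes 0..7; sign is bit 3 (2 exponent bits, 1 mantissa bit).
E2M1_LUT = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a float."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_LUT[code & 0b0111]

def dequant_mxfp4(code: int, scale_byte: int) -> float:
    """MXFP4: the group scale is E8M0, a pure power of two (no mantissa)."""
    return decode_e2m1(code) * 2.0 ** (scale_byte - 127)

def dequant_fp4(code: int, scale_fp16: float) -> float:
    """FP4 variant: the group scale is an ordinary FP16 value."""
    return decode_e2m1(code) * scale_fp16
```

The practical consequence inside a kernel: the E8M0 scale can be applied as an exponent-field add, while the FP16 scale requires a genuine multiply.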
torch.compile registration: For torch.compile compatibility, should
the Triton kernel be registered via torch.library.custom_op with a
fake/abstract implementation, or is there a preferred NNCF pattern
for registering custom ops that I should follow?
Benchmark target: For measuring speedup, is the goal to compare
against uncompressed FP16 inference, or against the current
CompressWeightsMode.INT4_SYM path on the same hardware?

I have a minimal FP4 dequant kernel prototype I can share as a starting
point once these design questions are resolved.
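Whichever baseline is chosen, I would measure roughly along these lines (a CPU-side sketch only; on GPU the real harness would need `torch.cuda.synchronize()` or CUDA events rather than wall-clock timing, and the function names below are placeholders):

```python
import time

def bench(fn, iters=100, warmup=10):
    """Median wall-clock seconds per call of fn (CPU-side placeholder)."""
    for _ in range(warmup):
        fn()  # warm caches / trigger any lazy compilation
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# speedup = bench(fp16_baseline_forward) / bench(mxfp4_kernel_forward)
```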