@jwfromm jwfromm commented Dec 8, 2025

Summary: torchao provides optimized Triton MXFP8 quantization kernels that are considerably more efficient than the naive Python implementations our benchmarks currently use. This diff adds a torchao import and uses those faster quantization kernels for GEMM and grouped GEMM whenever torchao is available.

Differential Revision: D88660032
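
For readers unfamiliar with the pattern, the "use the fast kernel when available, otherwise fall back" logic described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this diff: `naive_mx_scales`, `pick_quantizer`, and the `fast_kernel` argument are hypothetical names, no torchao function is called, and the reference path only computes per-block power-of-two scales and saturates the scaled values (it does not round them to the FP8 grid).

```python
import importlib.util
import math

# True when the optional torchao package can be found in this environment.
HAS_TORCHAO = importlib.util.find_spec("torchao") is not None

MX_BLOCK_SIZE = 32  # MX formats share one scale across a block of 32 elements
E4M3_EMAX = 8       # exponent of the largest E4M3 normal value (448 = 1.75 * 2**8)
E4M3_MAX = 448.0    # largest finite E4M3 magnitude


def naive_mx_scales(values, block_size=MX_BLOCK_SIZE):
    """Hypothetical slow reference path.

    Returns (scales, scaled): one power-of-two scale per block, plus the
    inputs divided by their block scale and clamped to the E4M3 range.
    Rounding to actual FP8 values is intentionally left out of this sketch.
    """
    scales, scaled = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # An all-zero block has no meaningful exponent; use a scale of 1.0.
        exp = math.floor(math.log2(amax)) - E4M3_EMAX if amax > 0 else 0
        scale = 2.0 ** exp
        scales.append(scale)
        scaled.extend(max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in block)
    return scales, scaled


def pick_quantizer(fast_kernel=None, has_torchao=HAS_TORCHAO):
    """Return the fast kernel if one was supplied and torchao is present,
    otherwise fall back to the naive reference path."""
    if has_torchao and fast_kernel is not None:
        return fast_kernel
    return naive_mx_scales
```

In a benchmark, the caller would pass whichever optimized kernel it imported as `fast_kernel` (an assumption here) and invoke the returned function, so the same benchmark code runs with or without torchao installed.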

@meta-cla meta-cla bot added the cla signed label Dec 8, 2025
meta-codesync bot commented Dec 8, 2025

@jwfromm has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88660032.
