Use torchao for MXFP8 quantization #35

jwfromm · 2025-12-08T19:08:59Z

Summary: torchao offers optimized Triton MXFP8 quantization kernels that are considerably more efficient than the naive Python implementations our benchmarks currently use. This diff adds a torchao import and uses those faster quantization kernels for GEMM and grouped GEMM when torchao is available.

Differential Revision: D88660032
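For context on what the faster kernels replace, here is a minimal pure-Python sketch of a naive reference MXFP8 quantizer, plus an "only when available" selection step. It assumes the OCP MX v1.0 recipe: blocks of 32 values share one power-of-two (E8M0) scale with exponent `floor(log2(max|x|)) - 8`, and each scaled element is cast to FP8 E4M3 with round-half-to-even and saturation at 448. It is not the benchmark's actual code. All names (`round_to_e4m3`, `quantize_mxfp8_reference`, `select_quantizer`, the `fast_quantizer` hook) are hypothetical. No torchao function is called, since the PR does not name the exact entry point.

```python
import importlib.util
import math
from typing import Callable, List, Optional, Sequence, Tuple

E4M3_MAX = 448.0    # largest finite FP8 E4M3 (e4m3fn) value
E4M3_EMAX = 8       # exponent of E4M3_MAX: 448 = 1.75 * 2**8
E4M3_EMIN = -6      # smallest normal exponent; smaller values become subnormal
E4M3_MANT_BITS = 3
MX_BLOCK = 32       # assumed MX block size (OCP MX v1.0)


def round_to_e4m3(v: float) -> float:
    """Round to the nearest FP8 E4M3 value (round-half-to-even, saturating)."""
    if v == 0.0 or math.isnan(v):
        return v
    sign = -1.0 if v < 0 else 1.0
    a = abs(v)
    if a >= E4M3_MAX:
        return sign * E4M3_MAX                    # saturate instead of overflowing
    _, frexp_exp = math.frexp(a)                  # a = m * 2**frexp_exp, 0.5 <= m < 1
    exp = max(frexp_exp - 1, E4M3_EMIN)           # floor(log2(a)), clamped for subnormals
    step = 2.0 ** (exp - E4M3_MANT_BITS)          # spacing of representable values here
    q = round(a / step) * step                    # Python's round() is half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_mxfp8_reference(
    x: Sequence[float], block_size: int = MX_BLOCK
) -> List[Tuple[int, List[float]]]:
    """Naive MXFP8: returns (shared E8M0 exponent, E4M3 elements) per block."""
    blocks = []
    for start in range(0, len(x), block_size):
        block = list(x[start:start + block_size])
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            shared_exp = -127                     # all-zero block: any scale works
        else:
            shared_exp = math.frexp(max_abs)[1] - 1 - E4M3_EMAX
        shared_exp = max(-127, min(127, shared_exp))  # E8M0 exponent range
        scale = 2.0 ** shared_exp
        blocks.append((shared_exp, [round_to_e4m3(v / scale) for v in block]))
    return blocks


def dequantize_mxfp8_reference(blocks: List[Tuple[int, List[float]]]) -> List[float]:
    """Multiply each element back by its block's power-of-two scale."""
    out: List[float] = []
    for shared_exp, elems in blocks:
        out.extend(e * 2.0 ** shared_exp for e in elems)
    return out


def torchao_available() -> bool:
    """True if the torchao package can be imported in this environment."""
    return importlib.util.find_spec("torchao") is not None


def select_quantizer(fast_quantizer: Optional[Callable] = None) -> Callable:
    """Use a (hypothetical) torchao-backed quantizer if given and torchao is installed."""
    if fast_quantizer is not None and torchao_available():
        return fast_quantizer
    return quantize_mxfp8_reference


if __name__ == "__main__":
    blocks = quantize_mxfp8_reference([1.0, 1.01, 1.9] + [0.0] * 29)
    print(blocks[0][0], blocks[0][1][:3])             # shared exponent, first elements
    print(dequantize_mxfp8_reference(blocks)[:3])
```

Computing `floor(log2(.))` through `math.frexp` keeps the exponent exact for powers of two. Because the floor-based scale can push a block's largest value above 448 (1.9 becomes 486.4 before the cast), the element cast saturates. A per-element Python loop like this is exactly the kind of path that a fused GPU kernel would be expected to outperform.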


meta-codesync · 2025-12-08T19:09:07Z

@jwfromm has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88660032.

meta-cla bot added the cla signed label Dec 8, 2025

meta-codesync bot added the fb-exported and meta-exported labels Dec 8, 2025
