@bremerm31 (Contributor)

Summary:
Adding a gbps metric and marking the kernels as memory bound for `mx4_to_fp32` and `fp32_to_mx4`. On H100, we see

```
<tritonbench run>  --op mx4_to_fp32 --device cuda --metrics gbps,hw_roofline,latency
```
prints
```
  (Size, Group Size, ebits, mbits)    hw_roofline    fbgemm_mx4_to_fp32-gbps    fbgemm_mx4_to_fp32-latency
----------------------------------  -------------  -------------------------  ----------------------------
                  (6392, 32, 2, 1)           2000                     8.6048             0.006336 (±7.07%)
                (278528, 32, 2, 1)           2000                   335.928              0.007072 (±8.14%)
              (17825792, 32, 2, 1)           2000                  1874.3                0.081120 (±1.70%)
              (17825809, 32, 2, 1)           2000                  1879.5                0.080896 (±1.90%)
```

```
<tritonbench run>  --op fp32_to_mx4 --device cuda --metrics gbps,hw_roofline,latency
```
prints
```
  (Size, Group Size, ebits, mbits, rounding_mode, stochastic_casting)    hw_roofline    fbgemm_fp32_to_mx4-gbps    fbgemm_fp32_to_mx4-latency
---------------------------------------------------------------------  -------------  -------------------------  ----------------------------
                     (24048, 32, 2, 1, <RoundingMode.even: 2>, False)           2000                    15.6924             0.006944 (±6.91%)
                   (1048576, 32, 2, 1, <RoundingMode.even: 2>, False)           2000                   412.444              0.011520 (±5.00%)
                  (67108864, 32, 2, 1, <RoundingMode.even: 2>, False)           2000                  1734.07               0.175360 (±0.53%)
                  (67108880, 32, 2, 1, <RoundingMode.even: 2>, False)           2000                  1733.76               0.175392 (±0.64%)
```
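
For reference, here is a minimal sketch of how the gbps column can be reproduced from the printed sizes and latencies. The helper names are hypothetical (not the tritonbench implementation); the assumptions are that each MX4 group of 32 packs 32 4-bit elements plus one shared 8-bit exponent into 17 bytes, that `Size` is the packed byte count for `mx4_to_fp32` and the fp32 element count for `fp32_to_mx4`, and that latency is reported in ms. All three assumptions are consistent with the tables above.

```
# Hypothetical helpers: gbps = (bytes read + bytes written) / latency.
GROUP_SIZE = 32
GROUP_BYTES = GROUP_SIZE // 2 + 1  # 32 x 4-bit elements + 1 shared exponent byte = 17

def mx4_to_fp32_gbps(packed_bytes: int, latency_ms: float) -> float:
    # Size for mx4_to_fp32 is assumed to be the packed mx4 tensor size in bytes.
    elements = packed_bytes / GROUP_BYTES * GROUP_SIZE
    fp32_bytes = elements * 4  # 4 bytes per fp32 output element
    return (packed_bytes + fp32_bytes) / (latency_ms * 1e-3) / 1e9

def fp32_to_mx4_gbps(num_elements: int, latency_ms: float) -> float:
    # Size for fp32_to_mx4 is assumed to be the number of fp32 input elements.
    fp32_bytes = num_elements * 4
    packed_bytes = num_elements / GROUP_SIZE * GROUP_BYTES
    return (fp32_bytes + packed_bytes) / (latency_ms * 1e-3) / 1e9

print(mx4_to_fp32_gbps(17825792, 0.081120))   # ~1874.3, matches the table
print(fp32_to_mx4_gbps(67108864, 0.175360))   # ~1734.1, matches the table
```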

Differential Revision: D85782147
meta-codesync bot commented Oct 29, 2025

@bremerm31 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85782147.
