Optimising `cobra_apply` #47

mfreeborn · 2025-11-15T15:30:53Z

codecov-commenter · 2025-11-15T15:36:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.26%. Comparing base (2e67b5c) to head (4f87d2e).

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   99.16%   99.26%   +0.09%     
==========================================
  Files          12       12              
  Lines        2167     2165       -2     
==========================================
  Hits         2149     2149              
+ Misses         18       16       -2


Shnatsel · 2025-11-15T15:43:56Z

On my Zen 4 CPU this is a consistent regression in the default configuration and makes little difference with -C target-cpu=native:

cargo bench --bench=bit_reversal

     Running benches/bit_reversal.rs (target/release/deps/bit_reversal-7310e7572d98d06c)
cobra_apply/cobra/15    time:   [53.719 µs 53.849 µs 54.000 µs]
                        change: [+20.898% +21.193% +21.543%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
cobra_apply/cobra/16    time:   [105.59 µs 105.76 µs 105.92 µs]
                        change: [+24.041% +24.295% +24.599%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
cobra_apply/cobra/17    time:   [210.84 µs 211.11 µs 211.40 µs]
                        change: [+24.459% +24.613% +24.777%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cobra_apply/cobra/18    time:   [417.87 µs 418.34 µs 418.83 µs]
                        change: [+24.279% +24.441% +24.611%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
cobra_apply/cobra/19    time:   [839.32 µs 839.56 µs 839.83 µs]
                        change: [+25.252% +25.328% +25.406%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

RUSTFLAGS='-C target-cpu=native' cargo bench --bench=bit_reversal

cobra_apply/cobra/15    time:   [40.607 µs 40.631 µs 40.656 µs]
                        change: [−1.5313% −1.4557% −1.3814%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
cobra_apply/cobra/16    time:   [79.011 µs 79.043 µs 79.082 µs]
                        change: [+0.5886% +0.6283% +0.6723%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
cobra_apply/cobra/17    time:   [158.08 µs 158.18 µs 158.29 µs]
                        change: [+1.4714% +1.5514% +1.6366%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
cobra_apply/cobra/18    time:   [315.23 µs 315.38 µs 315.53 µs]
                        change: [+3.6858% +3.7494% +3.8149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cobra_apply/cobra/19    time:   [634.41 µs 634.94 µs 635.44 µs]
                        change: [+3.3992% +3.4810% +3.5690%] (p = 0.00 < 0.05)
                        Performance has regressed.

On what hardware did you measure it?

mfreeborn · 2025-11-15T15:47:45Z

Interesting!

CPU is AMD Ryzen™ 5 5625U with Radeon™ Graphics × 12.

That said, I didn't set the target-cpu...

Shnatsel · 2025-11-15T15:49:48Z

Hmm. Rust version? Mine is rustc 1.91.1 (ed61e7d7e 2025-11-07)

mfreeborn · 2025-11-15T16:06:43Z

rust 1.91.0

The +/- percentages might be a bit messed up because of the order in which I ran the benches, but the absolute numbers show a stark benefit of the LUT!

With LUT, target-cpu=native


Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.1372 s (81k i
cobra_apply/cobra/15    time:   [64.174 µs 64.578 µs 65.008 µs]
                        change: [−21.340% −20.823% −20.346%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3958 s (45k i
cobra_apply/cobra/16    time:   [118.73 µs 119.12 µs 119.51 µs]
                        change: [−24.808% −24.492% −24.191%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.0103 s (25k i
cobra_apply/cobra/17    time:   [237.44 µs 237.76 µs 238.10 µs]
                        change: [−25.357% −24.942% −24.548%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 6.9839 s (15k i
cobra_apply/cobra/18    time:   [465.76 µs 468.96 µs 472.81 µs]
                        change: [−27.233% −26.014% −25.090%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 9.8303 s (10k i
cobra_apply/cobra/19    time:   [953.42 µs 957.66 µs 962.12 µs]
                        change: [−23.881% −23.161% −22.503%] (p = 0.00 < 0.05)
                        Performance has improved.

Without LUT, target-cpu=native


Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.2045 s (50k i
cobra_apply/cobra/15    time:   [103.51 µs 103.85 µs 104.23 µs]
                        change: [+60.409% +61.184% +62.011%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.2254 s (25k i
cobra_apply/cobra/16    time:   [201.28 µs 201.65 µs 202.05 µs]
                        change: [+68.858% +69.429% +70.026%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.2127 s (15k i
cobra_apply/cobra/17    time:   [408.82 µs 409.32 µs 409.87 µs]
                        change: [+71.743% +72.192% +72.625%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 8.3469 s (10k i
cobra_apply/cobra/18    time:   [822.37 µs 823.99 µs 825.80 µs]
                        change: [+77.138% +77.935% +78.674%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 8.3439 s (5050 
cobra_apply/cobra/19    time:   [1.6567 ms 1.6615 ms 1.6663 ms]
                        change: [+71.948% +72.926% +73.809%] (p = 0.00 < 0.05)
                        Performance has regressed.

With LUT, no RUSTFLAGS


cobra_apply/cobra/15    time:   [76.037 µs 76.244 µs 76.491 µs]
                        change: [−28.284% −27.892% −27.445%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3447 s (35k i
cobra_apply/cobra/16    time:   [150.24 µs 150.63 µs 151.06 µs]
                        change: [−29.339% −28.939% −28.615%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.1029 s (20k i
cobra_apply/cobra/17    time:   [304.55 µs 306.17 µs 307.95 µs]
                        change: [−29.238% −28.828% −28.363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 6.0233 s (10k i
cobra_apply/cobra/18    time:   [593.32 µs 595.29 µs 597.30 µs]
                        change: [−31.614% −31.149% −30.778%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 6.0096 s (5050 
cobra_apply/cobra/19    time:   [1.1736 ms 1.1766 ms 1.1799 ms]
                        change: [−31.786% −31.555% −31.313%] (p = 0.00 < 0.05)
                        Performance has improved.

Without LUT, no RUSTFLAGS


Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.3685 s (50k i
cobra_apply/cobra/15    time:   [106.26 µs 106.56 µs 106.90 µs]
                        change: [+2.3883% +2.8431% +3.2761%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3756 s (25k i
cobra_apply/cobra/16    time:   [211.63 µs 212.11 µs 212.79 µs]
                        change: [+4.9958% +5.4688% +6.1208%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.4979 s (15k i
cobra_apply/cobra/17    time:   [428.96 µs 429.87 µs 430.97 µs]
                        change: [+4.9453% +5.3419% +5.7541%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 8.9149 s (10k i
cobra_apply/cobra/18    time:   [863.06 µs 868.59 µs 876.94 µs]
                        change: [+4.2325% +4.7191% +5.3150%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 50.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 8.7387 s (5050 
cobra_apply/cobra/19    time:   [1.7152 ms 1.7169 ms 1.7188 ms]
                        change: [+3.1731% +3.4948% +3.8108%] (p = 0.00 < 0.05)
                        Performance has regressed.

Shnatsel · 2025-11-15T16:09:28Z

Tip: to make percentages make sense, you can run `cargo bench --bench=bit_reversal -- --save-baseline=main` followed by `cargo bench --bench=bit_reversal -- --baseline=main`, and it will calculate percentages relative to the baseline saved by the first command.

mfreeborn · 2025-11-15T16:14:16Z

Ah, that's useful. Criterion is one of those tools that I severely underuse. If I ever read the docs, I could probably figure out how to group the with- and without-LUT variants into a single benchmark for much easier direct comparison.


Shnatsel · 2025-11-23T11:51:11Z

I don't think there's a one-size-fits-all solution. If we want to reap these gains, we'll need to copy FFTW's design and measure the performance of various implementations at runtime, then select the fastest one.

Shnatsel · 2025-11-23T14:33:41Z

I've looked into COBRA some more and it's highly hardware-dependent: #49

We really do just need to start going down the FFTW route: measure the different variants in the planner and pick the best one for the hardware we're running on.

It would be great to have your LUT-based version as one of the options.
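
A rough sketch of what "measure in the planner and pick the best one" could look like, using only the standard library. Everything here is an assumption made for illustration: the `Kernel` signature, the `plan` and `time_kernel` helpers and the two toy variants are hypothetical and do not reflect PhastFT's actual planner or FFTW's API.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// A bit-reversal routine over a power-of-two-length slice (hypothetical signature).
type Kernel = fn(&mut [f64]);

fn reverse_index(i: usize, bits: u32) -> usize {
    if bits == 0 { 0 } else { i.reverse_bits() >> (usize::BITS - bits) }
}

/// Variant A: compute each reversed index on the fly.
fn permute_computed(data: &mut [f64]) {
    let bits = data.len().trailing_zeros();
    for i in 0..data.len() {
        let j = reverse_index(i, bits);
        if i < j {
            data.swap(i, j);
        }
    }
}

/// Variant B: build a lookup table first, then permute from it.
fn permute_lut(data: &mut [f64]) {
    let bits = data.len().trailing_zeros();
    let lut: Vec<usize> = (0..data.len()).map(|i| reverse_index(i, bits)).collect();
    for (i, &j) in lut.iter().enumerate() {
        if i < j {
            data.swap(i, j);
        }
    }
}

/// Best-of-`reps` wall-clock time for one kernel on a scratch buffer of length `n`.
fn time_kernel(kernel: Kernel, n: usize, reps: u32) -> Duration {
    let mut scratch = vec![0.0f64; n];
    let mut best = Duration::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        kernel(&mut scratch);
        black_box(&scratch); // keep the work from being optimised away
        best = best.min(start.elapsed());
    }
    best
}

/// Time every candidate once at plan creation and return the fastest.
fn plan(candidates: &[(&'static str, Kernel)], n: usize) -> (&'static str, Kernel) {
    *candidates
        .iter()
        .min_by_key(|(_, k)| time_kernel(*k, n, 5))
        .expect("at least one candidate")
}

fn main() {
    let candidates = [
        ("computed", permute_computed as Kernel),
        ("lut", permute_lut as Kernel),
    ];
    let (name, kernel) = plan(&candidates, 1 << 15);
    println!("selected variant: {name}");

    let mut data: Vec<f64> = (0..8).map(|i| i as f64).collect();
    kernel(&mut data);
    println!("{data:?}");
}
```

The selection cost is paid once per plan, so it only makes sense when the plan is reused for many transforms of the same size; a real planner would presumably also cache the choice per size.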

mfreeborn added 2 commits November 15, 2025 15:03

add benchmark for cobra

e211a94

use lookup table for cobra_apply

4f87d2e
