@mfreeborn

See #46

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.26%. Comparing base (2e67b5c) to head (4f87d2e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   99.16%   99.26%   +0.09%     
==========================================
  Files          12       12              
  Lines        2167     2165       -2     
==========================================
  Hits         2149     2149              
+ Misses         18       16       -2     

@Shnatsel
Collaborator

On my Zen 4 CPU this is a consistent regression in the default configuration and makes little difference with -C target-cpu=native:

cargo bench --bench=bit_reversal
     Running benches/bit_reversal.rs (target/release/deps/bit_reversal-7310e7572d98d06c)
cobra_apply/cobra/15    time:   [53.719 µs 53.849 µs 54.000 µs]
                        change: [+20.898% +21.193% +21.543%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
cobra_apply/cobra/16    time:   [105.59 µs 105.76 µs 105.92 µs]
                        change: [+24.041% +24.295% +24.599%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
cobra_apply/cobra/17    time:   [210.84 µs 211.11 µs 211.40 µs]
                        change: [+24.459% +24.613% +24.777%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cobra_apply/cobra/18    time:   [417.87 µs 418.34 µs 418.83 µs]
                        change: [+24.279% +24.441% +24.611%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
cobra_apply/cobra/19    time:   [839.32 µs 839.56 µs 839.83 µs]
                        change: [+25.252% +25.328% +25.406%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
RUSTFLAGS='-C target-cpu=native' cargo bench --bench=bit_reversal
cobra_apply/cobra/15    time:   [40.607 µs 40.631 µs 40.656 µs]
                        change: [−1.5313% −1.4557% −1.3814%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
cobra_apply/cobra/16    time:   [79.011 µs 79.043 µs 79.082 µs]
                        change: [+0.5886% +0.6283% +0.6723%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
cobra_apply/cobra/17    time:   [158.08 µs 158.18 µs 158.29 µs]
                        change: [+1.4714% +1.5514% +1.6366%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
cobra_apply/cobra/18    time:   [315.23 µs 315.38 µs 315.53 µs]
                        change: [+3.6858% +3.7494% +3.8149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
cobra_apply/cobra/19    time:   [634.41 µs 634.94 µs 635.44 µs]
                        change: [+3.3992% +3.4810% +3.5690%] (p = 0.00 < 0.05)
                        Performance has regressed.

On what hardware did you measure it?

@mfreeborn
Author

Interesting!

CPU is AMD Ryzen™ 5 5625U with Radeon™ Graphics × 12.

That said, I didn't set the target-cpu...

@Shnatsel
Collaborator

Hmm. Rust version? Mine is rustc 1.91.1 (ed61e7d7e 2025-11-07)

@mfreeborn
Author

rust 1.91.0

The +/- percentages might be a bit messed up because of the order in which I ran the benches, but the absolute numbers show a stark benefit from the LUT!

With LUT, target-cpu=native
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.1372 s (81k i
cobra_apply/cobra/15    time:   [64.174 µs 64.578 µs 65.008 µs]
                        change: [−21.340% −20.823% −20.346%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3958 s (45k i
cobra_apply/cobra/16    time:   [118.73 µs 119.12 µs 119.51 µs]
                        change: [−24.808% −24.492% −24.191%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.0103 s (25k i
cobra_apply/cobra/17    time:   [237.44 µs 237.76 µs 238.10 µs]
                        change: [−25.357% −24.942% −24.548%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 6.9839 s (15k i
cobra_apply/cobra/18    time:   [465.76 µs 468.96 µs 472.81 µs]
                        change: [−27.233% −26.014% −25.090%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 9.8303 s (10k i
cobra_apply/cobra/19    time:   [953.42 µs 957.66 µs 962.12 µs]
                        change: [−23.881% −23.161% −22.503%] (p = 0.00 < 0.05)
                        Performance has improved.

Without LUT, target-cpu=native
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.2045 s (50k i
cobra_apply/cobra/15    time:   [103.51 µs 103.85 µs 104.23 µs]
                        change: [+60.409% +61.184% +62.011%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.2254 s (25k i
cobra_apply/cobra/16    time:   [201.28 µs 201.65 µs 202.05 µs]
                        change: [+68.858% +69.429% +70.026%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.2127 s (15k i
cobra_apply/cobra/17    time:   [408.82 µs 409.32 µs 409.87 µs]
                        change: [+71.743% +72.192% +72.625%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 8.3469 s (10k i
cobra_apply/cobra/18    time:   [822.37 µs 823.99 µs 825.80 µs]
                        change: [+77.138% +77.935% +78.674%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 8.3439 s (5050
cobra_apply/cobra/19    time:   [1.6567 ms 1.6615 ms 1.6663 ms]
                        change: [+71.948% +72.926% +73.809%] (p = 0.00 < 0.05)
                        Performance has regressed.

With LUT, no RUSTFLAGS
cobra_apply/cobra/15    time:   [76.037 µs 76.244 µs 76.491 µs]
                        change: [−28.284% −27.892% −27.445%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3447 s (35k i
cobra_apply/cobra/16    time:   [150.24 µs 150.63 µs 151.06 µs]
                        change: [−29.339% −28.939% −28.615%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.1029 s (20k i
cobra_apply/cobra/17    time:   [304.55 µs 306.17 µs 307.95 µs]
                        change: [−29.238% −28.828% −28.363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 6.0233 s (10k i
cobra_apply/cobra/18    time:   [593.32 µs 595.29 µs 597.30 µs]
                        change: [−31.614% −31.149% −30.778%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 6.0096 s (5050
cobra_apply/cobra/19    time:   [1.1736 ms 1.1766 ms 1.1799 ms]
                        change: [−31.786% −31.555% −31.313%] (p = 0.00 < 0.05)
                        Performance has improved.

Without LUT, no RUSTFLAGS
Benchmarking cobra_apply/cobra/15: Collecting 100 samples in estimated 5.3685 s (50k i
cobra_apply/cobra/15    time:   [106.26 µs 106.56 µs 106.90 µs]
                        change: [+2.3883% +2.8431% +3.2761%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking cobra_apply/cobra/16: Collecting 100 samples in estimated 5.3756 s (25k i
cobra_apply/cobra/16    time:   [211.63 µs 212.11 µs 212.79 µs]
                        change: [+4.9958% +5.4688% +6.1208%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/17: Collecting 100 samples in estimated 6.4979 s (15k i
cobra_apply/cobra/17    time:   [428.96 µs 429.87 µs 430.97 µs]
                        change: [+4.9453% +5.3419% +5.7541%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking cobra_apply/cobra/18: Collecting 100 samples in estimated 8.9149 s (10k i
cobra_apply/cobra/18    time:   [863.06 µs 868.59 µs 876.94 µs]
                        change: [+4.2325% +4.7191% +5.3150%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking cobra_apply/cobra/19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 50.
Benchmarking cobra_apply/cobra/19: Collecting 100 samples in estimated 8.7387 s (5050
cobra_apply/cobra/19    time:   [1.7152 ms 1.7169 ms 1.7188 ms]
                        change: [+3.1731% +3.4948% +3.8108%] (p = 0.00 < 0.05)
                        Performance has regressed.
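
For anyone reading without the diff open: the LUT code itself is not quoted in this thread, so the following is only a rough sketch of the general idea, assuming a 256-entry byte-reversal table. The names `REV8`, `reverse_lut` and `reverse_computed` are made up for illustration and are not taken from the PR.

```rust
/// Build a 256-entry table where entry `b` is the byte `b` with its bits
/// reversed. The table layout is an assumption, not the PR's actual code.
const fn build_rev8() -> [u8; 256] {
    let mut lut = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        lut[i] = (i as u8).reverse_bits();
        i += 1;
    }
    lut
}

static REV8: [u8; 256] = build_rev8();

/// Reverse the low `log_n` bits of `i` using four table lookups.
/// Requires 1 <= log_n <= 32 and i < 2^log_n.
fn reverse_lut(i: u32, log_n: u32) -> u32 {
    assert!((1..=32).contains(&log_n));
    let b = i.to_le_bytes();
    // Reversing 32 bits = reversing the byte order and the bits in each byte.
    let full = u32::from_le_bytes([
        REV8[b[3] as usize],
        REV8[b[2] as usize],
        REV8[b[1] as usize],
        REV8[b[0] as usize],
    ]);
    // Drop the unused low bits so only `log_n` bits remain.
    full >> (32 - log_n)
}

/// The same result without a table, via the built-in `u32::reverse_bits`.
fn reverse_computed(i: u32, log_n: u32) -> u32 {
    assert!((1..=32).contains(&log_n));
    i.reverse_bits() >> (32 - log_n)
}
```

Both functions return the same index for every input in range, so which one is faster comes down to how the target CPU handles table loads versus the bit-reversal instruction sequence the compiler emits.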

@Shnatsel
Collaborator

Tip: to make percentages make sense, you can run

cargo bench --bench=bit_reversal -- --save-baseline=main

followed by

cargo bench --bench=bit_reversal -- --baseline=main

and it will calculate percentages relative to the baseline saved by the first command.

@mfreeborn
Author

mfreeborn commented Nov 15, 2025 via email

@Shnatsel
Collaborator

I don't think there's a one-size-fits-all solution. If we want to reap these gains, we'll need to copy FFTW's design and measure the performance of various implementations at runtime, then select the fastest one.

@Shnatsel
Collaborator

I've looked into COBRA some more and it's highly hardware-dependent: #49

We really do just need to start going down the FFTW route: measure the different variants in the planner and pick the best one for the hardware we're running on.

It would be great to have your LUT-based version as one of the options.
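
A rough sketch of what "measure in the planner and pick the best" could look like, under stated assumptions: the `BitRevFn` signature, the two toy variants and `pick_fastest` are all hypothetical names for illustration, not the crate's API, and a real planner would need more careful timing than a handful of rounds.

```rust
use std::time::{Duration, Instant};

/// Common signature for every candidate routine (hypothetical).
type BitRevFn = fn(&mut [f64]);

/// Variant A: reversed index via the built-in `usize::reverse_bits`.
fn bit_rev_builtin(buf: &mut [f64]) {
    let n = buf.len();
    if n < 2 {
        return;
    }
    assert!(n.is_power_of_two());
    let shift = usize::BITS - n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> shift;
        if i < j {
            buf.swap(i, j);
        }
    }
}

/// Variant B: reversed index built one bit at a time.
fn bit_rev_loop(buf: &mut [f64]) {
    let n = buf.len();
    if n < 2 {
        return;
    }
    assert!(n.is_power_of_two());
    let bits = n.trailing_zeros();
    for i in 0..n {
        let (mut j, mut x) = (0usize, i);
        for _ in 0..bits {
            j = (j << 1) | (x & 1);
            x >>= 1;
        }
        if i < j {
            buf.swap(i, j);
        }
    }
}

/// Time each candidate on a scratch buffer of length `n` and return the
/// name and function pointer of the fastest one on this machine.
fn pick_fastest(
    candidates: &[(&'static str, BitRevFn)],
    n: usize,
    rounds: u32,
) -> (&'static str, BitRevFn) {
    let mut scratch = vec![0.0f64; n];
    let mut best: Option<(Duration, &'static str, BitRevFn)> = None;
    for &(name, f) in candidates {
        f(&mut scratch); // warm-up run, not timed
        let start = Instant::now();
        for _ in 0..rounds {
            // black_box keeps the optimizer from discarding the work.
            f(std::hint::black_box(scratch.as_mut_slice()));
        }
        let elapsed = start.elapsed();
        if best.map_or(true, |(t, _, _)| elapsed < t) {
            best = Some((elapsed, name, f));
        }
    }
    let (_, name, f) = best.expect("need at least one candidate");
    (name, f)
}
```

Calling `pick_fastest(&[("builtin", bit_rev_builtin as BitRevFn), ("loop", bit_rev_loop as BitRevFn)], 1 << 15, 20)` once at plan time and caching the returned function pointer would let a LUT-based variant sit alongside the others as one more entry in the candidate list.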
