[benchmark test] use average when asserting against perf threshold #69

marcdumais-work · 2025-05-21T14:20:06Z

What it does

The recently added assertions in the benchmark test, that confirm that the new async handler is much more performant vs the old sync handler, were failing the test once in a while.

Originally the assertions were against each test run - those runs with fewer iterations were very sensitive to interference by e.g. GC.

e.g. imagine that a 10-20 ms GC occurs while doing one of the runs below, while performing logging using "AsyncNew" - it could easily make "AsyncNew" look much less performant vs "SyncOld", once in a while making the AsyncNewVsSyncOld ratio look too low and failing the test:

Runs(#)   SyncOld(ms)   AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
   2000         75.48          8.56                        8.81x
   4600        243.99          4.24                       57.59x
  10580        239.99          9.82                       24.43x
  24333        404.66         46.39                        8.72x

Here's a case where it happened locally during the "10580" run for AsyncNew, just about doubling its apparent duration:

After this commit, the performance is asserted against the average performance of AsyncNew vs SyncOld, over the totals of all runs (separately for the Regular vs Lean benchmarks). The total/average info is now printed on stdout in human-readable form, under the existing table:

"Regular" Benchmark Results (Human-readable):
Runs(#)   SyncOld(ms)   SyncNew(ms)  AsyncOld(ms)  AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
   2000         73.92        103.07         16.13          6.80                       10.87x
   4600        111.18        153.01         24.33          5.25                       21.18x
  10580        177.14        232.28         64.40         18.39                        9.63x
  24333        361.16        378.44        179.38         42.53                        8.49x
  55965        880.51        817.50        435.67         47.72                       18.45x
 128719       1869.16       1867.09        822.97        178.17                       10.49x

"Regular" - Total/Average
Runs(#)   SyncOld(ms)  AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
 226197       3473.06        298.86                       11.62x

How to test

Verify that CI passes.

Follow-ups

N/A

Review checklist

As an author, I have thoroughly tested my changes and carefully followed the instructions in this template

…hold The recently added assertions in the benchmark test, that confirm that the new async handler is much more performant vs the old sync handler, were failing the test once in a while. Originally the assertions were against each test run - those runs with fewer iterations were very sensitive to interference by e.g. GC. e.g. imagine that a 10-20 ms GC occurs while doing one of the runs below, while performing logging using "AsyncNew" - it could easily make "AsyncNew" look much less performant vs "SyncOld", once in a while making the AsyncNewVsSyncOld ratio look too low and failing the test : Runs(#) SyncOld(ms) AsyncNew(ms) AsyncNewVsSyncOld(rel perf) 2000 75.48 8.56 8.81x 4600 243.99 4.24 57.59x 10580 239.99 9.82 24.43x 24333 404.66 46.39 8.72x After this commit, the performance is asserted against the average performance of AsyncNew vs SyncOld, over the totals of all runs (separately for the Regular vs Lean benchmarks). The totals are also printed-out on stdout in human-readable form. Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>

marcdumais-work force-pushed the benchmark-perf-use-mean branch 2 times, most recently from b3c015e to 77d445d Compare May 21, 2025 18:41

marcdumais-work force-pushed the benchmark-perf-use-mean branch from 77d445d to 86b2a9c Compare May 21, 2025 18:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[benchmark test] use average when asserting against perf threshold #69

[benchmark test] use average when asserting against perf threshold #69

Uh oh!

marcdumais-work commented May 21, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[benchmark test] use average when asserting against perf threshold #69

Are you sure you want to change the base?

[benchmark test] use average when asserting against perf threshold #69

Uh oh!

Conversation

marcdumais-work commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What it does

How to test

Follow-ups

Review checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marcdumais-work commented May 21, 2025 •

edited

Loading