Skip to content

Conversation

@marcdumais-work
Copy link
Contributor

@marcdumais-work marcdumais-work commented May 21, 2025

What it does

The recently added assertions in the benchmark test, that confirm that the new async handler is much more performant vs the old sync handler, were failing the test once in a while.

Originally the assertions were against each test run - those runs with fewer iterations were very sensitive to interference by e.g. GC.

e.g. imagine that a 10-20 ms GC occurs while doing one of the runs below, while performing logging using "AsyncNew" - it could easily make "AsyncNew" look much less performant vs "SyncOld", once in a while making the AsyncNewVsSyncOld ratio look too low and failing the test:

Runs(#)   SyncOld(ms)   AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
   2000         75.48          8.56                        8.81x
   4600        243.99          4.24                       57.59x
  10580        239.99          9.82                       24.43x
  24333        404.66         46.39                        8.72x

Here's a case where it happened locally during the "10580" run for AsyncNew, just about doubling its apparent duration:
image

After this commit, the performance is asserted against the average performance of AsyncNew vs SyncOld, over the totals of all runs (separately for the Regular vs Lean benchmarks). The total/average info is now printed on stdout in human-readable form, under the existing table:

"Regular" Benchmark Results (Human-readable):
Runs(#)   SyncOld(ms)   SyncNew(ms)  AsyncOld(ms)  AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
   2000         73.92        103.07         16.13          6.80                       10.87x
   4600        111.18        153.01         24.33          5.25                       21.18x
  10580        177.14        232.28         64.40         18.39                        9.63x
  24333        361.16        378.44        179.38         42.53                        8.49x
  55965        880.51        817.50        435.67         47.72                       18.45x
 128719       1869.16       1867.09        822.97        178.17                       10.49x

"Regular" - Total/Average
Runs(#)   SyncOld(ms)  AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
 226197       3473.06        298.86                       11.62x

How to test

Verify that CI passes.

Follow-ups

N/A

Review checklist

  • As an author, I have thoroughly tested my changes and carefully followed the instructions in this template

@marcdumais-work marcdumais-work force-pushed the benchmark-perf-use-mean branch 2 times, most recently from b3c015e to 77d445d Compare May 21, 2025 18:41
…hold

The recently added assertions in the benchmark test, that confirm that
the new async handler is much more performant vs the old sync handler,
were failing the test once in a while.

Originally the assertions were against each test run - those runs with
fewer iterations were very sensitive to interference by e.g. GC.

e.g. imagine that a 10-20 ms GC occurs while doing one of the runs
below, while performing logging using "AsyncNew" - it could easily
make "AsyncNew" look much less performant vs "SyncOld", once in a
while making the AsyncNewVsSyncOld ratio look too low and failing
the test :

Runs(#)   SyncOld(ms)   AsyncNew(ms)  AsyncNewVsSyncOld(rel perf)
   2000         75.48          8.56                        8.81x
   4600        243.99          4.24                       57.59x
  10580        239.99          9.82                       24.43x
  24333        404.66         46.39                        8.72x

After this commit, the performance is asserted against the average
performance of AsyncNew vs SyncOld, over the totals of all runs
(separately for the Regular vs Lean benchmarks). The totals are
also printed-out on stdout in human-readable form.

Signed-off-by: Marc Dumais <marc.dumais@ericsson.com>
@marcdumais-work marcdumais-work force-pushed the benchmark-perf-use-mean branch from 77d445d to 86b2a9c Compare May 21, 2025 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant