Conversation


@ZENOTME ZENOTME commented Sep 17, 2025

Solves #3645.

Changes

I noticed that the circular buffer originally used by the batch span processor is a lock-free MPSC queue based on CAS. Under intense multithreaded contention, Compare-And-Swap (CAS) scales worse than Fetch-And-Add (FAA), since a failed CAS must retry while FAA always completes in a single step. Following https://github.com/dbittman/waitfree-mpsc-queue, this PR implements a wait-free MPSC (Multiple Producer, Single Consumer) queue using FAA. The original benchmark indicates that this approach scales better:

Run on (48 X 2593.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
Load Average: 7.85, 5.70, 4.48
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_BaselineBuffer/1     10178537 ns        51528 ns         1000
BM_BaselineBuffer/2      7408646 ns        69828 ns         1000
BM_BaselineBuffer/4      7684772 ns       127549 ns         1000
BM_BaselineBuffer/8      7222459 ns       278660 ns         1000
BM_BaselineBuffer/16     6716972 ns       603712 ns         1215
BM_LockFreeBuffer/1      3915343 ns        53125 ns         1000
BM_LockFreeBuffer/2      4798406 ns        70581 ns         1000
BM_LockFreeBuffer/4      4562709 ns       128493 ns         1000
BM_LockFreeBuffer/8      4935221 ns       289996 ns         1000
BM_LockFreeBuffer/16     5187913 ns       618856 ns         1081
BM_OptimizedBuffer/1     4256507 ns        49970 ns         1000
BM_OptimizedBuffer/2     3398719 ns        67712 ns         1000
BM_OptimizedBuffer/4     3204749 ns       127378 ns         1000
BM_OptimizedBuffer/8     3230722 ns       296507 ns         1000
BM_OptimizedBuffer/16    3859005 ns       769220 ns         1000

I would like to refine this PR if it's acceptable.


linux-foundation-easycla bot commented Sep 17, 2025

CLA Signed

  • ✅login: ZENOTME / name: zenotme / (1836bbf)

The committers listed above are authorized under a signed CLA.

@marcalff marcalff added the pr:waiting-on-cla Waiting on CLA label Sep 17, 2025

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.08%. Comparing base (aeb5a01) to head (1836bbf).
⚠️ Report is 50 commits behind head on main.


@@            Coverage Diff             @@
##             main    #3644      +/-   ##
==========================================
+ Coverage   90.03%   90.08%   +0.06%     
==========================================
  Files         220      220              
  Lines        7065     7111      +46     
==========================================
+ Hits         6360     6405      +45     
- Misses        705      706       +1     
Files with missing lines Coverage Δ
...include/opentelemetry/sdk/common/circular_buffer.h 98.15% <ø> (-1.85%) ⬇️
...e/opentelemetry/sdk/common/circular_buffer_range.h 100.00% <100.00%> (ø)

... and 21 files with indirect coverage changes



CircularBufferRange<AtomicUniquePtr<T>> PeekImpl() noexcept
{
  uint64_t current_count = count_.load(std::memory_order_relaxed);
Member

The only sync operation happens here. Is it possible for more than one thread to receive the same CircularBufferRange, causing a data race?

Author

Sorry, I don't quite get the point here. Could you elaborate with a more specific scenario?

Member
@owent owent Sep 20, 2025

When two threads call this function and both calls finish before any state change, they may return the same values, and callback(range); in Consume will then be invoked with the same range.

Author

Yes, this circular buffer implementation is MPSC and assumes that only one consumer thread calls this method. That is how it is used in the batch span processor.

Contributor

Looks like PeekImpl() could be made const, so the const_cast could be removed too.

Contributor

Reneg973 commented Oct 26, 2025

On Mac M1 I get:

Run on (10 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 9.93, 6.67, 5.87
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_BaselineBuffer/1      2789978 ns        32254 ns         1000
BM_BaselineBuffer/2      2949209 ns        46430 ns         1000
BM_BaselineBuffer/4      1956026 ns        72750 ns         9651
BM_BaselineBuffer/8      1690755 ns       117763 ns         5736
BM_BaselineBuffer/16     1491443 ns       212598 ns         3275
BM_LockFreeBuffer/1      1223514 ns        32825 ns        10000
BM_LockFreeBuffer/2      1369947 ns        45685 ns        10000
BM_LockFreeBuffer/4      2768243 ns        76035 ns         1000
BM_LockFreeBuffer/8      3852933 ns       135797 ns         1000
BM_LockFreeBuffer/16     4341987 ns       259862 ns         1000
BM_OptimizedBuffer/1     1471849 ns        32878 ns        10000
BM_OptimizedBuffer/2     1138345 ns        45200 ns        10000
BM_OptimizedBuffer/4     1536488 ns        73745 ns         9610
BM_OptimizedBuffer/8     1577325 ns       126947 ns         5565
BM_OptimizedBuffer/16    1833288 ns       237527 ns         2986

So the result is similar to SpinLockMutex: the higher the contention, the less efficient atomics are, and the better std::mutex performs.

@Reneg973
Contributor

A compile fix is also required on macOS (libc++), because size_t is unsigned long there:

   uint64_t max_check = std::min<uint64_t>(current_count, capacity_);
