opentelemetry: logs: json decoding to msgpack perf improvement by edsiper · Pull Request #11430 · fluent/fluent-bit

edsiper · 2026-02-04T19:53:43Z

Do not use a temporary buffer to process the JSON decoding and encoding to msgpack, this is a little performance improvement.

before

sudo perf record -g bin/flb-bench-opentelemetry -f ../benchmarks/otlp_logs_sample.json -m otlp-json-logs -i 10000
Benchmarking OTLP-JSON logs encoding (flb_opentelemetry_logs_json_to_msgpack)
------------------------------------------------------------------------

Iterations   : 10000
JSON size    : 5891 bytes
------------------------------------------------------------------------

------------------------------------------------------------------------
Total time   : 743449937 ns
Per call     : 74344 ns
Throughput   : 75.57 MiB/s
------------------------------------------------------------------------

after

sudo perf record -g bin/flb-bench-opentelemetry -f ../benchmarks/otlp_logs_sample.json -m otlp-json-logs -i 10000
Benchmarking OTLP-JSON logs encoding (flb_opentelemetry_logs_json_to_msgpack)
------------------------------------------------------------------------

Iterations   : 10000
JSON size    : 5891 bytes
------------------------------------------------------------------------

------------------------------------------------------------------------
Total time   : 747345444 ns
Per call     : 74734 ns
Throughput   : 75.17 MiB/s
------------------------------------------------------------------------

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

Release Notes

New Features
- Added OpenTelemetry OTLP-JSON logs encoding benchmark utility with command-line interface for performance analysis, including per-call latency and throughput measurements.
Bug Fixes
- Improved error handling and encoder state management in OpenTelemetry logs processing for enhanced reliability and data integrity.

- Encode OTLP-JSON logs directly into the caller's encoder instead of a local encoder and copying the result, removing the final buffer copy in flb_opentelemetry_logs_json_to_msgpack. - In process_json_payload_resource_logs_entry, build each scope group in the main encoder and rollback by truncating the buffer when no log records are added or on error, instead of using a temporary encoder and appending its buffer. This removes per-scope allocations and the msgpack_sbuffer_write copy. - On failure, restore encoder->buffer.size and output pointers so the encoder is left unchanged. - Add return checks for flb_log_event_encoder_group_init, group_header_end, and group_end. - Require a non-NULL encoder (callers already pass a valid encoder). Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

coderabbitai · 2026-02-04T19:54:15Z

📝 Walkthrough

Walkthrough

This pull request introduces a new OpenTelemetry OTLP-JSON logs benchmarking tool and refactors the encoder logic in the existing OpenTelemetry logs module. The benchmark program measures encoding performance using Fluent Bit APIs, while the encoder refactoring replaces temporary encoder usage with direct use of the provided encoder parameter, adding robust error handling and rollback capabilities.

Changes

Cohort / File(s)	Summary
Build Configuration `benchmarks/CMakeLists.txt`	Added new CMake build target `flb-bench-opentelemetry` linking the opentelemetry.c executable against fluent-bit-static and thread libraries.
OpenTelemetry Benchmarking `benchmarks/opentelemetry.c`	Introduced standalone benchmark program for OTLP-JSON logs encoding with command-line interface (modes, input file, iterations), monotonic clock-based timing, and throughput/latency reporting.
Encoder Refactoring `src/opentelemetry/flb_opentelemetry_logs.c`	Replaced temporary encoder pattern with direct use of provided encoder; added explicit group handling, encoder state tracking (size_before_group, size_after_header), and comprehensive error handling with rollback and buffer pointer restoration across two major code paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

PR #10767: Modifies the same OpenTelemetry logs encoder file with encoding/serialization logic changes, likely related to encoder architecture improvements.

Suggested reviewers

fujimotos
koleini

Poem

🐰 A benchmark hops through JSON fields so bright,
Measuring OTLP encodings with temporal might,
While encoders refactor to flow direct and true,
With rollback safety nets—quite a neat debut! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: removing temporary buffer usage during JSON-to-msgpack encoding for OTLP logs to improve performance, which is the core objective across all three modified files.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch opentelemetry-logs-json-remove-extra-copy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/opentelemetry/flb_opentelemetry_logs.c (1)

563-604: ⚠️ Potential issue | 🟡 Minor

Inconsistent return value checking for encoder functions.

The return value of flb_log_event_encoder_append_metadata_values is checked at line 551, but similar calls like flb_log_event_encoder_append_body_string (line 563), flb_log_event_encoder_body_begin_map (line 566), and flb_log_event_encoder_append_body_values (line 591) do not check their return values. If any of these fail, the encoder state could be corrupted and the subsequent rollback may not properly restore consistency.

Consider checking return values for all encoder operations within the rollback-protected region, or document why certain calls are safe to ignore.

🤖 Fix all issues with AI agents

In `@benchmarks/opentelemetry.c`:
- Around line 158-162: The buffer returned by flb_utils_read_file (allocated via
flb_calloc inside flb_utils_read_file_offset) is being deallocated with
free(json) in the function handling the JSON buffer; replace both occurrences of
free(json) with flb_free(json) to use Fluent Bit's allocator/deallocator pair
and keep memory management consistent (look for the variable json in the
function that calls flb_utils_read_file and update its two free(json) calls to
flb_free(json)).

🧹 Nitpick comments (3)

src/opentelemetry/flb_opentelemetry_logs.c (1)
555-559: Consider extracting rollback logic into a helper macro or inline function.

The encoder state restoration pattern (resetting buffer.size, output_buffer, output_length) is repeated in ~8 places throughout this function. While functionally correct, this creates maintenance risk if the encoder's internal structure changes.

A helper macro could reduce duplication:
`#define` ENCODER_ROLLBACK_TO(enc, saved_size) do { \
    (enc)->buffer.size = (saved_size);            \
    (enc)->output_buffer = (enc)->buffer.data;    \
    (enc)->output_length = (enc)->buffer.size;    \
} while (0)
benchmarks/opentelemetry.c (2)
39-42: Potential overflow in diff_ns calculation.

The calculation (e->tv_sec - s->tv_sec) * 1000000000L can overflow if the time difference exceeds ~2 seconds on 32-bit systems where long is 32 bits. While this is unlikely to be an issue on modern 64-bit systems where this benchmark will run, using explicit int64_t or long long would be more portable.
Suggested improvement
-static long diff_ns(struct timespec *s, struct timespec *e)
+static int64_t diff_ns(struct timespec *s, struct timespec *e)
 {
-    return (e->tv_sec - s->tv_sec) * 1000000000L + (e->tv_nsec - s->tv_nsec);
+    return (int64_t)(e->tv_sec - s->tv_sec) * 1000000000LL + (e->tv_nsec - s->tv_nsec);
 }
124-130: Consider using strtol for safer integer parsing.

atoi provides no error detection for invalid input (e.g., "abc" returns 0, which would be caught, but "10abc" returns 10 and passes silently). While this is acceptable for a benchmark tool, using strtol with endptr validation would be more robust.

coderabbitai · 2026-02-04T19:58:55Z

benchmarks/opentelemetry.c

+        free(json);
+        return 1;
+    }
+
+    free(json);


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Search for flb_utils_read_file implementation rg -n 'int flb_utils_read_file|flb_utils_read_file\s*\(' src/flb_utils.c -A 20 | head -80

Repository: fluent/fluent-bit

Length of output: 1257

🏁 Script executed:

#!/bin/bash # Also check the function signature and how out_buf is allocated rg -n 'out_buf\s*=' src/flb_utils.c -B 3 -A 1 | head -50

Repository: fluent/fluent-bit

Length of output: 250

🏁 Script executed:

#!/bin/bash # Search for how buf is allocated in flb_utils_read_file_offset rg -n 'flb_utils_read_file_offset' src/flb_utils.c -A 80 | grep -E 'flb_sds|buf\s*=' | head -20

Repository: fluent/fluent-bit

Length of output: 158

🏁 Script executed:

#!/bin/bash # Search for flb_sds_t type definition and typical deallocation patterns rg -n 'flb_sds_t|flb_sds_destroy|flb_sds_free' --type-list | head -5 && rg 'flb_sds_destroy|flb_sds_free' -n | head -10

Repository: fluent/fluent-bit

Length of output: 1084

🏁 Script executed:

#!/antml:base64 # Check the opentelemetry.c file to see the full context of how json is used cat -n benchmarks/opentelemetry.c | sed -n '140,170p'

Repository: fluent/fluent-bit

Length of output: 843

🏁 Script executed:

#!/bin/bash # Check flb_calloc and flb_free definitions rg -n 'define flb_calloc|define flb_free|^\s*flb_calloc|^\s*flb_free' src/flb_mem.h -A 2 | head -30

Repository: fluent/fluent-bit

Length of output: 114

🏁 Script executed:

#!/bin/bash # Check how flb_utils_read_file results are freed in other parts of the codebase rg -n 'flb_utils_read_file' src -A 5 | grep -E 'flb_utils_read_file|free\(|flb_free\(' | head -30

Repository: fluent/fluent-bit

Length of output: 450

🏁 Script executed:

#!/bin/bash # Find flb_calloc and flb_free definitions fd -e h -e c | xargs rg -l 'flb_calloc|flb_free' | head -20

Repository: fluent/fluent-bit

Length of output: 759

🏁 Script executed:

#!/bin/bash # Search for flb_calloc definition more broadly rg -n 'flb_calloc' -t c -t h | grep -E 'define|^[^:]*:\s*(static\s+)?(inline\s+)?.*flb_calloc' | head -20

Repository: fluent/fluent-bit

Length of output: 1927

🏁 Script executed:

#!/bin/bash # Look at include/fluent_bit.h or similar find . -name "*.h" -path "*/include/*" | head -20

Repository: fluent/fluent-bit

Length of output: 860

🏁 Script executed:

#!/bin/bash # Search for flb_calloc and flb_free definitions in include directory rg -n 'flb_calloc|flb_free' include/fluent-bit/ | head -30

Repository: fluent/fluent-bit

Length of output: 2465

🏁 Script executed:

#!/bin/bash # Look for memory management header file find include -name "*mem*" -o -name "*alloc*"

Repository: fluent/fluent-bit

Length of output: 89

🏁 Script executed:

#!/antml:base64 # Directly check the main fluent-bit.h header head -100 include/fluent-bit.h | cat -n

Repository: fluent/fluent-bit

Length of output: 1804

🏁 Script executed:

#!/bin/bash # View flb_mem.h to see the definitions of flb_calloc and flb_free cat -n include/fluent-bit/flb_mem.h | sed -n '80,135p'

Repository: fluent/fluent-bit

Length of output: 1407

Use flb_free() for consistency with Fluent Bit's memory management patterns.

The buffer returned by flb_utils_read_file is freed with free() at lines 158 and 162. Since the buffer is allocated internally using flb_calloc (in flb_utils_read_file_offset), it should be freed with flb_free() instead to maintain consistency with Fluent Bit's memory management conventions throughout the codebase.

🤖 Prompt for AI Agents

In `@benchmarks/opentelemetry.c` around lines 158 - 162, The buffer returned by flb_utils_read_file (allocated via flb_calloc inside flb_utils_read_file_offset) is being deallocated with free(json) in the function handling the JSON buffer; replace both occurrences of free(json) with flb_free(json) to use Fluent Bit's allocator/deallocator pair and keep memory management consistent (look for the variable json in the function that calls flb_utils_read_file and update its two free(json) calls to flb_free(json)).

edsiper added 2 commits February 4, 2026 11:35

benchmarks: add opentelemetry perf test

9b47af3

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>

edsiper requested a review from cosmo0920 as a code owner February 4, 2026 19:53

edsiper added this to the Fluent Bit v5.0 milestone Feb 4, 2026

github-actions bot added the docs-required label Feb 4, 2026

edsiper temporarily deployed to pr February 4, 2026 19:54 — with GitHub Actions Inactive

coderabbitai bot reviewed Feb 4, 2026

View reviewed changes

edsiper temporarily deployed to pr February 4, 2026 20:14 — with GitHub Actions Inactive

edsiper had a problem deploying to pr February 4, 2026 20:14 — with GitHub Actions Failure

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opentelemetry: logs: json decoding to msgpack perf improvement#11430

opentelemetry: logs: json decoding to msgpack perf improvement#11430
edsiper wants to merge 2 commits intomasterfrom
opentelemetry-logs-json-remove-extra-copy

edsiper commented Feb 4, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

edsiper commented Feb 4, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Feb 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

edsiper commented Feb 4, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 4, 2026 •

edited

Loading