Skip to content

opentelemetry: logs: json decoding to msgpack perf improvement#11430

Open
edsiper wants to merge 2 commits intomasterfrom
opentelemetry-logs-json-remove-extra-copy
Open

opentelemetry: logs: json decoding to msgpack perf improvement#11430
edsiper wants to merge 2 commits intomasterfrom
opentelemetry-logs-json-remove-extra-copy

Conversation

@edsiper
Copy link
Member

@edsiper edsiper commented Feb 4, 2026

Do not use a temporary buffer to process the JSON decoding and encoding to msgpack, this is a little performance improvement.

before

sudo perf record -g bin/flb-bench-opentelemetry -f ../benchmarks/otlp_logs_sample.json -m otlp-json-logs -i 10000
Benchmarking OTLP-JSON logs encoding (flb_opentelemetry_logs_json_to_msgpack)
------------------------------------------------------------------------

Iterations   : 10000
JSON size    : 5891 bytes
------------------------------------------------------------------------

------------------------------------------------------------------------
Total time   : 743449937 ns
Per call     : 74344 ns
Throughput   : 75.57 MiB/s
------------------------------------------------------------------------
image

after

sudo perf record -g bin/flb-bench-opentelemetry -f ../benchmarks/otlp_logs_sample.json -m otlp-json-logs -i 10000
Benchmarking OTLP-JSON logs encoding (flb_opentelemetry_logs_json_to_msgpack)
------------------------------------------------------------------------

Iterations   : 10000
JSON size    : 5891 bytes
------------------------------------------------------------------------

------------------------------------------------------------------------
Total time   : 747345444 ns
Per call     : 74734 ns
Throughput   : 75.17 MiB/s
------------------------------------------------------------------------
image

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added OpenTelemetry OTLP-JSON logs encoding benchmark utility with command-line interface for performance analysis, including per-call latency and throughput measurements.
  • Bug Fixes

    • Improved error handling and encoder state management in OpenTelemetry logs processing for enhanced reliability and data integrity.

- Encode OTLP-JSON logs directly into the caller's encoder instead of a
  local encoder and copying the result, removing the final buffer copy
  in flb_opentelemetry_logs_json_to_msgpack.

- In process_json_payload_resource_logs_entry, build each scope group
  in the main encoder and rollback by truncating the buffer when no
  log records are added or on error, instead of using a temporary
  encoder and appending its buffer. This removes per-scope allocations
  and the msgpack_sbuffer_write copy.

- On failure, restore encoder->buffer.size and output pointers so the
  encoder is left unchanged.

- Add return checks for flb_log_event_encoder_group_init,
  group_header_end, and group_end.

- Require a non-NULL encoder (callers already pass a valid encoder).

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
@coderabbitai
Copy link

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a new OpenTelemetry OTLP-JSON logs benchmarking tool and refactors the encoder logic in the existing OpenTelemetry logs module. The benchmark program measures encoding performance using Fluent Bit APIs, while the encoder refactoring replaces temporary encoder usage with direct use of the provided encoder parameter, adding robust error handling and rollback capabilities.

Changes

Cohort / File(s) Summary
Build Configuration
benchmarks/CMakeLists.txt
Added new CMake build target flb-bench-opentelemetry linking the opentelemetry.c executable against fluent-bit-static and thread libraries.
OpenTelemetry Benchmarking
benchmarks/opentelemetry.c
Introduced standalone benchmark program for OTLP-JSON logs encoding with command-line interface (modes, input file, iterations), monotonic clock-based timing, and throughput/latency reporting.
Encoder Refactoring
src/opentelemetry/flb_opentelemetry_logs.c
Replaced temporary encoder pattern with direct use of provided encoder; added explicit group handling, encoder state tracking (size_before_group, size_after_header), and comprehensive error handling with rollback and buffer pointer restoration across two major code paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #10767: Modifies the same OpenTelemetry logs encoder file with encoding/serialization logic changes, likely related to encoder architecture improvements.

Suggested reviewers

  • fujimotos
  • koleini

Poem

🐰 A benchmark hops through JSON fields so bright,
Measuring OTLP encodings with temporal might,
While encoders refactor to flow direct and true,
With rollback safety nets—quite a neat debut! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: removing temporary buffer usage during JSON-to-msgpack encoding for OTLP logs to improve performance, which is the core objective across all three modified files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch opentelemetry-logs-json-remove-extra-copy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/opentelemetry/flb_opentelemetry_logs.c (1)

563-604: ⚠️ Potential issue | 🟡 Minor

Inconsistent return value checking for encoder functions.

The return value of flb_log_event_encoder_append_metadata_values is checked at line 551, but similar calls like flb_log_event_encoder_append_body_string (line 563), flb_log_event_encoder_body_begin_map (line 566), and flb_log_event_encoder_append_body_values (line 591) do not check their return values. If any of these fail, the encoder state could be corrupted and the subsequent rollback may not properly restore consistency.

Consider checking return values for all encoder operations within the rollback-protected region, or document why certain calls are safe to ignore.

🤖 Fix all issues with AI agents
In `@benchmarks/opentelemetry.c`:
- Around line 158-162: The buffer returned by flb_utils_read_file (allocated via
flb_calloc inside flb_utils_read_file_offset) is being deallocated with
free(json) in the function handling the JSON buffer; replace both occurrences of
free(json) with flb_free(json) to use Fluent Bit's allocator/deallocator pair
and keep memory management consistent (look for the variable json in the
function that calls flb_utils_read_file and update its two free(json) calls to
flb_free(json)).
🧹 Nitpick comments (3)
src/opentelemetry/flb_opentelemetry_logs.c (1)

555-559: Consider extracting rollback logic into a helper macro or inline function.

The encoder state restoration pattern (resetting buffer.size, output_buffer, output_length) is repeated in ~8 places throughout this function. While functionally correct, this creates maintenance risk if the encoder's internal structure changes.

A helper macro could reduce duplication:

`#define` ENCODER_ROLLBACK_TO(enc, saved_size) do { \
    (enc)->buffer.size = (saved_size);            \
    (enc)->output_buffer = (enc)->buffer.data;    \
    (enc)->output_length = (enc)->buffer.size;    \
} while (0)
benchmarks/opentelemetry.c (2)

39-42: Potential overflow in diff_ns calculation.

The calculation (e->tv_sec - s->tv_sec) * 1000000000L can overflow if the time difference exceeds ~2 seconds on 32-bit systems where long is 32 bits. While this is unlikely to be an issue on modern 64-bit systems where this benchmark will run, using explicit int64_t or long long would be more portable.

Suggested improvement
-static long diff_ns(struct timespec *s, struct timespec *e)
+static int64_t diff_ns(struct timespec *s, struct timespec *e)
 {
-    return (e->tv_sec - s->tv_sec) * 1000000000L + (e->tv_nsec - s->tv_nsec);
+    return (int64_t)(e->tv_sec - s->tv_sec) * 1000000000LL + (e->tv_nsec - s->tv_nsec);
 }

124-130: Consider using strtol for safer integer parsing.

atoi provides no error detection for invalid input (e.g., "abc" returns 0, which would be caught, but "10abc" returns 10 and passes silently). While this is acceptable for a benchmark tool, using strtol with endptr validation would be more robust.

Comment on lines +158 to +162
free(json);
return 1;
}

free(json);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for flb_utils_read_file implementation
rg -n 'int flb_utils_read_file|flb_utils_read_file\s*\(' src/flb_utils.c -A 20 | head -80

Repository: fluent/fluent-bit

Length of output: 1257


🏁 Script executed:

#!/bin/bash
# Also check the function signature and how out_buf is allocated
rg -n 'out_buf\s*=' src/flb_utils.c -B 3 -A 1 | head -50

Repository: fluent/fluent-bit

Length of output: 250


🏁 Script executed:

#!/bin/bash
# Search for how buf is allocated in flb_utils_read_file_offset
rg -n 'flb_utils_read_file_offset' src/flb_utils.c -A 80 | grep -E 'flb_sds|buf\s*=' | head -20

Repository: fluent/fluent-bit

Length of output: 158


🏁 Script executed:

#!/bin/bash
# Search for flb_sds_t type definition and typical deallocation patterns
rg -n 'flb_sds_t|flb_sds_destroy|flb_sds_free' --type-list | head -5 && rg 'flb_sds_destroy|flb_sds_free' -n | head -10

Repository: fluent/fluent-bit

Length of output: 1084


🏁 Script executed:

#!/antml:base64
# Check the opentelemetry.c file to see the full context of how json is used
cat -n benchmarks/opentelemetry.c | sed -n '140,170p'

Repository: fluent/fluent-bit

Length of output: 843


🏁 Script executed:

#!/bin/bash
# Check flb_calloc and flb_free definitions
rg -n 'define flb_calloc|define flb_free|^\s*flb_calloc|^\s*flb_free' src/flb_mem.h -A 2 | head -30

Repository: fluent/fluent-bit

Length of output: 114


🏁 Script executed:

#!/bin/bash
# Check how flb_utils_read_file results are freed in other parts of the codebase
rg -n 'flb_utils_read_file' src -A 5 | grep -E 'flb_utils_read_file|free\(|flb_free\(' | head -30

Repository: fluent/fluent-bit

Length of output: 450


🏁 Script executed:

#!/bin/bash
# Find flb_calloc and flb_free definitions
fd -e h -e c | xargs rg -l 'flb_calloc|flb_free' | head -20

Repository: fluent/fluent-bit

Length of output: 759


🏁 Script executed:

#!/bin/bash
# Search for flb_calloc definition more broadly
rg -n 'flb_calloc' -t c -t h | grep -E 'define|^[^:]*:\s*(static\s+)?(inline\s+)?.*flb_calloc' | head -20

Repository: fluent/fluent-bit

Length of output: 1927


🏁 Script executed:

#!/bin/bash
# Look at include/fluent_bit.h or similar
find . -name "*.h" -path "*/include/*" | head -20

Repository: fluent/fluent-bit

Length of output: 860


🏁 Script executed:

#!/bin/bash
# Search for flb_calloc and flb_free definitions in include directory
rg -n 'flb_calloc|flb_free' include/fluent-bit/ | head -30

Repository: fluent/fluent-bit

Length of output: 2465


🏁 Script executed:

#!/bin/bash
# Look for memory management header file
find include -name "*mem*" -o -name "*alloc*"

Repository: fluent/fluent-bit

Length of output: 89


🏁 Script executed:

#!/antml:base64
# Directly check the main fluent-bit.h header
head -100 include/fluent-bit.h | cat -n

Repository: fluent/fluent-bit

Length of output: 1804


🏁 Script executed:

#!/bin/bash
# View flb_mem.h to see the definitions of flb_calloc and flb_free
cat -n include/fluent-bit/flb_mem.h | sed -n '80,135p'

Repository: fluent/fluent-bit

Length of output: 1407


Use flb_free() for consistency with Fluent Bit's memory management patterns.

The buffer returned by flb_utils_read_file is freed with free() at lines 158 and 162. Since the buffer is allocated internally using flb_calloc (in flb_utils_read_file_offset), it should be freed with flb_free() instead to maintain consistency with Fluent Bit's memory management conventions throughout the codebase.

🤖 Prompt for AI Agents
In `@benchmarks/opentelemetry.c` around lines 158 - 162, The buffer returned by
flb_utils_read_file (allocated via flb_calloc inside flb_utils_read_file_offset)
is being deallocated with free(json) in the function handling the JSON buffer;
replace both occurrences of free(json) with flb_free(json) to use Fluent Bit's
allocator/deallocator pair and keep memory management consistent (look for the
variable json in the function that calls flb_utils_read_file and update its two
free(json) calls to flb_free(json)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant