opentelemetry: logs: json decoding to msgpack perf improvement#11430
opentelemetry: logs: json decoding to msgpack perf improvement#11430
Conversation
- Encode OTLP-JSON logs directly into the caller's encoder instead of a local encoder and copying the result, removing the final buffer copy in flb_opentelemetry_logs_json_to_msgpack. - In process_json_payload_resource_logs_entry, build each scope group in the main encoder and rollback by truncating the buffer when no log records are added or on error, instead of using a temporary encoder and appending its buffer. This removes per-scope allocations and the msgpack_sbuffer_write copy. - On failure, restore encoder->buffer.size and output pointers so the encoder is left unchanged. - Add return checks for flb_log_event_encoder_group_init, group_header_end, and group_end. - Require a non-NULL encoder (callers already pass a valid encoder). Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
📝 WalkthroughWalkthroughThis pull request introduces a new OpenTelemetry OTLP-JSON logs benchmarking tool and refactors the encoder logic in the existing OpenTelemetry logs module. The benchmark program measures encoding performance using Fluent Bit APIs, while the encoder refactoring replaces temporary encoder usage with direct use of the provided encoder parameter, adding robust error handling and rollback capabilities. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/opentelemetry/flb_opentelemetry_logs.c (1)
563-604:⚠️ Potential issue | 🟡 MinorInconsistent return value checking for encoder functions.
The return value of
flb_log_event_encoder_append_metadata_valuesis checked at line 551, but similar calls likeflb_log_event_encoder_append_body_string(line 563),flb_log_event_encoder_body_begin_map(line 566), andflb_log_event_encoder_append_body_values(line 591) do not check their return values. If any of these fail, the encoder state could be corrupted and the subsequent rollback may not properly restore consistency.Consider checking return values for all encoder operations within the rollback-protected region, or document why certain calls are safe to ignore.
🤖 Fix all issues with AI agents
In `@benchmarks/opentelemetry.c`:
- Around line 158-162: The buffer returned by flb_utils_read_file (allocated via
flb_calloc inside flb_utils_read_file_offset) is being deallocated with
free(json) in the function handling the JSON buffer; replace both occurrences of
free(json) with flb_free(json) to use Fluent Bit's allocator/deallocator pair
and keep memory management consistent (look for the variable json in the
function that calls flb_utils_read_file and update its two free(json) calls to
flb_free(json)).
🧹 Nitpick comments (3)
src/opentelemetry/flb_opentelemetry_logs.c (1)
555-559: Consider extracting rollback logic into a helper macro or inline function.The encoder state restoration pattern (resetting
buffer.size,output_buffer,output_length) is repeated in ~8 places throughout this function. While functionally correct, this creates maintenance risk if the encoder's internal structure changes.A helper macro could reduce duplication:
`#define` ENCODER_ROLLBACK_TO(enc, saved_size) do { \ (enc)->buffer.size = (saved_size); \ (enc)->output_buffer = (enc)->buffer.data; \ (enc)->output_length = (enc)->buffer.size; \ } while (0)benchmarks/opentelemetry.c (2)
39-42: Potential overflow in diff_ns calculation.The calculation
(e->tv_sec - s->tv_sec) * 1000000000Lcan overflow if the time difference exceeds ~2 seconds on 32-bit systems wherelongis 32 bits. While this is unlikely to be an issue on modern 64-bit systems where this benchmark will run, using explicitint64_torlong longwould be more portable.Suggested improvement
-static long diff_ns(struct timespec *s, struct timespec *e) +static int64_t diff_ns(struct timespec *s, struct timespec *e) { - return (e->tv_sec - s->tv_sec) * 1000000000L + (e->tv_nsec - s->tv_nsec); + return (int64_t)(e->tv_sec - s->tv_sec) * 1000000000LL + (e->tv_nsec - s->tv_nsec); }
124-130: Consider using strtol for safer integer parsing.
atoiprovides no error detection for invalid input (e.g., "abc" returns 0, which would be caught, but "10abc" returns 10 and passes silently). While this is acceptable for a benchmark tool, usingstrtolwithendptrvalidation would be more robust.
| free(json); | ||
| return 1; | ||
| } | ||
|
|
||
| free(json); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for flb_utils_read_file implementation
rg -n 'int flb_utils_read_file|flb_utils_read_file\s*\(' src/flb_utils.c -A 20 | head -80Repository: fluent/fluent-bit
Length of output: 1257
🏁 Script executed:
#!/bin/bash
# Also check the function signature and how out_buf is allocated
rg -n 'out_buf\s*=' src/flb_utils.c -B 3 -A 1 | head -50Repository: fluent/fluent-bit
Length of output: 250
🏁 Script executed:
#!/bin/bash
# Search for how buf is allocated in flb_utils_read_file_offset
rg -n 'flb_utils_read_file_offset' src/flb_utils.c -A 80 | grep -E 'flb_sds|buf\s*=' | head -20Repository: fluent/fluent-bit
Length of output: 158
🏁 Script executed:
#!/bin/bash
# Search for flb_sds_t type definition and typical deallocation patterns
rg -n 'flb_sds_t|flb_sds_destroy|flb_sds_free' --type-list | head -5 && rg 'flb_sds_destroy|flb_sds_free' -n | head -10Repository: fluent/fluent-bit
Length of output: 1084
🏁 Script executed:
#!/antml:base64
# Check the opentelemetry.c file to see the full context of how json is used
cat -n benchmarks/opentelemetry.c | sed -n '140,170p'Repository: fluent/fluent-bit
Length of output: 843
🏁 Script executed:
#!/bin/bash
# Check flb_calloc and flb_free definitions
rg -n 'define flb_calloc|define flb_free|^\s*flb_calloc|^\s*flb_free' src/flb_mem.h -A 2 | head -30Repository: fluent/fluent-bit
Length of output: 114
🏁 Script executed:
#!/bin/bash
# Check how flb_utils_read_file results are freed in other parts of the codebase
rg -n 'flb_utils_read_file' src -A 5 | grep -E 'flb_utils_read_file|free\(|flb_free\(' | head -30Repository: fluent/fluent-bit
Length of output: 450
🏁 Script executed:
#!/bin/bash
# Find flb_calloc and flb_free definitions
fd -e h -e c | xargs rg -l 'flb_calloc|flb_free' | head -20Repository: fluent/fluent-bit
Length of output: 759
🏁 Script executed:
#!/bin/bash
# Search for flb_calloc definition more broadly
rg -n 'flb_calloc' -t c -t h | grep -E 'define|^[^:]*:\s*(static\s+)?(inline\s+)?.*flb_calloc' | head -20Repository: fluent/fluent-bit
Length of output: 1927
🏁 Script executed:
#!/bin/bash
# Look at include/fluent_bit.h or similar
find . -name "*.h" -path "*/include/*" | head -20Repository: fluent/fluent-bit
Length of output: 860
🏁 Script executed:
#!/bin/bash
# Search for flb_calloc and flb_free definitions in include directory
rg -n 'flb_calloc|flb_free' include/fluent-bit/ | head -30Repository: fluent/fluent-bit
Length of output: 2465
🏁 Script executed:
#!/bin/bash
# Look for memory management header file
find include -name "*mem*" -o -name "*alloc*"Repository: fluent/fluent-bit
Length of output: 89
🏁 Script executed:
#!/antml:base64
# Directly check the main fluent-bit.h header
head -100 include/fluent-bit.h | cat -nRepository: fluent/fluent-bit
Length of output: 1804
🏁 Script executed:
#!/bin/bash
# View flb_mem.h to see the definitions of flb_calloc and flb_free
cat -n include/fluent-bit/flb_mem.h | sed -n '80,135p'Repository: fluent/fluent-bit
Length of output: 1407
Use flb_free() for consistency with Fluent Bit's memory management patterns.
The buffer returned by flb_utils_read_file is freed with free() at lines 158 and 162. Since the buffer is allocated internally using flb_calloc (in flb_utils_read_file_offset), it should be freed with flb_free() instead to maintain consistency with Fluent Bit's memory management conventions throughout the codebase.
🤖 Prompt for AI Agents
In `@benchmarks/opentelemetry.c` around lines 158 - 162, The buffer returned by
flb_utils_read_file (allocated via flb_calloc inside flb_utils_read_file_offset)
is being deallocated with free(json) in the function handling the JSON buffer;
replace both occurrences of free(json) with flb_free(json) to use Fluent Bit's
allocator/deallocator pair and keep memory management consistent (look for the
variable json in the function that calls flb_utils_read_file and update its two
free(json) calls to flb_free(json)).
Do not use a temporary buffer to process the JSON decoding and encoding to msgpack, this is a little performance improvement.
before
after
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes