filter_log_to_metrics: Use optimized memory allocations#11414
filter_log_to_metrics: Use optimized memory allocations#11414
Conversation
📝 WalkthroughWalkthroughPre-allocates and manages label runtime structures for log_to_metrics: adds label counting and preparation, creates/destroys pre-built record accessors and a contiguous label-values buffer, shifts label resolution to init-time accessors, consolidates emitter aliasing, and centralizes init/cleanup flows. (29 words) Changes
Sequence Diagram(s)sequenceDiagram
participant Init as Filter Init
participant Count as count_labels()
participant Prep as prepare_label_runtime()
participant RA as RecordAccessors
participant Emitter as EmitterSetup
participant Filter as cb_log_to_metrics_filter
Init->>Count: compute label_counter (+ k8s_count)
Count-->>Init: return counts
Init->>Prep: allocate label_values_buf, label_values, label_ras
Prep->>RA: create per-label record accessors (k8s first)
RA-->>Prep: label_ras[]
Prep-->>Init: runtime structures ready
Init->>Emitter: derive/apply emitter alias and configure emitter
Note right of Filter: Runtime filter processing
Filter->>RA: use pre-created label_ras to extract values into label_values
RA-->>Filter: populated ctx->label_values[]
Filter->>Emitter: emit metrics with populated labels
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 810-821: The code assigns the result of flb_sds_printf directly to
emitter_alias_tmp which can return NULL and cause the original SDS to leak;
instead, call flb_sds_printf into a temporary pointer (e.g., tmp), check if tmp
is NULL, and if so call flb_sds_destroy(emitter_alias_tmp), flb_errno(),
log_to_metrics_destroy(ctx) and return -1; on success assign emitter_alias_tmp =
tmp. This uses the existing symbols emitter_alias_tmp, flb_sds_create_size,
flb_sds_printf, flb_sds_destroy and preserves current error handling via
log_to_metrics_destroy(ctx).
13732cf to
b9f17cb
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/filter_log_to_metrics/log_to_metrics.c (2)
360-475:⚠️ Potential issue | 🟡 MinorTighten the bounds guard before indexing label arrays.
If the computed total and fill pass ever diverge (e.g., config mutation or unexpected properties), the current>check can still allow one out-of-bounds write before the final mismatch check. Use>=to fail fast before indexing.🛠️ Proposed fix
- if (counter > ctx->label_counter) { + if (counter >= ctx->label_counter) { flb_plg_error(ctx->ins, "internal label counter overflow"); return -1; }
956-1079:⚠️ Potential issue | 🟠 MajorConfirm thread-safety vulnerability in shared
ctx->label_valuesbuffer.The
label_valuesbuffer is allocated once per filter instance and reused across all concurrent invocations from multiple input sources. Since Fluent Bit has input worker threads that independently process chunks and invoke filters (viaflb_filter_do), multiple workers can simultaneously callcb_log_to_metrics_filterwith the samectx. The vulnerable window is between writing tolabel_values(lines 982–999) and passing it tocmt_counter_inc,cmt_gauge_set, orcmt_histogram_observe(lines 1009, 1030, 1051). A concurrent writer can corrupt label values mid-operation.Locking exists on chunks and tasks but not on filter instances or their context. To fix: either allocate
label_valuesper-invocation (stack or local scope), use thread-local storage, add a mutex around the vulnerable window, or buffer label values before the cmt call completes.
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
…llocations Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/filter_log_to_metrics/log_to_metrics.c (1)
418-496:⚠️ Potential issue | 🟠 MajorRoute all
set_labelsfailures through cleanup.After
ctx->label_keys/ctx->label_accessorsare allocated, several error paths return-1directly (Line 423-425, Line 440-443, Line 461-465, Line 492-496).cb_log_to_metrics_init()returns immediately onset_labelsfailure, so these leak. Usegoto errorconsistently after allocation.🔧 Proposed fix
for (i = 0; i < NUMBER_OF_KUBERNETES_LABELS; i++) { ctx->label_keys[i] = flb_strdup(kubernetes_label_keys[i]); if (!ctx->label_keys[i]) { flb_errno(); - return -1; + goto error; } ctx->label_accessors[i] = NULL; } counter = NUMBER_OF_KUBERNETES_LABELS; } mk_list_foreach(head, &f_ins->properties) { kv = mk_list_entry(head, struct flb_kv, _head); - if (counter > ctx->label_counter) { + if (counter >= ctx->label_counter) { flb_plg_error(ctx->ins, "internal label counter overflow"); - return -1; + goto error; } if (strcasecmp(kv->key, "label_field") == 0) { ctx->label_keys[counter] = flb_strdup(kv->val); if (!ctx->label_keys[counter]) { flb_errno(); goto error; } ctx->label_accessors[counter] = flb_strdup(kv->val); if (!ctx->label_accessors[counter]) { flb_errno(); goto error; } counter++; } else if (strcasecmp(kv->key, "add_label") == 0) { split = flb_utils_split(kv->val, ' ', 1); if (mk_list_size(split) != 2) { flb_plg_error(ctx->ins, "invalid label, expected name and key"); flb_utils_split_free(split); - return -1; + goto error; } ... } } if (counter != ctx->label_counter) { flb_plg_error(ctx->ins, "label count mismatch: computed=%d filled=%d", ctx->label_counter, counter); - return -1; + goto error; }
🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 168-185: In prepare_label_runtime's error cleanup, do not free
each ctx->label_values[i] because those entries point into the single contiguous
buffer ctx->label_values_buf; instead remove the loop that
flb_free(ctx->label_values[i]) and only free ctx->label_values_buf and the
ctx->label_values pointer array itself (and set them NULL). Keep the existing
cleanup for ctx->label_accessors (free each ctx->label_accessors[i] then
flb_free(ctx->label_accessors) and NULL it) and ensure you still respect
ctx->label_counter when releasing arrays.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 437-440: The overflow check currently uses "counter >
ctx->label_counter" which permits counter == ctx->label_counter and can produce
an out-of-bounds write; change the check to reject that case by using "counter
>= ctx->label_counter" (or otherwise compare against the max valid index such as
ctx->label_counter - 1) and keep the existing flb_plg_error(ctx->ins, ...) and
return -1 behavior so any attempt to index arrays using counter is prevented.
🧹 Nitpick comments (1)
plugins/filter_log_to_metrics/log_to_metrics.c (1)
1022-1025: Consider usingPRId64for portable 64-bit integer formatting.The cast to
longcould truncate large 64-bit values on 32-bit platforms wherelongis 32 bits. For full portability, consider usingPRId64from<inttypes.h>.♻️ Suggested improvement
+#include <inttypes.h> ... else if (rval->type == FLB_RA_INT) { - snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%ld", - (long) rval->val.i64); + snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%" PRId64, + rval->val.i64); }
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 438-441: Several error paths in the label parsing code (e.g.,
inside the label_field and add_label handling) return -1 directly and leak the
allocated arrays label_keys and label_accessors; change those direct returns
(the overflow checks, invalid label split, and counter mismatch checks) to jump
to the existing error cleanup path using goto error so that label_keys and
label_accessors are freed and ctx->ins error logging is preserved; update the
checks around label_field, add_label, the split validation, and the counter
mismatch to use goto error instead of return -1.
- Around line 168-184: The error cleanup path in the function fails to destroy
and free any partially-created record accessors stored in ctx->label_ras,
causing leaks; update the error block to iterate over the successfully created
entries (0..ctx->label_counter-1) and call flb_ra_destroy on each non-NULL
ctx->label_ras[i], then free ctx->label_ras and set it to NULL (similar to how
ctx->label_accessors is handled), and ensure ctx->label_counter is
handled/cleared as needed before returning -1.
- Around line 419-422: The direct return after flb_strdup failure leaks
allocated resources; replace the immediate "return -1" with "goto error" so the
existing cleanup path runs (preserve the flb_errno() call), ensuring
ctx->label_keys and ctx->label_accessors are freed by the error handler at the
function's "error" label (update any nearby error label if needed to free these
arrays).
1ffa521 to
65ff9c8
Compare
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
65ff9c8 to
1530d6a
Compare
|
thanks @cosmo0920 ! pls cleanup the commit history so we can get this merged for v5 |
Currently, filter_log_to_metrics frequently allocates heap memory.
This causes memory fragmentation and take a longer time to allocate memory which corresponds to running period.
Instead, we need to optimize this kind of heap memory allocations and suppress CPU stale for waiting I/O operations for memory.
Before
After
Call stack is simplified and the main difference is:
vs
So, we achieved to create optimized version of filter_log_to_metrics plugin for preventing fragmented heap regions.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
With tons of label_fields, there is no memory leaks:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes