
filter_log_to_metrics: Use optimized memory allocations #11414

Open
cosmo0920 wants to merge 9 commits into master from cosmo0920-use-optimized-memory-allocations-on-log_to_metrics

@cosmo0920 cosmo0920 (Contributor) commented Jan 30, 2026

Currently, filter_log_to_metrics allocates heap memory frequently while processing records.
This causes memory fragmentation, and allocation takes longer the longer the process has been running.
Instead, we should reduce these heap allocations so the CPU spends less time stalled waiting on memory operations.

Before

Samples: 6K of event 'cpu_core/cycles/P', Event count (approx.): 77560880457, Thread: flb-pipeline
  Children      Self  Command       Shared Object         Symbol
<snip>
+   78.44%     0.06%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   73.09%     0.34%  flb-pipeline  fluent-bit            [.] fill_labels
+   50.95%     0.48%  flb-pipeline  fluent-bit            [.] flb_ra_create
+   30.15%     0.26%  flb-pipeline  fluent-bit            [.] flb_env_create
+   16.83%     0.14%  flb-pipeline  fluent-bit            [.] flb_ra_get_value_object
+   16.40%     0.19%  flb-pipeline  fluent-bit            [.] flb_ra_key_to_value_ext
<snip>

After

Samples: 3K of event 'cpu_core/cycles/P', Event count (approx.): 14971900615
  Children      Self  Command       Shared Object         Symbol
<snip>
+   65.17%     0.84%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   49.61%     0.53%  flb-pipeline  fluent-bit            [.] flb_ra_get_value_object
+   48.75%     1.61%  flb-pipeline  fluent-bit            [.] flb_ra_key_to_value_ext
<snip>

The call stack is simpler, and the main difference is:

+   78.44%     0.06%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter
+   73.09%     0.34%  flb-pipeline  fluent-bit            [.] fill_labels

vs

+   65.17%     0.84%  flb-pipeline  fluent-bit            [.] cb_log_to_metrics_filter

So this change gives an optimized version of the filter_log_to_metrics plugin that avoids fragmenting the heap.
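
The profile difference above comes from no longer creating record accessors for every record. A minimal sketch of that "create once, reuse per record" shape follows. The `toy_accessor` type and all `toy_*` names are hypothetical stand-ins for illustration and are not Fluent Bit's `flb_ra_*` API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a record accessor (not the flb_ra API). */
struct toy_accessor {
    char *pattern;
};

/* Counts how many times the allocating constructor runs. */
static int toy_create_calls = 0;

static struct toy_accessor *toy_accessor_create(const char *pattern)
{
    struct toy_accessor *ra = malloc(sizeof(*ra));

    if (!ra) {
        return NULL;
    }
    ra->pattern = malloc(strlen(pattern) + 1);
    if (!ra->pattern) {
        free(ra);
        return NULL;
    }
    strcpy(ra->pattern, pattern);
    toy_create_calls++;
    return ra;
}

static void toy_accessor_destroy(struct toy_accessor *ra)
{
    if (ra) {
        free(ra->pattern);
        free(ra);
    }
}

/* Per-record work: reads the pre-built accessor and allocates nothing. */
static size_t toy_process_record(const struct toy_accessor *ra,
                                 const char *record)
{
    return strlen(ra->pattern) + strlen(record);
}

/* Create the accessor once, reuse it for every record, destroy it once. */
static int toy_run(const char *pattern, const char **records, size_t n,
                   size_t *total)
{
    size_t i;
    struct toy_accessor *ra = toy_accessor_create(pattern);

    if (!ra) {
        return -1;
    }
    *total = 0;
    for (i = 0; i < n; i++) {
        *total += toy_process_record(ra, records[i]);
    }
    toy_accessor_destroy(ra);
    return 0;
}
```

In this sketch the constructor runs once no matter how many records pass through `toy_run`, which is the property the "After" profile reflects by no longer showing `flb_ra_create` under the filter callback.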


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

With a large number of label_field entries, Valgrind reports no memory leaks:

==1363657== 
==1363657== HEAP SUMMARY:
==1363657==     in use at exit: 0 bytes in 0 blocks
==1363657==   total heap usage: 212,195 allocs, 212,195 frees, 95,802,946 bytes allocated
==1363657== 
==1363657== All heap blocks were freed -- no leaks are possible
==1363657== 
==1363657== For lists of detected and suppressed errors, rerun with: -s
==1363657== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Increased maximum label capacity (32 → 128).
    • Two-stage label setup with pre-allocated runtime label buffers and pre-created label accessors (Kubernetes-backed accessors prioritized) to reduce per-call allocations.
    • Emitter aliasing when no explicit emitter name is provided.
  • Bug Fixes

    • Stronger label validation to prevent overflows/mismatches.
    • Improved cleanup of runtime label buffers and accessors to avoid memory leaks and improve stability.

coderabbitai bot commented Jan 30, 2026

Walkthrough

Pre-allocates and manages label runtime structures for log_to_metrics: adds label counting and preparation, creates/destroys pre-built record accessors and a contiguous label-values buffer, shifts label resolution to init-time accessors, consolidates emitter aliasing, and centralizes init/cleanup flows.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Header Structure Updates: plugins/filter_log_to_metrics/log_to_metrics.h | Added <stddef.h>, increased MAX_LABEL_COUNT (32→128), and extended struct log_to_metrics_ctx with label_ras, label_values_buf, and label_values. |
| Label runtime helpers & init/refactor: plugins/filter_log_to_metrics/log_to_metrics.c | Added count_labels(...) and prepare_label_runtime(...); updated set_labels() signature; refactored label setup into two-pass (count → allocate/configure) and centralized initialization. |
| Pre-allocated runtime storage & accessors: plugins/filter_log_to_metrics/log_to_metrics.c | Replaced per-call allocations with pre-allocated label_values_buf and label_values; introduced persistent ctx->label_ras with Kubernetes-backed accessors placed first. |
| Filter path adjustments: plugins/filter_log_to_metrics/log_to_metrics.c | cb_log_to_metrics_filter() now uses ctx->label_counter, ctx->label_ras and ctx->label_values to extract label values via pre-created accessors (removes per-call accessor creation and direct k8s probing). |
| Emitter aliasing & wiring: plugins/filter_log_to_metrics/log_to_metrics.c | Emitter alias resolution prefers explicit emitter_name; otherwise derives alias from filter name; alias is applied to emitter input config with temporary buffers during init. |
| Teardown & error-path cleanup: plugins/filter_log_to_metrics/log_to_metrics.c | Expanded log_to_metrics_destroy() and error paths to free/destroy label_ras, label_values, label_values_buf, and per-label keys/accessors to avoid leaks. |
| Removed legacy per-call logic: plugins/filter_log_to_metrics/log_to_metrics.c | Removed former per-call fill_labels and related per-call accessor construction; consolidated scattered setup into unified initialization and runtime structures. |
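
The two-pass label setup (count, then allocate and fill) summarized above can be sketched as follows. This is an illustration under stated assumptions: `toy_kv` and the `toy_*` functions are hypothetical, the property list is a plain array rather than Fluent Bit's `mk_list`, and keys are compared with `strcmp` instead of the case-insensitive comparison the plugin uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened view of a plugin's configured properties. */
struct toy_kv {
    const char *key;
    const char *val;
};

static int toy_is_label_key(const char *key)
{
    return strcmp(key, "label_field") == 0 || strcmp(key, "add_label") == 0;
}

/* Pass 1: count how many properties define a label. */
static size_t toy_count_labels(const struct toy_kv *props, size_t n)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < n; i++) {
        if (toy_is_label_key(props[i].key)) {
            count++;
        }
    }
    return count;
}

/*
 * Pass 2: allocate exactly `count` slots, then fill them. The fill pass
 * re-checks the bound before each write and verifies the final count, so a
 * disagreement between the two passes fails instead of writing out of range.
 */
static int toy_collect_labels(const struct toy_kv *props, size_t n,
                              const char ***out, size_t *out_count)
{
    size_t i;
    size_t filled = 0;
    size_t count = toy_count_labels(props, n);
    const char **labels = NULL;

    if (count > 0) {
        labels = calloc(count, sizeof(*labels));
        if (!labels) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        if (!toy_is_label_key(props[i].key)) {
            continue;
        }
        if (filled >= count) {
            free(labels);
            return -1;
        }
        labels[filled++] = props[i].val;
    }
    if (filled != count) {
        free(labels);
        return -1;
    }
    *out = labels;
    *out_count = count;
    return 0;
}
```

Counting first means the arrays are sized once at init time, at the cost of walking the property list twice, which only happens during initialization.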

Sequence Diagram(s)

sequenceDiagram
    participant Init as Filter Init
    participant Count as count_labels()
    participant Prep as prepare_label_runtime()
    participant RA as RecordAccessors
    participant Emitter as EmitterSetup
    participant Filter as cb_log_to_metrics_filter

    Init->>Count: compute label_counter (+ k8s_count)
    Count-->>Init: return counts
    Init->>Prep: allocate label_values_buf, label_values, label_ras
    Prep->>RA: create per-label record accessors (k8s first)
    RA-->>Prep: label_ras[]
    Prep-->>Init: runtime structures ready
    Init->>Emitter: derive/apply emitter alias and configure emitter
    Note right of Filter: Runtime filter processing
    Filter->>RA: use pre-created label_ras to extract values into label_values
    RA-->>Filter: populated ctx->label_values[]
    Filter->>Emitter: emit metrics with populated labels

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

backport to v4.1.x, backport to v4.2.x

Suggested reviewers

  • edsiper
  • koleini
  • fujimotos

Poem

🐇 I counted keys in tidy rows,
I built soft nests where value flows.
Buffers snug and accessors set,
No more leaks to make me fret.
Metrics hop out—cheers, carrot pet! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and accurately captures the main change: optimizing memory allocations in the filter_log_to_metrics module by using pre-allocated runtime structures and reducing per-call heap allocations. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 810-821: The code assigns the result of flb_sds_printf directly to
emitter_alias_tmp which can return NULL and cause the original SDS to leak;
instead, call flb_sds_printf into a temporary pointer (e.g., tmp), check if tmp
is NULL, and if so call flb_sds_destroy(emitter_alias_tmp), flb_errno(),
log_to_metrics_destroy(ctx) and return -1; on success assign emitter_alias_tmp =
tmp. This uses the existing symbols emitter_alias_tmp, flb_sds_create_size,
flb_sds_printf, flb_sds_destroy and preserves current error handling via
log_to_metrics_destroy(ctx).
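
The suggestion above is the usual "assign through a temporary" pattern for functions that may return NULL while the original allocation is still live. A minimal sketch of the same shape, using standard `realloc` as a stand-in so the example compiles without Fluent Bit (`toy_append` is a hypothetical helper, and nothing here describes how `flb_sds_printf` is implemented):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Append `suffix` to the heap string `*s`.
 * On failure `*s` is left untouched and is still owned by the caller, so the
 * caller can release it. Writing `*s = realloc(*s, ...)` directly would lose
 * the only pointer to the original block when realloc returns NULL.
 */
static int toy_append(char **s, const char *suffix)
{
    size_t old_len = strlen(*s);
    size_t add_len = strlen(suffix);
    char *tmp = realloc(*s, old_len + add_len + 1);

    if (!tmp) {
        return -1;
    }
    /* Copy the suffix including its terminating NUL. */
    memcpy(tmp + old_len, suffix, add_len + 1);
    *s = tmp;
    return 0;
}
```

On a `-1` return the caller would free the original string and run its normal cleanup, matching the error path described in the comment.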

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/filter_log_to_metrics/log_to_metrics.c (2)

360-475: ⚠️ Potential issue | 🟡 Minor

Tighten the bounds guard before indexing label arrays.
If the computed total and fill pass ever diverge (e.g., config mutation or unexpected properties), the current > check can still allow one out-of-bounds write before the final mismatch check. Use >= to fail fast before indexing.

🛠️ Proposed fix
-        if (counter > ctx->label_counter) {
+        if (counter >= ctx->label_counter) {
             flb_plg_error(ctx->ins, "internal label counter overflow");
             return -1;
         }
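
To see why the comparison operator matters, consider an array with `capacity` slots, whose valid indices run from 0 to `capacity - 1`. The sketch below uses hypothetical helper names and only illustrates the off-by-one. It is not the plugin's code.

```c
#include <stddef.h>

/* Guard as originally written: index == capacity is NOT rejected. */
static int guard_gt_rejects(size_t index, size_t capacity)
{
    return index > capacity;
}

/* Proposed guard: index == capacity is rejected before any write. */
static int guard_ge_rejects(size_t index, size_t capacity)
{
    return index >= capacity;
}

/* Bounds-checked store using the stricter guard. */
static int toy_store(int *slots, size_t capacity, size_t index, int value)
{
    if (guard_ge_rejects(index, capacity)) {
        return -1;
    }
    slots[index] = value;
    return 0;
}
```

With four slots, index 4 passes the `>` guard and would write one element past the end, while the `>=` guard stops it.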

956-1079: ⚠️ Potential issue | 🟠 Major

Confirm thread-safety vulnerability in shared ctx->label_values buffer.

The label_values buffer is allocated once per filter instance and reused across all concurrent invocations from multiple input sources. Since Fluent Bit has input worker threads that independently process chunks and invoke filters (via flb_filter_do), multiple workers can simultaneously call cb_log_to_metrics_filter with the same ctx. The vulnerable window is between writing to label_values (lines 982–999) and passing it to cmt_counter_inc, cmt_gauge_set, or cmt_histogram_observe (lines 1009, 1030, 1051). A concurrent writer can corrupt label values mid-operation.

Locking exists on chunks and tasks but not on filter instances or their context. To fix: either allocate label_values per-invocation (stack or local scope), use thread-local storage, add a mutex around the vulnerable window, or buffer label values before the cmt call completes.
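
If the concurrency concern above does apply, the first remedy listed (per-invocation storage) could look roughly like the sketch below. All names and sizes are hypothetical, and the "emit" step is simulated by joining the values into a caller-provided string instead of calling a metrics API.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_LABELS 4
#define TOY_MAX_LABEL_LENGTH 32

/*
 * One filter invocation: label values live in a stack-local array, so
 * concurrent callers never share scratch storage.
 */
static int toy_filter_once(const char *const *inputs, size_t n,
                           char *joined, size_t joined_size)
{
    char values[TOY_MAX_LABELS][TOY_MAX_LABEL_LENGTH];
    size_t used = 0;
    size_t i;
    int w;

    if (n > TOY_MAX_LABELS || joined_size == 0) {
        return -1;
    }

    /* Fill the per-invocation label values, rejecting truncation. */
    for (i = 0; i < n; i++) {
        w = snprintf(values[i], sizeof(values[i]), "%s", inputs[i]);
        if (w < 0 || (size_t) w >= sizeof(values[i])) {
            return -1;
        }
    }

    /* Consume the values (stand-in for the metrics call). */
    joined[0] = '\0';
    for (i = 0; i < n; i++) {
        w = snprintf(joined + used, joined_size - used, "%s%s",
                     i > 0 ? "," : "", values[i]);
        if (w < 0 || (size_t) w >= joined_size - used) {
            return -1;
        }
        used += (size_t) w;
    }
    return 0;
}
```

The trade-off is stack usage: it grows with the label limit times the per-label length, so with a larger limit a mutex or thread-local buffer may be the better fit.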

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
…llocations

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/filter_log_to_metrics/log_to_metrics.c (1)

418-496: ⚠️ Potential issue | 🟠 Major

Route all set_labels failures through cleanup.

After ctx->label_keys / ctx->label_accessors are allocated, several error paths return -1 directly (Line 423-425, Line 440-443, Line 461-465, Line 492-496). cb_log_to_metrics_init() returns immediately on set_labels failure, so these leak. Use goto error consistently after allocation.

🔧 Proposed fix
         for (i = 0; i < NUMBER_OF_KUBERNETES_LABELS; i++) {
             ctx->label_keys[i] = flb_strdup(kubernetes_label_keys[i]);
             if (!ctx->label_keys[i]) {
                 flb_errno();
-                return -1;
+                goto error;
             }
             ctx->label_accessors[i] = NULL;
         }
         counter = NUMBER_OF_KUBERNETES_LABELS;
     }

     mk_list_foreach(head, &f_ins->properties) {
         kv = mk_list_entry(head, struct flb_kv, _head);

-        if (counter > ctx->label_counter) {
+        if (counter >= ctx->label_counter) {
             flb_plg_error(ctx->ins, "internal label counter overflow");
-            return -1;
+            goto error;
         }

         if (strcasecmp(kv->key, "label_field") == 0) {
             ctx->label_keys[counter] = flb_strdup(kv->val);
             if (!ctx->label_keys[counter]) {
                 flb_errno();
                 goto error;
             }
             ctx->label_accessors[counter] = flb_strdup(kv->val);
             if (!ctx->label_accessors[counter]) {
                 flb_errno();
                 goto error;
             }
             counter++;
         }
         else if (strcasecmp(kv->key, "add_label") == 0) {
             split = flb_utils_split(kv->val, ' ', 1);
             if (mk_list_size(split) != 2) {
                 flb_plg_error(ctx->ins, "invalid label, expected name and key");
                 flb_utils_split_free(split);
-                return -1;
+                goto error;
             }
             ...
         }
     }

     if (counter != ctx->label_counter) {
         flb_plg_error(ctx->ins,
                       "label count mismatch: computed=%d filled=%d",
                       ctx->label_counter, counter);
-        return -1;
+        goto error;
     }
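
The single-exit cleanup the fix above asks for can be sketched in isolation. This is a minimal illustration with hypothetical names (`toy_ctx`, `toy_set_labels`), in which every failure after the first allocation jumps to one `error` label that releases whatever was built so far.

```c
#include <stdlib.h>
#include <string.h>

struct toy_ctx {
    char **keys;
    size_t count;
};

/* Portable replacement for strdup, which is not part of ISO C before C23. */
static char *toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Safe on partially filled arrays: unfilled slots are NULL from calloc. */
static void toy_ctx_release(struct toy_ctx *ctx)
{
    size_t i;

    if (ctx->keys) {
        for (i = 0; i < ctx->count; i++) {
            free(ctx->keys[i]);
        }
        free(ctx->keys);
    }
    ctx->keys = NULL;
    ctx->count = 0;
}

static int toy_set_labels(struct toy_ctx *ctx, const char *const *names,
                          size_t n)
{
    size_t i;

    ctx->count = 0;
    ctx->keys = calloc(n > 0 ? n : 1, sizeof(*ctx->keys));
    if (!ctx->keys) {
        return -1;              /* nothing allocated yet, plain return is fine */
    }
    ctx->count = n;

    for (i = 0; i < n; i++) {
        if (names[i] == NULL || names[i][0] == '\0') {
            goto error;         /* validation failure after allocation */
        }
        ctx->keys[i] = toy_strdup(names[i]);
        if (!ctx->keys[i]) {
            goto error;         /* allocation failure after allocation */
        }
    }
    return 0;

error:
    toy_ctx_release(ctx);
    return -1;
}
```

Because the array is zero-initialized, the release routine can walk all `count` slots without tracking how far the fill loop got.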
🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 168-185: In prepare_label_runtime's error cleanup, do not free
each ctx->label_values[i] because those entries point into the single contiguous
buffer ctx->label_values_buf; instead remove the loop that
flb_free(ctx->label_values[i]) and only free ctx->label_values_buf and the
ctx->label_values pointer array itself (and set them NULL). Keep the existing
cleanup for ctx->label_accessors (free each ctx->label_accessors[i] then
flb_free(ctx->label_accessors) and NULL it) and ensure you still respect
ctx->label_counter when releasing arrays.
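
The ownership rule in this comment follows from how a contiguous value buffer is laid out. A minimal sketch, assuming a hypothetical `toy_labels` struct that mirrors the idea of `label_values_buf` plus a `label_values` pointer array (the field names and slot size here are illustrative):

```c
#include <stdlib.h>

struct toy_labels {
    char *buf;       /* one contiguous block: count * slot_len bytes */
    char **values;   /* values[i] points INTO buf, it is not its own block */
    size_t count;
};

/* Only two allocations exist, so only two frees are correct. */
static void toy_labels_destroy(struct toy_labels *l)
{
    free(l->values);
    free(l->buf);
    l->values = NULL;
    l->buf = NULL;
    l->count = 0;
}

static int toy_labels_init(struct toy_labels *l, size_t count, size_t slot_len)
{
    size_t i;

    l->buf = NULL;
    l->values = NULL;
    l->count = 0;
    if (count == 0 || slot_len == 0) {
        return -1;
    }

    l->buf = calloc(count, slot_len);
    l->values = calloc(count, sizeof(*l->values));
    if (!l->buf || !l->values) {
        toy_labels_destroy(l);
        return -1;
    }

    /* Carve the block into fixed-size slots. */
    for (i = 0; i < count; i++) {
        l->values[i] = l->buf + i * slot_len;
    }
    l->count = count;
    return 0;
}
```

Calling `free` on any `values[i]` other than `values[0]` would pass a pointer that the allocator never returned, and freeing `values[0]` followed by `buf` would be a double free, which is why the per-element loop has to go.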

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 437-440: The overflow check currently uses "counter >
ctx->label_counter" which permits counter == ctx->label_counter and can produce
an out-of-bounds write; change the check to reject that case by using "counter
>= ctx->label_counter" (or otherwise compare against the max valid index such as
ctx->label_counter - 1) and keep the existing flb_plg_error(ctx->ins, ...) and
return -1 behavior so any attempt to index arrays using counter is prevented.
🧹 Nitpick comments (1)
plugins/filter_log_to_metrics/log_to_metrics.c (1)

1022-1025: Consider using PRId64 for portable 64-bit integer formatting.

The cast to long could truncate large 64-bit values on 32-bit platforms where long is 32 bits. For full portability, consider using PRId64 from <inttypes.h>.

♻️ Suggested improvement
+#include <inttypes.h>
 ...
                     else if (rval->type == FLB_RA_INT) {
-                        snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%ld",
-                                 (long) rval->val.i64);
+                        snprintf(ctx->label_values[i], MAX_LABEL_LENGTH - 1, "%" PRId64,
+                                 rval->val.i64);
                     }
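
A standalone version of the `PRId64` formatting, for reference. The helper name `toy_format_i64` is hypothetical. Note that the size argument of `snprintf` already includes room for the terminating NUL, so the full buffer size can be passed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit signed integer without casting to long.
 * Returns 0 on success, -1 if the output was truncated or formatting failed.
 */
static int toy_format_i64(char *dst, size_t dst_size, int64_t value)
{
    int n = snprintf(dst, dst_size, "%" PRId64, value);

    return (n >= 0 && (size_t) n < dst_size) ? 0 : -1;
}
```

`PRId64` expands to the conversion specifier that matches `int64_t` on the target platform, so the same source formats the full range whether `long` is 32 or 64 bits wide.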

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@plugins/filter_log_to_metrics/log_to_metrics.c`:
- Around line 438-441: Several error paths in the label parsing code (e.g.,
inside the label_field and add_label handling) return -1 directly and leak the
allocated arrays label_keys and label_accessors; change those direct returns
(the overflow checks, invalid label split, and counter mismatch checks) to jump
to the existing error cleanup path using goto error so that label_keys and
label_accessors are freed and ctx->ins error logging is preserved; update the
checks around label_field, add_label, the split validation, and the counter
mismatch to use goto error instead of return -1.
- Around line 168-184: The error cleanup path in the function fails to destroy
and free any partially-created record accessors stored in ctx->label_ras,
causing leaks; update the error block to iterate over the successfully created
entries (0..ctx->label_counter-1) and call flb_ra_destroy on each non-NULL
ctx->label_ras[i], then free ctx->label_ras and set it to NULL (similar to how
ctx->label_accessors is handled), and ensure ctx->label_counter is
handled/cleared as needed before returning -1.
- Around line 419-422: The direct return after flb_strdup failure leaks
allocated resources; replace the immediate "return -1" with "goto error" so the
existing cleanup path runs (preserve the flb_errno() call), ensuring
ctx->label_keys and ctx->label_accessors are freed by the error handler at the
function's "error" label (update any nearby error label if needed to free these
arrays).

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@edsiper edsiper (Member) commented Feb 5, 2026

thanks @cosmo0920 !

pls cleanup the commit history so we can get this merged for v5
