out_stackdriver: fix multiple memory leaks and potential corruption#11667

Open

zkdlin211 wants to merge 4 commits into fluent:master from zkdlin211:master

zkdlin211 commented Apr 3, 2026

  1. gce_metadata.c: Fixed a pass-by-value pointer bug in fetch_metadata. The function now takes the address of the caller's SDS (flb_sds_t *payload; flb_sds_t is itself a char * typedef), so the caller's pointer is updated whenever the buffer is reallocated. This prevents use-after-free errors and memory leaks.
  2. stackdriver.c: Added a missing call to destroy_http_request in an error handling path within stackdriver_format to prevent leaking http_request resources when encountering malformed label data.
  3. stackdriver.c: Ensured that the flb_ra_value object is always destroyed in pack_resource_labels, even if the fetched value is not of type string, fixing a memory leak.
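Fix 1 depends on flb_sds_t being a plain pointer whose storage can move when it grows. The following is a self-contained illustration, not the plugin's code. sds_t, sds_new(), sds_cat() and fetch_metadata_sketch() are hypothetical stand-ins for flb_sds_t, flb_sds_cat() and fetch_metadata(). sds_cat() is assumed to leave the old buffer valid when growth fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for flb_sds_t: a plain char * whose storage
 * may move when it grows. */
typedef char *sds_t;

/* Allocates an empty string; returns NULL on allocation failure. */
sds_t sds_new(void)
{
    sds_t s = malloc(1);

    if (s != NULL) {
        s[0] = '\0';
    }
    return s;
}

/* Appends len bytes of src. Returns the (possibly moved) buffer, or
 * NULL on allocation failure, in which case s is still valid. */
sds_t sds_cat(sds_t s, const char *src, size_t len)
{
    size_t cur = strlen(s);
    sds_t tmp = realloc(s, cur + len + 1);

    if (tmp == NULL) {
        return NULL;
    }
    memcpy(tmp + cur, src, len);
    tmp[cur + len] = '\0';
    return tmp;
}

/* Pre-fix shape: "sds_t payload" was a copy of the caller's pointer, so
 * a moving realloc left the caller holding freed memory. Taking
 * "sds_t *payload" lets the helper write the new address back. */
int fetch_metadata_sketch(sds_t *payload, const char *body)
{
    sds_t tmp;

    assert(payload != NULL && *payload != NULL);
    tmp = sds_cat(*payload, body, strlen(body));
    if (tmp == NULL) {
        return -1;   /* *payload untouched; the caller still owns it */
    }
    *payload = tmp;  /* publish the possibly moved buffer */
    return 0;
}
```

A caller passes &buf and frees buf once afterwards. With the pre-fix by-value signature, that final free() could hit memory that realloc had already released, and the moved buffer would leak.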

Testing
valgrind --leak-check=full --show-leak-kinds=definite --errors-for-leak-kinds=definite --error-exitcode=1 build/valgrind/bin/flb-rt-out_stackdriver --no-exec

SUCCESS: All unit tests have passed.
==2941971== 
==2941971== HEAP SUMMARY:
==2941971==     in use at exit: 0 bytes in 0 blocks
==2941971==   total heap usage: 309,822 allocs, 309,822 frees, 113,511,149 bytes allocated
==2941971== 
==2941971== All heap blocks were freed -- no leaks are possible
==2941971== 
==2941971== For lists of detected and suppressed errors, rerun with: -s
==2941971== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Fixed memory leaks in metadata and resource handling.
    • Improved error path cleanup to prevent resource accumulation.
    • Enhanced credential parsing with proper memory management.
  • Chores

    • Updated CI/CD workflow permissions for package management.

coderabbitai bot commented Apr 3, 2026

Walkthrough

This pull request fixes memory-management and cleanup issues in the Stackdriver output plugin and also updates workflow permissions. The changes add missing cleanup on error paths, fix a leak in label packing, change an internal helper's signature so reallocations propagate back to the caller, and free a previously allocated project ID before it is overwritten with the value from the credentials file.

Changes

Cohort / File(s): Summary

  • CI/CD Permissions (.github/workflows/master-integration-test.yaml): Added the packages: write permission to the master-integration-test-run-integration job to match the existing build job permissions.
  • Memory Management & Cleanup (plugins/out_stackdriver/gce_metadata.c): Changed the internal helper fetch_metadata to take a flb_sds_t *payload, so every call site now passes the buffer's address (&payload) and sees the updated pointer after a reallocation.
  • Error Handling Cleanup (plugins/out_stackdriver/stackdriver.c): Added cleanup on error paths: record accessor values are destroyed on label packing failure, and HTTP request objects are destroyed on invalid payload label type errors.
  • Credential Parsing Safety (plugins/out_stackdriver/stackdriver_conf.c): Added cleanup of the previously allocated ctx->project_id via flb_sds_destroy before assigning the project ID parsed from the credentials file, preventing a memory leak.
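Both stackdriver.c fixes apply the same rule. Anything acquired on a code path is released on every exit from it, not only the successful one. The fixes cover destroy_http_request on the malformed-label path and always destroying the flb_ra_value in pack_resource_labels. The following sketches the second case. value_sketch is a hypothetical stand-in for struct flb_ra_value. live_values is a hypothetical counter that exists only so a leak is observable. The plugin's real types and helpers differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical value object standing in for struct flb_ra_value. */
enum value_type { VALUE_STRING, VALUE_INT };

struct value_sketch {
    enum value_type type;
    const char *str;   /* borrowed, not owned, to keep the sketch short */
};

int live_values = 0;   /* live value objects, so a test can spot leaks */

struct value_sketch *value_fetch(enum value_type type, const char *s)
{
    struct value_sketch *v = malloc(sizeof(*v));

    if (v == NULL) {
        return NULL;
    }
    v->type = type;
    v->str = s;
    live_values++;
    return v;
}

void value_destroy(struct value_sketch *v)
{
    assert(v != NULL);
    free(v);
    live_values--;
}

/* Packs a string label into out. The value is destroyed on every path,
 * including when it is not a string, which is the leak the PR fixes. */
int pack_label_sketch(enum value_type type, const char *s,
                      char *out, size_t out_size)
{
    int ret = -1;
    struct value_sketch *v = value_fetch(type, s);

    if (v == NULL) {
        return -1;
    }
    if (v->type == VALUE_STRING && strlen(v->str) < out_size) {
        memcpy(out, v->str, strlen(v->str) + 1);
        ret = 0;
    }
    value_destroy(v);  /* runs for string and non-string values alike */
    return ret;
}
```

Funnelling every exit through one destroy call means a new early-return branch cannot silently skip the cleanup.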

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • #11533: Related workflow permission adjustments in .github/workflows/master-integration-test.yaml for the integration job.

Suggested labels

docs-required, backport to v4.0.x, backport to v4.1.x

Suggested reviewers

  • braydonk

Poem

🐰 A rabbit hops through memory lanes,
Freeing buffers, breaking chains!
With pointers passed and cleanups done,
No leaks shall hide—we've caught each one! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 12.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title accurately summarizes the primary focus of the changeset: fixing multiple memory leaks and potential corruption in the out_stackdriver plugin.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6440aedab3

         if (c->resp.status == 200) {
             ret_code = 0;
-            flb_sds_copy(payload, c->resp.payload, c->resp.payload_size);
+            *payload = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size);

P1: Propagate SDS copy allocation failures

After switching fetch_metadata to update the caller's SDS pointer, this assignment can set *payload to NULL when flb_sds_copy fails (e.g., realloc under memory pressure), but the function still leaves ret_code as success. Callers then continue on the success path and dereference payload (flb_sds_len, parsing, destroy), which can crash and also loses the original SDS pointer. Treat a NULL return from flb_sds_copy as an error and return failure from fetch_metadata.
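The following sketches the handling Codex asks for. sds_copy() is a hypothetical char *-based stand-in. Like a realloc wrapper, it returns NULL on failure and leaves the old buffer intact. handle_response() is an illustrative name for the HTTP 200 branch of fetch_metadata, not the real function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds_t;  /* hypothetical stand-in for flb_sds_t */

/* Replaces the contents of s with len bytes of src. Returns the
 * (possibly moved) buffer, or NULL if growing fails; s stays valid. */
sds_t sds_copy(sds_t s, const char *src, size_t len)
{
    sds_t tmp = realloc(s, len + 1);

    if (tmp == NULL) {
        return NULL;
    }
    memcpy(tmp, src, len);
    tmp[len] = '\0';
    return tmp;
}

/* Sketch of the 200 branch: report success only after the copy has
 * succeeded, and never overwrite *payload with NULL. */
int handle_response(sds_t *payload, int status,
                    const char *body, size_t body_len)
{
    sds_t tmp;

    assert(payload != NULL);
    if (status != 200) {
        return -1;
    }
    tmp = sds_copy(*payload, body, body_len);
    if (tmp == NULL) {
        return -1;   /* caller keeps (and later frees) the old buffer */
    }
    *payload = tmp;
    return 0;
}
```

Assigning through a temporary keeps the caller's buffer reachable on failure, so the error path can still destroy it instead of leaking it.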

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/out_stackdriver/gce_metadata.c (1)

195-203: ⚠️ Potential issue | 🟠 Major

Free the old ctx->project_id before replacing it from metadata.

This still overwrites ctx->project_id at Line 202. Cross-file, plugins/out_stackdriver/stackdriver_conf.c Lines 136-140 can already populate it from the credentials file, and plugins/out_stackdriver/stackdriver.c Lines 1347-1351 call gce_metadata_read_project_id() later when metadata_server_auth is enabled, so the credential-derived SDS still leaks on that mixed-auth path.

♻️ Suggested fix
-    ctx->project_id = flb_sds_create(payload);
+    {
+        flb_sds_t new_project_id;
+
+        new_project_id = flb_sds_create(payload);
+        if (!new_project_id) {
+            flb_sds_destroy(payload);
+            return -1;
+        }
+
+        if (ctx->project_id) {
+            flb_sds_destroy(ctx->project_id);
+        }
+
+        ctx->project_id = new_project_id;
+    }
     flb_sds_destroy(payload);
     return 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_stackdriver/gce_metadata.c` around lines 195 - 203, The code that
replaces ctx->project_id after calling fetch_metadata (in
gce_metadata_read_project_id / the shown block) leaks the previous SDS; before
assigning ctx->project_id = flb_sds_create(payload) you must free any existing
ctx->project_id (check for non-NULL and call flb_sds_destroy(ctx->project_id)).
Keep the fetch_metadata error path unchanged, and only destroy the old
ctx->project_id immediately prior to creating/assigning the new SDS so the
credential-populated SDS from stackdriver_conf.c doesn't leak when
metadata_server_auth triggers this code.
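The ordering in the suggested fix is: create the new value, destroy the old one, then assign. That ordering fits a small replace-owned-string helper. The following sketch is plain C. struct ctx_sketch is a hypothetical stand-in for the plugin context. malloc/free stand in for flb_sds_create/flb_sds_destroy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context with an owned string field, standing in for
 * ctx->project_id. */
struct ctx_sketch {
    char *project_id;
};

/* Allocate the new value first so a failure leaves the old one intact,
 * then free the old value before storing the new pointer. */
int ctx_set_project_id(struct ctx_sketch *ctx, const char *value)
{
    size_t len;
    char *copy;

    assert(ctx != NULL && value != NULL);
    len = strlen(value);
    copy = malloc(len + 1);
    if (copy == NULL) {
        return -1;              /* old project_id is still valid */
    }
    memcpy(copy, value, len + 1);

    free(ctx->project_id);      /* free(NULL) is a no-op in standard C */
    ctx->project_id = copy;
    return 0;
}
```

On the mixed-auth path described above, the helper runs once with a credentials-derived value and again with a metadata-derived one. It leaves exactly one live allocation.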
🧹 Nitpick comments (1)
.github/workflows/master-integration-test.yaml (1)

30-33: Change packages: write to packages: read in the integration test job.

The called workflow call-run-integration-test.yaml only pulls images via docker pull and loads them locally with kind load docker-image. It performs no package publishing, docker pushes, or artifact uploads. Use packages: read to follow least-privilege principle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/master-integration-test.yaml around lines 30 - 33, The
workflow currently grants excessive permissions under the permissions block by
setting packages: write; change that to packages: read to follow least-privilege
principles. Edit the permissions stanza (the permissions: contents: and
packages: entries) in the job that calls the reusable workflow (uses:
./.github/workflows/call-run-integration-test.yaml) so it only requests
packages: read instead of packages: write.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc1c39a4-f71a-4320-a98c-5076754537ea

📥 Commits

Reviewing files that changed from the base of the PR and between 3e414ac and 6440aed.

📒 Files selected for processing (4)
  • .github/workflows/master-integration-test.yaml
  • plugins/out_stackdriver/gce_metadata.c
  • plugins/out_stackdriver/stackdriver.c
  • plugins/out_stackdriver/stackdriver_conf.c

Comment on lines 46 to +55
         if (strcmp(uri, FLB_STD_METADATA_PROJECT_ID_URI) == 0) {
-            flb_sds_cat(payload, "fluent-bit-test", 15);
+            *payload = flb_sds_cat(*payload, "fluent-bit-test", 15);
             return 0;
         }
         else if (strcmp(uri, FLB_STD_METADATA_ZONE_URI) == 0) {
-            flb_sds_cat(payload, "projects/0123456789/zones/fluent", 32);
+            *payload = flb_sds_cat(*payload, "projects/0123456789/zones/fluent", 32);
             return 0;
         }
         else if (strcmp(uri, FLB_STD_METADATA_INSTANCE_ID_URI) == 0) {
-            flb_sds_cat(payload, "333222111", 9);
+            *payload = flb_sds_cat(*payload, "333222111", 9);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sds_file="$(fd -i '^flb_sds\.c$' | head -n1)"
if [ -z "${sds_file}" ]; then
  echo "flb_sds.c not found" >&2
  exit 1
fi

echo "== flb_sds_cat / flb_sds_copy definitions =="
rg -n -C4 'flb_sds_(cat|copy)\s*\(' "$sds_file"

echo
echo "== NULL-return paths in flb_sds.c =="
rg -n -C2 'return NULL;' "$sds_file"

echo
echo "== Changed assignments in fetch_metadata =="
rg -n -C2 'flb_sds_(cat|copy)\s*\(\*payload' plugins/out_stackdriver/gce_metadata.c

echo
echo "== Downstream payload dereferences after fetch_metadata success =="
rg -n -C2 'flb_sds_len\(payload\)|flb_sds_create\(payload\)' plugins/out_stackdriver/gce_metadata.c

Repository: fluent/fluent-bit

Length of output: 4311


Handle SDS growth failures before returning success.

Lines 47, 51, 55, and 91 assign the result of flb_sds_cat() or flb_sds_copy() directly to *payload without checking for allocation failures. Both functions return NULL on reallocation failure. If this occurs, *payload becomes NULL, but the function returns 0 (success). Downstream code then dereferences the NULL pointer via flb_sds_len(payload), flb_sds_create(payload), or loop conditions, causing a crash.

The codebase already has flb_sds_cat_safe() implementing the correct pattern: validate the result before reassigning to the caller's buffer and return an error code on failure.

💡 Suggested fix pattern (from flb_sds_cat_safe)
-        if (c->resp.status == 200) {
-            ret_code = 0;
-            *payload = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size);
-        }
+        if (c->resp.status == 200) {
+            flb_sds_t tmp;
+            tmp = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size);
+            if (!tmp) {
+                flb_errno();
+                ret_code = -1;
+            }
+            else {
+                *payload = tmp;
+                ret_code = 0;
+            }
+        }

Apply the same validation to all three flb_sds_cat() calls in test mode (lines 47, 51, 55).
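With the cat_safe pattern, each test-mode branch can return the helper's status directly. The URI strings in the following sketch are hypothetical placeholders for the FLB_STD_METADATA_*_URI constants. sds_cat_safe() is a local helper modeled on the behavior described for flb_sds_cat_safe(): it returns 0 on success and -1 on failure, leaving the buffer untouched. Returning -1 for an unknown URI is also an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds_t;  /* hypothetical stand-in for flb_sds_t */

/* Hypothetical placeholders for the FLB_STD_METADATA_*_URI constants. */
#define URI_PROJECT_ID  "project-id"
#define URI_ZONE        "zone"
#define URI_INSTANCE_ID "instance-id"

/* cat_safe-style helper: writes the new pointer back only on success;
 * returns -1 and leaves *buf untouched on allocation failure. */
int sds_cat_safe(sds_t *buf, const char *str, size_t len)
{
    size_t cur;
    sds_t tmp;

    assert(buf != NULL && *buf != NULL);
    cur = strlen(*buf);
    tmp = realloc(*buf, cur + len + 1);
    if (tmp == NULL) {
        return -1;
    }
    memcpy(tmp + cur, str, len);
    tmp[cur + len] = '\0';
    *buf = tmp;
    return 0;
}

/* Test-mode branches: propagate the helper's result instead of
 * returning 0 unconditionally. */
int fake_metadata(sds_t *payload, const char *uri)
{
    if (strcmp(uri, URI_PROJECT_ID) == 0) {
        return sds_cat_safe(payload, "fluent-bit-test", 15);
    }
    else if (strcmp(uri, URI_ZONE) == 0) {
        return sds_cat_safe(payload, "projects/0123456789/zones/fluent", 32);
    }
    else if (strcmp(uri, URI_INSTANCE_ID) == 0) {
        return sds_cat_safe(payload, "333222111", 9);
    }
    return -1;  /* assumed behavior for an unknown URI */
}
```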

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/out_stackdriver/gce_metadata.c` around lines 46 - 55, The test-mode
branches that append/copy into *payload (calls to flb_sds_cat() for
FLB_STD_METADATA_PROJECT_ID_URI, FLB_STD_METADATA_ZONE_URI,
FLB_STD_METADATA_INSTANCE_ID_URI and the flb_sds_copy() use around line 91) must
validate the return value before overwriting *payload: store the
flb_sds_cat()/flb_sds_copy() result in a temporary variable, check for NULL, and
if NULL return a non-zero error (e.g. -1) instead of returning success;
alternatively call the existing flb_sds_cat_safe()/flb_sds_copy_safe() pattern.
Update the branches handling FLB_STD_METADATA_PROJECT_ID_URI,
FLB_STD_METADATA_ZONE_URI, FLB_STD_METADATA_INSTANCE_ID_URI (and the copy at
line ~91) to follow this pattern so allocation failures do not leave *payload
NULL.

Removed package write permissions from integration test workflow.

Signed-off-by: zkdlin211 <68074864+zkdlin211@users.noreply.github.com>