out_stackdriver: fix multiple memory leaks and potential corruption#11667
out_stackdriver: fix multiple memory leaks and potential corruption#11667zkdlin211 wants to merge 4 commits intofluent:masterfrom
Conversation
Signed-off-by: zkdlin211 <68074864+zkdlin211@users.noreply.github.com>
📝 WalkthroughWalkthroughThis pull request addresses memory management and cleanup issues in the Stackdriver plugin while also updating workflow permissions. Changes include adding missing memory cleanup in error paths, fixing a potential leak in label packing, refactoring an internal helper function signature to improve parameter handling, and preventing reallocation of unfreed credentials. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6440aedab3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (c->resp.status == 200) { | ||
| ret_code = 0; | ||
| flb_sds_copy(payload, c->resp.payload, c->resp.payload_size); | ||
| *payload = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size); |
There was a problem hiding this comment.
Propagate SDS copy allocation failures
After switching fetch_metadata to update the caller's SDS pointer, this assignment can set *payload to NULL when flb_sds_copy fails (e.g., realloc under memory pressure), but the function still leaves ret_code as success. Callers then continue on the success path and dereference payload (flb_sds_len, parsing, destroy), which can crash and also loses the original SDS pointer. Treat a NULL return from flb_sds_copy as an error and return failure from fetch_metadata.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/out_stackdriver/gce_metadata.c (1)
195-203:⚠️ Potential issue | 🟠 MajorFree the old
ctx->project_idbefore replacing it from metadata.This still overwrites
ctx->project_idat Line 202. Cross-file,plugins/out_stackdriver/stackdriver_conf.cLines 136-140 can already populate it from the credentials file, andplugins/out_stackdriver/stackdriver.cLines 1347-1351 callgce_metadata_read_project_id()later whenmetadata_server_authis enabled, so the credential-derived SDS still leaks on that mixed-auth path.♻️ Suggested fix
- ctx->project_id = flb_sds_create(payload); + { + flb_sds_t new_project_id; + + new_project_id = flb_sds_create(payload); + if (!new_project_id) { + flb_sds_destroy(payload); + return -1; + } + + if (ctx->project_id) { + flb_sds_destroy(ctx->project_id); + } + + ctx->project_id = new_project_id; + } flb_sds_destroy(payload); return 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_stackdriver/gce_metadata.c` around lines 195 - 203, The code that replaces ctx->project_id after calling fetch_metadata (in gce_metadata_read_project_id / the shown block) leaks the previous SDS; before assigning ctx->project_id = flb_sds_create(payload) you must free any existing ctx->project_id (check for non-NULL and call flb_sds_destroy(ctx->project_id)). Keep the fetch_metadata error path unchanged, and only destroy the old ctx->project_id immediately prior to creating/assigning the new SDS so the credential-populated SDS from stackdriver_conf.c doesn't leak when metadata_server_auth triggers this code.
🧹 Nitpick comments (1)
.github/workflows/master-integration-test.yaml (1)
30-33: Changepackages: writetopackages: readin the integration test job.The called workflow
call-run-integration-test.yamlonly pulls images viadocker pulland loads them locally withkind load docker-image. It performs no package publishing, docker pushes, or artifact uploads. Usepackages: readto follow least-privilege principle.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/master-integration-test.yaml around lines 30 - 33, The workflow currently grants excessive permissions under the permissions block by setting packages: write; change that to packages: read to follow least-privilege principles. Edit the permissions stanza (the permissions: contents: and packages: entries) in the job that calls the reusable workflow (uses: ./.github/workflows/call-run-integration-test.yaml) so it only requests packages: read instead of packages: write.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/out_stackdriver/gce_metadata.c`:
- Around line 46-55: The test-mode branches that append/copy into *payload
(calls to flb_sds_cat() for FLB_STD_METADATA_PROJECT_ID_URI,
FLB_STD_METADATA_ZONE_URI, FLB_STD_METADATA_INSTANCE_ID_URI and the
flb_sds_copy() use around line 91) must validate the return value before
overwriting *payload: store the flb_sds_cat()/flb_sds_copy() result in a
temporary variable, check for NULL, and if NULL return a non-zero error (e.g.
-1) instead of returning success; alternatively call the existing
flb_sds_cat_safe()/flb_sds_copy_safe() pattern. Update the branches handling
FLB_STD_METADATA_PROJECT_ID_URI, FLB_STD_METADATA_ZONE_URI,
FLB_STD_METADATA_INSTANCE_ID_URI (and the copy at line ~91) to follow this
pattern so allocation failures do not leave *payload NULL.
---
Outside diff comments:
In `@plugins/out_stackdriver/gce_metadata.c`:
- Around line 195-203: The code that replaces ctx->project_id after calling
fetch_metadata (in gce_metadata_read_project_id / the shown block) leaks the
previous SDS; before assigning ctx->project_id = flb_sds_create(payload) you
must free any existing ctx->project_id (check for non-NULL and call
flb_sds_destroy(ctx->project_id)). Keep the fetch_metadata error path unchanged,
and only destroy the old ctx->project_id immediately prior to creating/assigning
the new SDS so the credential-populated SDS from stackdriver_conf.c doesn't leak
when metadata_server_auth triggers this code.
---
Nitpick comments:
In @.github/workflows/master-integration-test.yaml:
- Around line 30-33: The workflow currently grants excessive permissions under
the permissions block by setting packages: write; change that to packages: read
to follow least-privilege principles. Edit the permissions stanza (the
permissions: contents: and packages: entries) in the job that calls the reusable
workflow (uses: ./.github/workflows/call-run-integration-test.yaml) so it only
requests packages: read instead of packages: write.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cc1c39a4-f71a-4320-a98c-5076754537ea
📒 Files selected for processing (4)
.github/workflows/master-integration-test.yamlplugins/out_stackdriver/gce_metadata.cplugins/out_stackdriver/stackdriver.cplugins/out_stackdriver/stackdriver_conf.c
| if (strcmp(uri, FLB_STD_METADATA_PROJECT_ID_URI) == 0) { | ||
| flb_sds_cat(payload, "fluent-bit-test", 15); | ||
| *payload = flb_sds_cat(*payload, "fluent-bit-test", 15); | ||
| return 0; | ||
| } | ||
| else if (strcmp(uri, FLB_STD_METADATA_ZONE_URI) == 0) { | ||
| flb_sds_cat(payload, "projects/0123456789/zones/fluent", 32); | ||
| *payload = flb_sds_cat(*payload, "projects/0123456789/zones/fluent", 32); | ||
| return 0; | ||
| } | ||
| else if (strcmp(uri, FLB_STD_METADATA_INSTANCE_ID_URI) == 0) { | ||
| flb_sds_cat(payload, "333222111", 9); | ||
| *payload = flb_sds_cat(*payload, "333222111", 9); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sds_file="$(fd -i '^flb_sds\.c$' | head -n1)"
if [ -z "${sds_file}" ]; then
echo "flb_sds.c not found" >&2
exit 1
fi
echo "== flb_sds_cat / flb_sds_copy definitions =="
rg -n -C4 'flb_sds_(cat|copy)\s*\(' "$sds_file"
echo
echo "== NULL-return paths in flb_sds.c =="
rg -n -C2 'return NULL;' "$sds_file"
echo
echo "== Changed assignments in fetch_metadata =="
rg -n -C2 'flb_sds_(cat|copy)\s*\(\*payload' plugins/out_stackdriver/gce_metadata.c
echo
echo "== Downstream payload dereferences after fetch_metadata success =="
rg -n -C2 'flb_sds_len\(payload\)|flb_sds_create\(payload\)' plugins/out_stackdriver/gce_metadata.cRepository: fluent/fluent-bit
Length of output: 4311
Handle SDS growth failures before returning success.
Lines 47, 51, 55, and 91 assign the result of flb_sds_cat() or flb_sds_copy() directly to *payload without checking for allocation failures. Both functions return NULL on reallocation failure. If this occurs, *payload becomes NULL, but the function returns 0 (success). Downstream code then dereferences the NULL pointer via flb_sds_len(payload), flb_sds_create(payload), or loop conditions, causing a crash.
The codebase already has flb_sds_cat_safe() implementing the correct pattern: validate the result before reassigning to the caller's buffer and return an error code on failure.
💡 Suggested fix pattern (from flb_sds_cat_safe)
- if (c->resp.status == 200) {
- ret_code = 0;
- *payload = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size);
- }
+ if (c->resp.status == 200) {
+ flb_sds_t tmp;
+ tmp = flb_sds_copy(*payload, c->resp.payload, c->resp.payload_size);
+ if (!tmp) {
+ flb_errno();
+ ret_code = -1;
+ }
+ else {
+ *payload = tmp;
+ ret_code = 0;
+ }
+ }Apply the same validation to all three flb_sds_cat() calls in test mode (lines 47, 51, 55).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/out_stackdriver/gce_metadata.c` around lines 46 - 55, The test-mode
branches that append/copy into *payload (calls to flb_sds_cat() for
FLB_STD_METADATA_PROJECT_ID_URI, FLB_STD_METADATA_ZONE_URI,
FLB_STD_METADATA_INSTANCE_ID_URI and the flb_sds_copy() use around line 91) must
validate the return value before overwriting *payload: store the
flb_sds_cat()/flb_sds_copy() result in a temporary variable, check for NULL, and
if NULL return a non-zero error (e.g. -1) instead of returning success;
alternatively call the existing flb_sds_cat_safe()/flb_sds_copy_safe() pattern.
Update the branches handling FLB_STD_METADATA_PROJECT_ID_URI,
FLB_STD_METADATA_ZONE_URI, FLB_STD_METADATA_INSTANCE_ID_URI (and the copy at
line ~91) to follow this pattern so allocation failures do not leave *payload
NULL.
Removed package write permissions from integration test workflow. Signed-off-by: zkdlin211 <68074864+zkdlin211@users.noreply.github.com>
fetch_metadata. The function now accepts a pointer to the SDS pointer (flb_sds_t **payload) to ensure the caller's pointer is updated after potential reallocations, preventing use-after-free errors and memory leaks.destroy_http_requestin an error handling path withinstackdriver_formatto prevent leakinghttp_requestresources when encountering malformed label data.flb_ra_valueobject is always destroyed inpack_resource_labels, even if the fetched value is not of type string, fixing a memory leak.Testing
valgrind --leak-check=full --show-leak-kinds=definite --errors-for-leak-kinds=definite --error-exitcode=1 build/valgrind/bin/flb-rt-out_stackdriver --no-execFluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Release Notes
Bug Fixes
Chores