feat(observability): add component-probes feature for bpftrace component-level CPU attribution #24860
connoryy wants to merge 22 commits into vectordotdev:master
Conversation
…attribution Adds an opt-in Cargo feature that lets external bpftrace scripts attribute CPU time to individual Vector components. A 4 KiB shared-memory array (VECTOR_COMPONENT_LABELS) is indexed by tid % 4096. On span enter the component's group-id is written; on span exit it is cleared. Both operations are single Relaxed atomic byte stores (~2 ns, no syscall). A separate uprobe symbol (vector_register_component) fires once per component at startup so bpftrace can build the id → name mapping and resolve the array's runtime address. Disabled by default. Runtime code is gated to Linux.
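The slot scheme in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `LABELS` stands in for `VECTOR_COMPONENT_LABELS`, and `slot_for`, `enter`, `exit`, and `current` are hypothetical helper names. The thread id is passed in explicitly so the sketch needs only the standard library.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Number of one-byte slots; 4096 bytes is the 4 KiB described above.
const SLOTS: usize = 4096;

// Const item used only to initialize the array of non-Copy atomics.
const ZERO: AtomicU8 = AtomicU8::new(0);

/// Hypothetical stand-in for VECTOR_COMPONENT_LABELS.
static LABELS: [AtomicU8; SLOTS] = [ZERO; SLOTS];

/// Maps a thread id to its slot index (tid % 4096).
fn slot_for(tid: u64) -> usize {
    (tid % SLOTS as u64) as usize
}

/// On span enter: write the component's group id with one Relaxed byte store.
fn enter(tid: u64, group_id: u8) {
    LABELS[slot_for(tid)].store(group_id, Ordering::Relaxed);
}

/// On span exit: clear the slot (0 means "no component").
fn exit(tid: u64) {
    LABELS[slot_for(tid)].store(0, Ordering::Relaxed);
}

/// What an external sampler would observe for this thread id.
fn current(tid: u64) -> u8 {
    LABELS[slot_for(tid)].load(Ordering::Relaxed)
}
```

One consequence of the modulo indexing is that two thread ids exactly 4096 apart map to the same slot, so a sampler reading that slot cannot tell them apart.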
…ing to be maintained in bpftrace
|
thomasqueirozb left a comment
Thanks for this! This looks like a nice addition. Can you add a guide to website/content/en/guides/advanced/? This would make it easier for us to review and test this, and it would also document the feature for users who want to use it.
|
|
||
| Replace `/path/to/vector` with your binary path: | ||
|
|
||
| ```bpf |
Check failure
Code scanning / check-spelling
Unrecognized Spelling
| if ($addr != 0) { | ||
| $group_id = *(uint32 *)$addr; | ||
| if ($group_id != 0) { | ||
| @stacks[@names[$group_id], ustack()] = count(); |
| ``` | ||
|
|
||
| This aggregates component-labeled stack traces directly in bpftrace. Start | ||
| bpftrace before Vector so it catches the registration uprobes during startup. |
|
|
||
| If `ustack()` is not available in your environment, replace the `@stacks` |
| line with a `printf` to emit raw labeled samples that can be joined with | ||
| stack traces from other tools like `perf`: | ||
|
|
||
| ```bpf |
|
Hi @thomasqueirozb, ready for re-review! |
|
@thomasqueirozb ping, happy to make any changes as well |
|
Hi @connoryy, everything looks good here at a glance. I just haven't had the time to try this for myself. I'll try to get this reviewed sometime next week. Thanks for your work on this! |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea22be18d4
| let label: &'static AtomicU32 = Box::leak(Box::new(AtomicU32::new(0))); | ||
| #[cfg(target_os = "linux")] | ||
| { | ||
| let tid = nix::unistd::gettid().as_raw() as u64; |
Enable nix process feature before calling gettid
When component-probes is enabled on Linux, this nix::unistd::gettid() call will fail to compile because the repository’s nix dependency is configured without the process feature (which gates gettid in nix 0.31). That makes the newly added feature effectively unusable for downstream builds (cannot find function gettid in module nix::unistd) unless we either enable nix/process or switch this lookup to another API.
|
Added process feature per codex review. Let me know if this is preferable vs. the prior implementation I had using unsafe libc. |
Summary
Adds an opt-in component-probes Cargo feature that enables external bpftrace scripts to attribute CPU samples to individual Vector components by ID.
Vector configuration
How did you test this PR?
Unit tests for the component_probes module (label stability, uniqueness across threads, store/clear cycle). Tested end-to-end locally with a bpftrace script that attaches to the two uprobes (vector_register_thread, vector_register_component), builds the tid -> address and group_id -> name maps, and samples at 997 Hz. I can also provide a reference .bt script if requested.
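The three properties named above (label stability, uniqueness across threads, store/clear cycle) can be illustrated with a small sketch. It assumes a per-thread label allocated with `Box::leak`, as in the diff excerpt reviewed above; the `thread_local!` wrapper and the helper names `thread_label`, `label_addr`, `set_group`, `clear_group`, and `other_thread_addr` are hypothetical and may not match the `component_probes` module.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

thread_local! {
    // Hypothetical per-thread label. It is leaked on first use, so its
    // address stays valid (and fixed) for the rest of the process.
    static LABEL: &'static AtomicU32 = Box::leak(Box::new(AtomicU32::new(0)));
}

/// Returns the calling thread's label.
fn thread_label() -> &'static AtomicU32 {
    LABEL.with(|l| *l)
}

/// Address an external sampler would read for the calling thread.
fn label_addr() -> usize {
    thread_label() as *const AtomicU32 as usize
}

/// Span enter: publish the active component's group id.
fn set_group(id: u32) {
    thread_label().store(id, Ordering::Relaxed);
}

/// Span exit: clear the label (0 means "no component").
fn clear_group() {
    thread_label().store(0, Ordering::Relaxed);
}

/// Spawns a thread and reports the address of that thread's label.
fn other_thread_addr() -> usize {
    thread::spawn(label_addr).join().unwrap()
}
```

Because the allocations are never freed, the address observed from another thread cannot be reused by the current thread's label, which is what makes the uniqueness check meaningful.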
Change Type
Is this a breaking change?
Does this PR include user facing changes?
References
Relevant issue: #24851