@WaterWhisperer

Summary

Previously, the two equivalent log formats info!(message = "foo") and info!("foo") were treated as separate rate limit groups because they have different callsite identifiers. Now, rate limiting is based on the message content and contextual fields (such as component_id) rather than on the callsite.
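A minimal, stdlib-only sketch of the grouping change, under simplifying assumptions: Event, callsite_key, content_key, event and count_buckets are hypothetical names, not tracing-limit's actual types, and the old key is modelled as callsite plus component_id. Two events with the same text from two different macro invocations land in two buckets under the callsite-based key and in one bucket under the content-based key.

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical, simplified event; not tracing-limit's real types.
// `callsite_id` stands in for the identity of a single macro invocation.
#[derive(Debug, Clone)]
struct Event {
    callsite_id: u64,
    component_id: Option<String>,
    message: String,
}

// Old behaviour (modelled): one bucket per callsite and component.
fn callsite_key(e: &Event) -> (u64, Option<String>) {
    (e.callsite_id, e.component_id.clone())
}

// New behaviour (modelled): one bucket per component and message text.
fn content_key(e: &Event) -> (Option<String>, String) {
    (e.component_id.clone(), e.message.clone())
}

// Counts how many distinct rate-limit buckets a key function produces.
fn count_buckets<K: Hash + Eq>(events: &[Event], key: impl Fn(&Event) -> K) -> usize {
    events.iter().map(key).collect::<HashSet<K>>().len()
}

// Small helper to build test events.
fn event(callsite_id: u64, component: &str, message: &str) -> Event {
    Event {
        callsite_id,
        component_id: Some(component.to_string()),
        message: message.to_string(),
    }
}

fn main() {
    // Same text from two invocations, e.g. info!(message = "Hello") and info!("Hello").
    let events = vec![event(1, "my_source", "Hello"), event(2, "my_source", "Hello")];
    println!("callsite buckets: {}", count_buckets(&events, callsite_key)); // 2
    println!("content buckets: {}", count_buckets(&events, content_key)); // 1
}
```

One consequence of keying on content: distinct callsites that log identical text for the same component share a single bucket, while the same text from different components still gets separate buckets.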

Vector configuration

N/A - This is an internal library change to the tracing-limit crate. No Vector configuration is required for testing.

How did you test this PR?

  • Added a new test message_field_explicit_vs_implicit_same_bucket that verifies both info!(message = "Hello") and info!("Hello") are grouped under the same rate limit bucket
  • All 9 existing tests in the tracing-limit crate pass
  • Ran cargo test -p tracing-limit to verify no regressions
  • Ran cargo clippy -p tracing-limit with no warnings

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook; please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@WaterWhisperer WaterWhisperer requested a review from a team as a code owner October 26, 2025 09:53
@thomasqueirozb thomasqueirozb added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Oct 27, 2025
@WaterWhisperer
Author

Hi @thomasqueirozb,

I noticed that this PR needs some CI checks to be approved.

Just wanted to check: is there anything that needs to be modified? I'm happy to rebase on the current main branch if needed.

Thanks for your time and feedback!

@pront
Member

pront commented Nov 18, 2025

Hi @WaterWhisperer, please resolve any merge conflicts. We recently made changes to this library.

Contributor

@thomasqueirozb thomasqueirozb left a comment

Thanks for this!

I looked at this PR before and thought it needed some modifications, but looking more closely, everything seems right. I'm just holding off on approval because some further discussion is needed, but everything looks good.

Comment on lines +494 to +499
match field.name() {
COMPONENT_ID_FIELD => self.component_id = Some(value),
MESSAGE_FIELD => self.message = Some(value),
_ => {}
Contributor

I'm assuming that field.name() matches MESSAGE_FIELD for both info!("a") and info!(message="a"). Is this correct? I looked into tracing's code, and it looks like info!("a") ends up the same as info!(message="a") after a bunch of macro magic, but I haven't verified this by testing it myself.

Member

I have the same understanding: info!("a") and info!(message="a") are identical, meaning they can be used interchangeably.
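The reviewers' reading can be illustrated with a stdlib-only model. As far as we understand tracing's macros, info!("Hello") passes its format string as a field named message (via format_args!), so a visitor sees the same field name as for info!(message = "Hello"); the value arrives through a different path, though: a &str for the explicit form and fmt::Arguments recorded through Debug for the implicit one. FieldRecorder and KeyFields below are hypothetical stand-ins that only mirror the shape of tracing's Visit trait (record_str / record_debug); they are not its API.

```rust
use std::fmt;

// Hypothetical stand-in mirroring the shape of tracing's `Visit` trait;
// it is not tracing's actual API.
trait FieldRecorder {
    fn record_str(&mut self, name: &str, value: &str);
    fn record_debug(&mut self, name: &str, value: &dyn fmt::Debug);
}

const MESSAGE_FIELD: &str = "message";
const COMPONENT_ID_FIELD: &str = "component_id";

// Hypothetical holder for the fields that form a rate-limit key.
#[derive(Debug, Default, PartialEq)]
struct KeyFields {
    component_id: Option<String>,
    message: Option<String>,
}

impl KeyFields {
    // Same dispatch as the reviewed diff: match on the field name, ignore the rest.
    fn set(&mut self, name: &str, value: String) {
        match name {
            COMPONENT_ID_FIELD => self.component_id = Some(value),
            MESSAGE_FIELD => self.message = Some(value),
            _ => {}
        }
    }
}

impl FieldRecorder for KeyFields {
    // Explicit form, e.g. info!(message = "Hello"): the text arrives as a &str.
    // Storing it directly avoids the quotes that {:?} would add to a &str.
    fn record_str(&mut self, name: &str, value: &str) {
        self.set(name, value.to_owned());
    }

    // Implicit form, e.g. info!("Hello"): format_args! output arrives via Debug,
    // and Debug for fmt::Arguments prints the text without quotes.
    fn record_debug(&mut self, name: &str, value: &dyn fmt::Debug) {
        self.set(name, format!("{:?}", value));
    }
}

fn main() {
    let mut explicit = KeyFields::default();
    explicit.record_str(MESSAGE_FIELD, "Hello");

    let mut implicit = KeyFields::default();
    implicit.record_debug(MESSAGE_FIELD, &format_args!("Hello"));

    // Both forms fill the same `message` slot with the same text.
    assert_eq!(explicit, implicit);
    println!("{:?}", explicit);
}
```

The model also shows why the recording path matters for bucketing: if only the Debug path were used for a &str, the stored text would be "\"Hello\"" instead of "Hello", and the two forms would no longer compare equal.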

Comment on lines 488 to +491
component_id: Option<TraceValue>,
message: Option<TraceValue>,
Contributor

Should we hash this to decrease memory usage? I think we'd run into a trade-off between memory usage and efficiency.

I'm worried about memory usage. With this change we'd now store the message in RateLimitedSpanKeys. Currently only component_id is stored, and that isn't used in many places. I'm not sure whether this will significantly impact us.

cc @pront

Member

This is a valid concern. Is there any upper bound on how many of these we have to store at a given point in time? Are they ever removed?

Regarding hashing, that introduces new complexity and is slower, so I would like us to understand this area better and also do some benchmarking.
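One way to frame the memory-versus-speed trade-off, sketched with std only: OwnedKey keeps the full message text (what the discussion describes this PR as doing), while HashedKey keeps only a 64-bit hash of it. Both names, and the message_bytes helper, are hypothetical. The hashed variant has a fixed per-entry size but costs a hash computation per event and can, in principle, merge two different messages on a collision; the sketch does not measure speed, which is what the requested benchmarking would need to settle.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem::size_of;

// Hypothetical key that owns the full message text (heap-allocated).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct OwnedKey {
    component_id: Option<String>,
    message: Option<String>,
}

// Hypothetical key that stores only a 64-bit digest of the message.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct HashedKey {
    component_id: Option<String>,
    message_hash: Option<u64>,
}

// Hashes the message text. DefaultHasher::new() is deterministic within a build,
// but its algorithm is not guaranteed to stay the same across Rust releases.
fn hash_message(message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    message.hash(&mut hasher);
    hasher.finish()
}

impl From<&OwnedKey> for HashedKey {
    fn from(key: &OwnedKey) -> Self {
        HashedKey {
            component_id: key.component_id.clone(),
            message_hash: key.message.as_deref().map(hash_message),
        }
    }
}

// Bytes of message text an OwnedKey keeps alive on the heap (ignoring spare capacity).
fn message_bytes(key: &OwnedKey) -> usize {
    key.message.as_ref().map_or(0, |m| m.len())
}

fn main() {
    let long = "x".repeat(4096);
    let owned = OwnedKey { component_id: Some("my_source".into()), message: Some(long) };
    let hashed = HashedKey::from(&owned);

    println!("inline size: owned={} hashed={}", size_of::<OwnedKey>(), size_of::<HashedKey>());
    println!("message bytes on heap: owned={} hashed=0", message_bytes(&owned));
    println!("hash: {:?}", hashed.message_hash);
}
```

The inline sizes are similar; the difference is the heap-held message text, which grows with message length and with the number of distinct keys kept alive, so the answer to the upper-bound question above largely decides whether hashing is worth it.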

Author

@WaterWhisperer WaterWhisperer Nov 19, 2025

Thanks for the review!

I've just rebased on the latest master to resolve the merge conflicts.

Regarding the memory usage concern: I understand that storing the full message string might increase memory usage. Since I'm relatively new to this codebase, I opted for the most straightforward solution first. If you think hashing the message or another optimization is necessary right now, I'd be happy to try implementing it with some guidance. Otherwise, I'm open to benchmarking if you have a preferred way to do that.

@pront
Member

pront commented Nov 18, 2025

Also @WaterWhisperer, does this affect your pipelines in production? Trying to understand a bit better the motivation behind this solution.

@WaterWhisperer
Author

@pront, thanks for asking!

To be honest, I'm not running this in a production environment yet. I'm a Rust enthusiast and a new contributor looking to improve my skills by solving issues in open-source projects. I found this issue interesting because the behavior of info!("msg") and info!(message="msg") being treated differently felt inconsistent.

I hope this fix helps make Vector's internal logging more predictable!

Development

Successfully merging this pull request may close these issues.

bug(tracing_limit): info!(message = "foo") and info!("foo") are not grouped under the same bucket

3 participants