@aliehsaeedii
Contributor

@aliehsaeedii aliehsaeedii commented Dec 15, 2025

This PR implements KAFKA-18615 by adding a windowSum aggregation that
computes the sum of values over a time window, so that commit-ratio,
poll-ratio, process-ratio, and punctuate-ratio represent the ratio
of the {action} over a window duration rather than a single iteration.
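As a rough illustration of the idea (not the code merged in this PR), the sketch below keeps a rolling sum per action and divides it by the rolling sum of the total run-loop latency. `WindowedRatioSketch`, `RollingSum`, and `ratio` are hypothetical names made up for this example, and it assumes timestamps are passed in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative only: these are hypothetical helpers, not Kafka's classes. */
final class WindowedRatioSketch {

    /** Sums the values recorded within the last windowMs milliseconds. */
    static final class RollingSum {
        private static final class Sample {
            final long timestampMs;
            final double value;

            Sample(long timestampMs, double value) {
                this.timestampMs = timestampMs;
                this.value = value;
            }
        }

        private final long windowMs;
        private final Deque<Sample> samples = new ArrayDeque<>(); // oldest first

        RollingSum(long windowMs) {
            if (windowMs <= 0) {
                throw new IllegalArgumentException("windowMs must be positive");
            }
            this.windowMs = windowMs;
        }

        /** Records one value; assumes nowMs never goes backwards. */
        void record(double value, long nowMs) {
            expire(nowMs);
            samples.addLast(new Sample(nowMs, value));
        }

        /** Returns the sum of all values still inside the window. */
        double measure(long nowMs) {
            expire(nowMs);
            double sum = 0.0;
            for (Sample s : samples) {
                sum += s.value;
            }
            return sum;
        }

        // Drops samples that are windowMs or more older than nowMs.
        private void expire(long nowMs) {
            while (!samples.isEmpty() && samples.peekFirst().timestampMs <= nowMs - windowMs) {
                samples.removeFirst();
            }
        }
    }

    /** Ratio of action latency to total latency over the window; 0 if nothing was recorded. */
    static double ratio(RollingSum action, RollingSum total, long nowMs) {
        double totalLatency = total.measure(nowMs);
        return totalLatency == 0.0 ? 0.0 : action.measure(nowMs) / totalLatency;
    }

    public static void main(String[] args) {
        RollingSum poll = new RollingSum(60_000L);
        RollingSum runOnce = new RollingSum(60_000L);

        // Two loop iterations, 30 seconds apart.
        poll.record(10.0, 0L);
        runOnce.record(100.0, 0L);
        poll.record(30.0, 30_000L);
        runOnce.record(100.0, 30_000L);

        // Both iterations are inside the window: 40 / 200.
        System.out.println(ratio(poll, runOnce, 30_000L));
        // The first iteration has expired: 30 / 100.
        System.out.println(ratio(poll, runOnce, 70_000L));
    }
}
```

With a single-iteration ratio, the second iteration alone would report 30 / 100; the windowed version smooths that against the earlier iteration until it ages out.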

The effective window duration is determined by the metrics configuration:

  • metrics.sample.window.ms (per-sample window length)
  • multiplied by metrics.num.samples (number of rolling samples)

With the default Kafka metrics config, that is:
metrics.sample.window.ms = 30000 ms
metrics.num.samples = 2
→ ~60 seconds total rolling window.
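
The arithmetic can be sketched as follows. The two config keys and their defaults (30000 ms and 2) are the ones named above; `EffectiveMetricsWindow` and `effectiveWindowMs` are hypothetical names for this example, and the result is best read as an upper bound, since the description itself says "~60 seconds".

```java
import java.util.Properties;

/** Illustrative only: computes the nominal rolling window from the two metrics configs. */
final class EffectiveMetricsWindow {

    /** Returns metrics.sample.window.ms * metrics.num.samples, using Kafka's defaults if unset. */
    static long effectiveWindowMs(Properties props) {
        long sampleWindowMs = Long.parseLong(props.getProperty("metrics.sample.window.ms", "30000"));
        int numSamples = Integer.parseInt(props.getProperty("metrics.num.samples", "2"));
        return sampleWindowMs * numSamples;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Nothing set: defaults apply, 30000 ms * 2 samples.
        System.out.println(effectiveWindowMs(props));

        // Shorter samples, more of them: 10000 ms * 6 samples.
        props.setProperty("metrics.sample.window.ms", "10000");
        props.setProperty("metrics.num.samples", "6");
        System.out.println(effectiveWindowMs(props));
    }
}
```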

Reviewers: Matthias J. Sax matthias@confluent.io, Bill Bejeck
bill@confluent.io, Vincent Potuček (@Pankraz76)

@github-actions github-actions bot added triage PRs from the community streams small Small PRs labels Dec 15, 2025
@github-actions github-actions bot removed the small Small PRs label Dec 16, 2025
Member

@mjsax mjsax left a comment

Should we also update the description of each metric to say it's windowed (e.g., in ThreadMetrics.java)?

@github-actions github-actions bot removed the triage PRs from the community label Dec 16, 2025
@github-actions github-actions bot added the small Small PRs label Dec 16, 2025
Member

@bbejeck bbejeck left a comment

Thanks @aliehsaeedii - overall LGTM - can we add a description to docs/upgrade.html section for 4.2?

@github-actions github-actions bot removed the small Small PRs label Dec 17, 2025
Member

@mjsax mjsax left a comment

Overall LGTM. A few more minor things.

Comment on lines +2158 to +2160
final double latencyWindow =
windowedSum.measure(metricsConfig, now);
ratioSensor.record(latencyWindow / runOnceLatencyWindow);

Suggested change
- final double latencyWindow =
- windowedSum.measure(metricsConfig, now);
- ratioSensor.record(latencyWindow / runOnceLatencyWindow);
+ ratioSensor.record(windowedSum.measure(metricsConfig, now) / runOnceLatencyWindow);

Member

@mjsax mjsax left a comment

LGTM.

Member

@bbejeck bbejeck left a comment

Thanks @aliehsaeedii LGTM, with one minor comment

More details can be found in <a href="https://cwiki.apache.org/confluence/x/jQobFw">KIP-1221</a>.
</p>

<p>The streams thread metrics <code>commit-ratio</code>, <code>process-ratio</code>, <code>punctuate-ratio</code>, and <code>poll-ratio</code> have been updated.
Member

This fix won't get into AK 4.2, only AK 4.3. We are past code freeze, and only blockers can get merged now.

We need to start a new section for 4.3 and move this change there.

@mjsax mjsax merged commit 5316f43 into apache:trunk Dec 20, 2025
20 checks passed
@mjsax
Member

mjsax commented Dec 20, 2025

Thanks for the fix. Merged to trunk.

@Pankraz76

Pankraz76 commented Dec 20, 2025

> Thanks for the fix. Merged to trunk.

Just for my information: how are we planning to merge this, given the mandatory code freeze around this time of year?

This is only for development and will not be released anytime soon, correct?

However, if a hotfix were required, it would no longer be possible due to the merged feature.

I may have understood the code freeze policy differently, so I would appreciate some clarification.

Thanks.

@mjsax
Member

mjsax commented Dec 20, 2025

Code freeze only applies to the 4.2 branch, which is the release branch for 4.2.0. We can always merge to trunk, and this fix (and everything else that is merged to trunk but not cherry-picked to the 4.2 branch) will only go into the 4.3.0 release.

@Pankraz76

Thanks for the nifty update.

So we could also consider other fixes, like:

4 participants