prometheus_remote_write: Fix cutoff logic. by PromyLOPh · Pull Request #225 · fluent/cmetrics

PromyLOPh · 2024-10-01T06:43:05Z

We noticed that fluent-bit’s prometheus remote_write output plugin was silently dropping some, but not all, process_exporter metrics after about one hour while the stdout output plugin was still showing metrics being collected. We were also able to reduce the time after which metrics were being dropped by modifying CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_THRESHOLD, which indicates the problem is the cutoff logic. This merge-request treats CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR as success and continues encoding other metrics, so they do not get dropped. It might be worth dropping this “error” code entirely, since it’s not really an error and leads to subtle bugs like this one.

After merging this fix, the bundled copy of cmetrics inside fluent-bit should be updated.

A single stale metric can suppress encoding of all following non-stale metrics if the return code CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR is not treated as success.

Signed-off-by: Lars-Dominik Braun <lars@6xq.net>
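The failure mode described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the cmetrics source: the names `encode_result`, `sample`, `encode_one`, and `encode_all` are hypothetical stand-ins, and the staleness check is simplified to a timestamp comparison. The `cutoff_is_fatal` parameter exists only to contrast the two behaviors in one function.

```c
#include <stddef.h>

/* Hypothetical result codes, standing in for the encoder's return values. */
enum encode_result {
    ENCODE_SUCCESS = 0,
    ENCODE_CUTOFF  = 1,  /* sample is older than the cutoff threshold */
    ENCODE_FAILURE = 2   /* a real error, e.g. an allocation failure */
};

/* Hypothetical sample type; timestamps are in seconds. */
struct sample {
    const char *name;
    long timestamp;
};

/* Report a cutoff for samples older than `threshold` seconds. */
static enum encode_result encode_one(const struct sample *s,
                                     long now, long threshold)
{
    if (now - s->timestamp > threshold) {
        return ENCODE_CUTOFF;
    }
    return ENCODE_SUCCESS;
}

/*
 * Returns how many samples were encoded.
 * cutoff_is_fatal != 0: any non-success result ends the loop, so one
 *                       stale sample hides everything after it.
 * cutoff_is_fatal == 0: a cutoff only skips the stale sample; the loop
 *                       still stops on a real failure.
 */
static size_t encode_all(const struct sample *samples, size_t count,
                         long now, long threshold, int cutoff_is_fatal)
{
    size_t encoded = 0;
    size_t i;
    enum encode_result result;

    for (i = 0; i < count; i++) {
        result = encode_one(&samples[i], now, threshold);

        if (result == ENCODE_SUCCESS) {
            encoded++;
        }
        else if (result == ENCODE_CUTOFF && !cutoff_is_fatal) {
            continue;  /* skip the stale sample, keep going */
        }
        else {
            break;
        }
    }

    return encoded;
}
```

Given three samples where only the middle one is stale, `encode_all` with `cutoff_is_fatal` set encodes one sample, while the same call with it cleared encodes two.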

cosmo0920

Looks good to me. Good catch!

cosmo0920

I reconsidered this PR, and it is not sufficient for tracking whether a cutoff occurred, because the cutoff variable only stores the most recent cutoff result.
Instead, we need to use a flag to preserve the type of error, as in fluent/fluent-bit#9236.
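One possible reading of this suggestion, sketched under assumptions: instead of a single result variable that every iteration overwrites, each per-metric result is OR-ed into an accumulator, so an earlier cutoff stays visible after later metrics succeed. The `STATUS_*` bits and `accumulate_status` are hypothetical names for illustration, not identifiers from cmetrics or fluent-bit, and this does not claim to mirror the referenced change.

```c
#include <stddef.h>

/* Hypothetical status bits; not cmetrics identifiers. */
#define STATUS_OK      0u
#define STATUS_CUTOFF  (1u << 0)
#define STATUS_FAILURE (1u << 1)

/*
 * Combine per-metric results into one bit set. A cutoff seen early
 * remains recorded even if every later metric encodes successfully.
 */
static unsigned int accumulate_status(const unsigned int *results,
                                      size_t count)
{
    unsigned int seen = STATUS_OK;
    size_t i;

    for (i = 0; i < count; i++) {
        seen |= results[i];

        if (results[i] & STATUS_FAILURE) {
            break;  /* stop only on a real failure */
        }
    }

    return seen;
}
```

A caller could then test `seen & STATUS_CUTOFF` to learn that some samples were skipped, independently of whether the overall encode succeeded.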

PromyLOPh · 2024-10-14T09:43:43Z

How is this information (whether some values were not transmitted due to the cutoff) used by fluent-bit? As far as I see cmt_encode_prometheus_remote_write_create always handled CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR like a success and never reported anything to the upper layers (like fluent-bit). Is this something that needs to be changed?

cosmo0920 · 2024-10-15T06:29:00Z

> How is this information (whether some values were not transmitted due to the cutoff) used by fluent-bit? As far as I see cmt_encode_prometheus_remote_write_create always handled CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR like a success and never reported anything to the upper layers (like fluent-bit). Is this something that needs to be changed?

Currently, we don't use this error for reporting to fluent-bit plugins, for the sake of code simplicity. Having reconsidered this PR once more, I realized that this change should be enough to handle the extra cutoff cases.

prometheus_remote_write: Fix cutoff logic.

e97c84b

PromyLOPh force-pushed the master branch from 66ced89 to e97c84b · October 1, 2024 06:44

cosmo0920 approved these changes Oct 11, 2024

cosmo0920 requested changes Oct 11, 2024

cosmo0920 approved these changes Oct 15, 2024

edsiper self-requested a review as a code owner September 10, 2025 20:12

PromyLOPh wants to merge 1 commit into fluent:master from PromyLOPh:master
