prometheus_remote_write: Fix cutoff logic. #225
PromyLOPh wants to merge 1 commit into fluent:master from
A single stale metric can suppress encoding of all following non-stale metrics if the return code CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR is not treated as success.

Signed-off-by: Lars-Dominik Braun <lars@6xq.net>
cosmo0920
left a comment
Looks good to me. Good catch!
I reconsidered this PR, and it is not sufficient for tracking whether a cutoff occurred, because the cutoff variable only stores the most recent cutoff.
Instead, we need to use a flag to preserve the types of errors, as in fluent/fluent-bit#9236
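To illustrate the concern above, here is a minimal sketch contrasting a single result variable, which is overwritten on every iteration, with a bit-flag accumulator that remembers every kind of condition seen. All names (`sketch_last_result`, `sketch_collect_flags`, the `SKETCH_*` constants) and the result codes are hypothetical; this is not the code from fluent/fluent-bit#9236 or from cmetrics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-metric result codes, for illustration only. */
#define SKETCH_OK       0
#define SKETCH_CUTOFF   1
#define SKETCH_INVALID  2

/* Hypothetical bit flags, one bit per kind of condition. */
#define SKETCH_FLAG_CUTOFF   (1u << 0)
#define SKETCH_FLAG_INVALID  (1u << 1)

/* Single variable: each iteration overwrites the previous result,
 * so only the last metric's outcome survives the loop. */
static int sketch_last_result(const int *results, size_t count)
{
    int last = SKETCH_OK;
    size_t i;

    assert(results != NULL || count == 0);
    for (i = 0; i < count; i++) {
        last = results[i];
    }
    return last;
}

/* Flag accumulator: OR-ing bits keeps a record of every condition
 * that occurred, regardless of what came afterwards. */
static unsigned int sketch_collect_flags(const int *results, size_t count)
{
    unsigned int flags = 0;
    size_t i;

    assert(results != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (results[i] == SKETCH_CUTOFF) {
            flags |= SKETCH_FLAG_CUTOFF;
        }
        else if (results[i] == SKETCH_INVALID) {
            flags |= SKETCH_FLAG_INVALID;
        }
    }
    return flags;
}
```

With the sequence cutoff, ok, invalid, ok, the single variable ends up reporting success, while the flag word still has both bits set.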
How is this information (whether some values were not transmitted due to the cutoff) used by fluent-bit? As far as I see
Currently, we don't use this error for reporting to fluent-bit plugins, for the sake of code simplicity. Having reconsidered this PR once more, I realized that it should be enough to handle the additional cutoff cases.
We noticed that fluent-bit's prometheus remote_write output plugin was silently dropping some, but not all, process_exporter metrics after about one hour while the stdout output plugin was still showing metrics being collected. We were also able to reduce the time after which metrics were being dropped by modifying CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_THRESHOLD, which indicates the problem is the cutoff logic.

This merge-request treats CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR as success and continues encoding other metrics, so they do not get dropped. It might be worth dropping this "error" code entirely, since it's not really an error and leads to subtle bugs like this one.

After merging this fix the bundled copy of cmetrics inside fluent-bit should be updated.
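The behaviour described above can be sketched roughly as follows. This is a simplified model, not the actual cmetrics encoder: the `sketch_*` names, the result codes, and the assumption that a metric counts as stale when its timestamp is older than `now - threshold` are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the encoder's return codes. */
enum sketch_result {
    SKETCH_SUCCESS = 0,
    SKETCH_CUTOFF_ERROR = 1  /* models CMT_ENCODE_PROMETHEUS_REMOTE_WRITE_CUTOFF_ERROR */
};

struct sketch_metric {
    long timestamp;
};

/* Assumed staleness rule: refuse metrics older than now - threshold. */
static enum sketch_result sketch_encode_one(const struct sketch_metric *m,
                                            long now, long threshold)
{
    if (m->timestamp < now - threshold) {
        return SKETCH_CUTOFF_ERROR;
    }
    return SKETCH_SUCCESS;
}

/* Returns how many metrics were encoded.
 * cutoff_is_success == 0 models the old behaviour: the first stale
 * metric aborts the loop and everything after it is lost.
 * cutoff_is_success != 0 models the fix: the stale metric is skipped
 * and encoding continues with the next one. */
static int sketch_encode_all(const struct sketch_metric *metrics, size_t count,
                             long now, long threshold, int cutoff_is_success)
{
    int encoded = 0;
    size_t i;

    assert(metrics != NULL || count == 0);
    for (i = 0; i < count; i++) {
        enum sketch_result r = sketch_encode_one(&metrics[i], now, threshold);

        if (r == SKETCH_CUTOFF_ERROR) {
            if (cutoff_is_success) {
                continue;  /* skip only the stale metric */
            }
            break;         /* old behaviour: stop at the first stale metric */
        }
        encoded++;
    }
    return encoded;
}
```

For three metrics where only the middle one is stale, the old behaviour encodes one metric and the fixed behaviour encodes two, which matches the "some, but not all" symptom in the report.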