Reduce garbage generation in prometheus_text_format
#196
Conversation
This looks amazing! I see it is still marked as draft so I guess no rush, and I'm away from the computer for a few days so I'm only having a look at this on my phone now. Nevertheless, I'd love to see this ready and merged, thank you so much for the work 😃
Force-pushed from ceb4c9f to ce32cfe
Yep no real rush on this! Looks like I have some work to do to make the CI happy anyways.
I've approved the workflow run.
|
NelsonVides left a comment
I'm still away from a proper computer so still quickly looking at this from my phone. Thanks a lot for having it pass CI!
I have a question about keeping the code good-looking: the process dictionary is a bit ugly. Is there a way to refactor those usages to pass something more functional, or does it have a performance impact that way?
Also for the compiled regex, perhaps storing it in a persistent term that is created at startup could help that perform even faster? It would literally be compiled only once through the entire VM lifetime and never require any GC.
If you don't see an easy way to improve that then it's probably fine to merge this way and later when I'm back to a full computer I'll try to refactor and tag you in a potential PR 🤔
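A minimal sketch of the persistent term idea from this comment; the key name and the escape pattern here are illustrative assumptions, not the library's actual code:

```erlang
%% Compile the regex once at application startup and cache it for the
%% lifetime of the VM; persistent_term reads are cheap and never GCed.
%% Key and pattern are hypothetical, for illustration only.
-define(ESCAPE_RE_KEY, {prometheus_text_format, escape_re}).

init_escape_re() ->
    {ok, RE} = re:compile(<<"[\\\\\n\"]">>),
    persistent_term:put(?ESCAPE_RE_KEY, RE).

escape(Bin) when is_binary(Bin) ->
    RE = persistent_term:get(?ESCAPE_RE_KEY),
    %% Prefix each special character with a backslash (simplified rule).
    re:replace(Bin, RE, <<"\\\\&">>, [global, {return, binary}]).
```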
Yeah the process dictionary part here is really icky. I thought in my earlier testing that I saw … For the …
850a174 to
f7dc0e1
Compare
NelsonVides
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are so many checks against whether something is a binary, and if it is not then `iolist_to_binary`, that I really wish everything had been just binaries to begin with. But that's a breaking change for another day 😄
Anyway, I have a couple of comments regarding some crazier performance optimisations and making tracing more readable. It's not really important, and if it is too annoying I could just shuffle this code myself in a couple of weeks. But maybe you like the idea, or you didn't know those tricks and want to do it yourself, so just sharing. WDYT? 🙂
`prometheus_text_format:has_special_char/1` is called very often when a registry contains many metrics with label pairs. We can use `binary:match/2` to search within a label binary for the special characters (newline, backslash and double-quote) without allocation. The old code, using bit-syntax matching, creates a match context every time the function is called (recursive calls reuse the match context, but each fresh call allocates a new one). A match context allocates 5 words on the process heap when it is created, so when matching many binaries this adds up to a noticeable amount of short-lived garbage. In comparison, `binary:match/2` with a precompiled match pattern does not allocate, and its BIF is very well optimized, using `memchr` since OTP 22.
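A sketch of that approach, assuming the pattern would in practice be compiled once and cached (e.g. in `persistent_term`, as suggested earlier) rather than rebuilt per call:

```erlang
%% binary:match/2 scans the subject without allocating a match context
%% on the process heap; the compiled pattern is reusable across calls.
has_special_char(Bin) when is_binary(Bin) ->
    %% Built here for a self-contained example; real code would cache this.
    Pattern = binary:compile_pattern([<<"\n">>, <<"\\">>, <<"\"">>]),
    binary:match(Bin, Pattern) =/= nomatch.
```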
Force-pushed from f7dc0e1 to 5662cb0
All of that sounds good to me - I applied all of the suggestions. For the …
NelsonVides left a comment
Loving it. Got one more question :D
NelsonVides left a comment
Just dirty stuff - this was there before your code changes anyway, it just shows up in the diff now.
Force-pushed from 06af086 to 715663d
The formatting callback for a registry can build each metrics family as
a single binary in order to reduce garbage. This mainly involves passing
the accumulator binary through all functions that append to it.
It's more efficient to append to the resulting binary than to allocate
smaller binaries and then append them. For example:
```erlang
<<Blob/binary, Name/binary, "_", Suffix/binary>>.
%% versus
Combined = <<Name/binary, "_", Suffix/binary>>,
<<Blob/binary, Combined/binary>>.
```
The first expression generates less garbage than the second. A good
example of this was the `add_brackets/1` function, which was inlined.
According to the compiler, inlining unfortunately does not turn the
second expression (above) into the first, so we paid the cost of
creating a binary with brackets and then copying that into the larger
blob, rather than copying the parts in directly. This change manually
inlines `add_brackets/1` into its caller `render_series/4`.
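An illustrative before/after of that manual inlining; the function shapes are assumptions for the sake of the example, not the library's exact code:

```erlang
%% Before (sketch): the helper allocates an intermediate bracketed
%% binary, which is then copied a second time into the accumulator.
add_brackets(LString) ->
    <<"{", LString/binary, "}">>.

render_series_before(Acc, Name, LString) ->
    <<Acc/binary, Name/binary, (add_brackets(LString))/binary>>.

%% After (sketch): append directly into the accumulator, so the bytes
%% are copied only once and no intermediate binary is created.
render_series_after(Acc, Name, LString) ->
    <<Acc/binary, Name/binary, "{", LString/binary, "}">>.
```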
This change also changes some list strings into binaries. Especially for
ASCII, binaries are _far_ more compact than list strings. Lists need
two words per ASCII character - one for the character and one for the
tail pointer. So it's like UTF-32 but worse, basically UTF-128 on a
64-bit machine. ASCII or UTF-8 text in a binary takes one byte per
character in the binary's array, plus a word or two of metadata. E.g.
`<<"hello">>` allocates three words while `"hello"` allocates ten.
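Those word counts can be checked in the shell with `erts_debug:size/1`, which reports a term's heap size in words (64-bit output shown):

```erlang
1> erts_debug:size("hello").
10
2> erts_debug:size(<<"hello">>).
3
```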
Building on the work in the parent commit, now that the data being passed to the `ram_file` is a binary, we can instead build the entire output gradually within the process. With the `ram_file` approach we pay I/O overhead from writing to and then reading from the `ram_file`: `ram_file` is a port, so all data is passed between the VM and the port driver. The memory consumed by a port driver is also invisible to the VM's allocator, so large port driver resource usage should be avoided where possible.

Instead, this change refactors the `registry_collect_callback` to fold over collectors and build an accumulator. The `create_mf` callback's return of `ok` forces us to store this rather than pass and return it, so it's a little less hygienic, but it is more efficient than passing data in and out of a port.

This also introduces a function `format_into/3` which can use this folding function directly. This can be used to avoid collecting the entire response in one binary; instead the response can be streamed, with `cowboy_req:stream_body/3` for example.
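For example, a sketch of a Cowboy handler streaming the scrape via `format_into/3`. The argument order of `format_into/3` and the shape of the formatting fun are assumptions here (see the follow-up comment about argument order below); only `cowboy_req:stream_reply/3` and `cowboy_req:stream_body/3` are real Cowboy calls:

```erlang
%% Hypothetical handler: stream each formatted chunk to the client
%% instead of accumulating the whole response in one binary.
init(Req0, State) ->
    Req = cowboy_req:stream_reply(
            200,
            #{<<"content-type">> => <<"text/plain; version=0.0.4">>},
            Req0),
    %% Assumed signature: format_into(FormatFun, InitialState, Registry).
    %% The fun is fold-like: accumulated state first, new data second.
    _ = prometheus_text_format:format_into(
          fun(Acc, Data) ->
                  ok = cowboy_req:stream_body(Data, nofin, Req),
                  Acc
          end, ok, default),
    ok = cowboy_req:stream_body(<<>>, fin, Req),
    {ok, Req, State}.
```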
Force-pushed from 7102c10 to 665ce75
NelsonVides left a comment
A blast of a change! Thank you so much for all the effort and also, for keeping such a tidy git history :)
@the-mikedavis gonna get a release to hex done over the weekend 👍🏽
Sweet, thanks @NelsonVides!
Published https://hex.pm/packages/prometheus/6.1.0 🎉
```erlang
        "\n"
    >>,
    Bin = render_metrics(Prologue, Name, Metrics),
    put(?MODULE, Fmt(Bin, erase(?MODULE)))
```
Bah, I made a mistake here with the order of the arguments. The function should take the state as the first argument and the new data as the second. It doesn't end up making a difference for `format/1` because it just changes the order in which the metrics families are formatted - it's concatenating the wrong way around. But using `format_into/3` with a custom formatting function doesn't work properly. I'll send a follow-up PR (edit: #197)
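In other words, a sketch of the fix, reusing the names from the quoted diff above: the fold-like call would pass the accumulated state first and the newly rendered data second.

```erlang
%% State (accumulated output) first, new data second:
put(?MODULE, Fmt(erase(?MODULE), Bin))
```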
This is a somewhat small set of patches to `prometheus_text_format` which aim to reduce garbage creation during registry formatting. Reducing garbage creation drives down the cost to the VM of scraping large registries - both in terms of peak memory allocation and also the work that the garbage collector must do.

With these changes I see a reduction in allocation reported by `tprof` in a stress test of one of RabbitMQ's most expensive registries. In a test against single-instance RabbitMQ brokers on EC2 instances this saves a noticeable amount of peak memory and reduces CPU utilization significantly.

tprof testing instructions:

1. Clone https://github.com/rabbitmq/rabbitmq-server
2. `cd rabbitmq-server`
3. `make deps`
4. `make run-broker`
5. In another terminal in the `rabbitmq-server` repo, run `sbin/rabbitmqctl import_definitions path/to/100k-classic-queues.json` pointing to this definitions file.
6. In the `make run-broker` terminal, start `tprof` tracing for new processes: `tprof:start(#{type => call_memory}), tprof:enable_trace(new), tprof:set_pattern('_', '_', '_').`
7. `curl -v localhost:15692/metrics/per-object --output /dev/null`
8. `tprof:format(tprof:inspect(tprof:collect())).`

To test this change, Ctrl-C twice out of `make run-broker`, `cd deps/prometheus` and check out this branch. Then `rm -rf ebin` in that directory, `cd ../../` and repeat steps 4, 6, 7 and 8 again (skipping the definitions import).

Registry collection tprof measurement before this change...
Registry collection tprof measurement after this change...
So with this change, the Cowboy request process in charge of this endpoint allocates 147_597_866 words instead of 193_184_463, a reduction of 45_586_597 words or 23.6%.

Stress-testing on EC2...
On EC2 I have two `m7g.xlarge` instances running RabbitMQ: `galactica`, which carries this change, and `kestrel`, which uses `prometheus` at v5.1.1 (the latest version RabbitMQ has adopted). A third instance `curl`s these instances at an interval of two seconds with a script that asynchronously fires off a scrape request every two seconds for twenty minutes. The third node runs this script against both `galactica` and `kestrel` at the same time. The third node also scrapes these nodes' `node_exporter` metrics and the RabbitMQ prometheus endpoint for Erlang allocator metrics.

`kestrel` (baseline): [graphs of instance-wide memory usage, instance-wide CPU usage and Erlang allocators]

`galactica` (this branch): [graphs of instance-wide memory usage, instance-wide CPU usage and Erlang allocators]
We can see `kestrel` (baseline) pinned at around 95% CPU usage consistently, hovering at around 9-10 GB instance-wide memory usage, with the VM aware of 3.5-4.5 GB of usage; and `galactica` (this branch) sitting at 50% CPU usage, around 7.5-8.5 GB instance-wide memory, with the VM tracking around 2-3 GB of memory.

While the peak memory usage is reduced nicely, the main benefit is that the CPU is loaded much less than before - I assume from performing less garbage collection.