Decrease retained object allocations in the reporter #253

Draft
camallen wants to merge 1 commit into buildkite:main from camallen:improve-memory-usage

@camallen
Reduce retained object allocations in the reporter. The changes in this PR are based on the analysis below.

Buildkite TestCollector Memory Analysis

Gem Details

  • Version: 2.11.0

How the Collector Works (Lifecycle per Test)

  1. before_setup — Creates a Tracer object, stores it in Thread.current[:_buildkite_tracer]
  2. During test — Every SQL query, HTTP request, sleep() call, and WebMock stub creates Span objects appended to the tracer's tree
  3. after_teardown — Finalizes the tracer, converts it to a Trace object, stores it in Uploader.traces[source_location]
  4. Reporter#record — Queues the trace for upload. When queue hits batch_size (default 500), spawns a background thread to upload, then deletes traces from the hash (TE-5203 fork fix)
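The queue-and-flush behaviour in step 4 can be modelled with a small stand-alone sketch. `BatchQueue`, its `traces` hash, and the `uploader` callable are hypothetical stand-ins rather than the gem's classes, and the background upload thread is left out to keep the example deterministic.

```ruby
# Hypothetical model of the reporter's batching; not the gem's implementation.
class BatchQueue
  attr_reader :traces

  def initialize(batch_size:, uploader:)
    @batch_size = batch_size
    @uploader = uploader   # any callable that receives an array of traces
    @traces = {}           # keyed by source location, like Uploader.traces
    @queue = []
  end

  # Store the trace, queue its key, and flush once the queue reaches batch_size.
  def record(source_location, trace)
    @traces[source_location] = trace
    @queue << source_location
    flush if @queue.size >= @batch_size
  end

  # Upload everything queued, then drop the uploaded traces from the hash.
  def flush
    return if @queue.empty?

    keys = @queue.shift(@queue.size)
    @uploader.call(keys.map { |key| @traces[key] })
    keys.each { |key| @traces.delete(key) }
  end
end

uploaded = []
queue = BatchQueue.new(batch_size: 3, uploader: ->(batch) { uploaded << batch.size })
7.times { |i| queue.record("test_#{i}.rb:1", { id: i }) }
retained_before_final_flush = queue.traces.size
queue.flush
```

The point of the model is that everything stored in `traces` stays reachable until its batch is flushed, so the batch size sets the upper bound on retained traces.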

Memory Leak Vectors

1. CRITICAL: Minitest::StatisticsReporter#record retains failed/skipped test results forever

minitest.rb:889:

results << result if not result.passed? or result.skipped?

The Buildkite reporter extends Minitest::StatisticsReporter and calls super in record. Every failed or skipped test's full result object (including its entire Minitest::Test instance) gets accumulated in the reporter's results array for the entire test run. The Buildkite reporter inherits this unnecessarily.

2. CRITICAL: Trace holds a reference to the full Minitest::Test instance

minitest_plugin/trace.rb:20-25:

def initialize(example, history:, tags: nil, trace: nil, location_prefix: nil)
  @example = example  # <-- Full Minitest::Test instance!
  ...
end

@example is the entire test object — including all instance variables set during the test (fixtures, stubs, mock objects, etc.). This trace lives in Uploader.traces until it's batch-uploaded and deleted. With a batch size of 500, up to 500 full test objects are retained at any time.

3. HIGH: SQL query strings accumulated as span data

test_collector.rb:109-111:

ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
  Buildkite::TestCollector::Uploader.tracer&.backfill(:sql, finish - start, **{ query: payload[:sql] })
end

Every single SQL query executed during a test gets its full query string stored as a Span child node. For a Rails app with fixtures and complex setup, a single test easily executes 50-200+ SQL queries. Each span object stores: section, start_at, end_at, detail hash (with the query string), and a children array.

With 550+ fixture files loaded, this is significant.

4. HIGH: HTTP request URLs stored in span detail hashes

network.rb:12:

detail = { method: request.method.upcase, url: uri.to_s, lib: "net-http" }

Every HTTP request (including WebMock-stubbed ones) creates a span with the full URL stored.

5. MEDIUM: Upload threads accumulate in @upload_threads array

session.rb:53:

@upload_threads << new_thread if new_thread

Completed Thread objects are never removed from this array. They're only killed at close(). With default batch_size of 500 and thousands of tests, this array grows with dead Thread references.
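One way to stop the array growing is to drop finished threads whenever a new one is tracked. The sketch below is a minimal illustration under that assumption; `UploadThreads` and its methods are hypothetical names, and `close` here joins the remaining threads rather than reproducing what the gem's `close()` does.

```ruby
# Hypothetical stand-in for the session's thread bookkeeping.
class UploadThreads
  def initialize
    @upload_threads = []
  end

  # Forget threads that have already finished, then track the new one.
  def track(new_thread)
    @upload_threads.select!(&:alive?)
    @upload_threads << new_thread if new_thread
  end

  def size
    @upload_threads.size
  end

  # Wait for whatever is still running, then release all references.
  def close
    @upload_threads.each(&:join)
    @upload_threads.clear
  end
end

threads = UploadThreads.new
5.times do
  thread = Thread.new { :uploaded }
  thread.join             # simulate an upload that completes before the next batch
  threads.track(thread)
end
tracked = threads.size    # only the most recently added thread is still referenced
threads.close
```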

6. MEDIUM: TE-5203 fix has a timing gap

The fork's fix deletes traces from Uploader.traces after calling upload_data. But between storage and batch upload, up to batch_size (500) traces live in memory. Each trace holds @example (the full test instance) and @history (the entire span tree with all SQL queries).

The Compounding Effect

Why enabling the collector causes OOM, while without it memory only grows gradually:

Without collector:

  • Tests run, minitest's own StatisticsReporter accumulates failed/skipped results (gradual increase)
  • Ruby GC can collect test objects after each test

With collector:

  • All the above, PLUS:
  • Each test creates a Tracer with a span tree capturing every SQL query, HTTP call, and sleep
  • The Trace object holds a strong reference to the entire Minitest::Test instance (preventing GC of the test and everything it references)
  • Up to 500 of these accumulate before batch upload
  • SQL spans can be 50-200+ per test, each storing the full query string
  • Upload threads accumulate as dead references
  • The history hash is a deep recursive structure that gets serialized to JSON (creating temporary copies)

Estimated Memory Per Test

For this app:

  • Minitest::Test instance with fixtures: ~50-200KB (depending on accessed fixtures)
  • SQL span objects (100 queries avg): ~30-50KB of query strings + span overhead
  • HTTP span objects: ~5-10KB
  • Trace metadata: ~2-5KB

Rough estimate: 100-250KB per test retained until batch flush.
At batch_size=500: 50-125MB peak before flush.

This is on top of the base memory. Tests that set up large objects, fixtures, or generate many SQL queries will be much larger.
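The arithmetic behind these figures can be checked with a few lines of Ruby. The per-component ranges are copied from the list above; `ESTIMATES_KB` and `peak_mb` are names made up for this sketch, and megabytes are taken as 1000 KB to match the rounded numbers quoted here.

```ruby
# Per-test estimates in KB, copied from the list above as [low, high].
ESTIMATES_KB = {
  test_instance: [50, 200],
  sql_spans:     [30, 50],
  http_spans:    [5, 10],
  trace_meta:    [2, 5],
}.freeze

low_kb  = ESTIMATES_KB.values.sum { |low, _high| low }    # sum of the low ends
high_kb = ESTIMATES_KB.values.sum { |_low, high| high }   # sum of the high ends

# Peak retained memory in MB for a given per-test size and batch size.
def peak_mb(per_test_kb, batch_size)
  per_test_kb * batch_size / 1000.0
end

puts "per test: #{low_kb}-#{high_kb} KB"
puts "batch of 500: #{peak_mb(100, 500)}-#{peak_mb(250, 500)} MB"
puts "batch of 50:  #{peak_mb(100, 50)}-#{peak_mb(250, 50)} MB"
```

The component sums come to 87-265 KB, which the estimate above rounds to 100-250 KB. At that rounded range, dropping the batch size from 500 to 50 cuts the peak from 50-125 MB to 5-12.5 MB.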

Recommendations

Quick Wins (env vars / config changes)

1. Reduce batch size

Set BUILDKITE_ANALYTICS_UPLOAD_BATCH_SIZE=50 to flush more frequently and reduce peak memory.

2. Filter short spans

Set BUILDKITE_ANALYTICS_TRACE_MIN_MS=5 to filter out spans under 5ms (eliminates most SQL spans from the trace tree).
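To illustrate why a minimum duration removes most SQL spans, here is a rough sketch of a tracer that discards short spans at record time. `SketchSpan` and `FilteringTracer` are hypothetical, and the assumption that filtering happens when the span is recorded (rather than at serialization) is mine, not confirmed from the gem.

```ruby
# Hypothetical span and tracer; the gem's own classes differ.
SketchSpan = Struct.new(:section, :duration, :detail)

class FilteringTracer
  attr_reader :spans

  def initialize(min_duration: nil)
    @min_duration = min_duration   # seconds; nil keeps every span
    @spans = []
  end

  # Record a span unless it is shorter than the configured minimum.
  def backfill(section, duration, **detail)
    return if @min_duration && duration < @min_duration

    @spans << SketchSpan.new(section, duration, detail)
  end
end

tracer = FilteringTracer.new(min_duration: 0.005)   # 5 ms
tracer.backfill(:sql, 0.0004, query: "SELECT 1")
tracer.backfill(:sql, 0.0120, query: "SELECT * FROM big_table")
tracer.backfill(:http, 0.0030, url: "https://example.test/")
kept_sections = tracer.spans.map(&:section)
```

Only the 12 ms query survives; the sub-millisecond query and the 3 ms request never allocate a span, so their detail strings are not retained.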

3. Disable tracing entirely

If you only need test timing data (not per-query traces), configure with tracing_enabled: false:

Buildkite::TestCollector.configure(hook: :minitest, tracing_enabled: false)

This skips all monkey-patching (Net::HTTP, Object#sleep, ActiveRecord subscriber) and no spans are created.

Gem Patches (require forking further)

4. Break the Trace -> Test reference

The biggest win would be patching the Trace to not hold @example. It only needs it for result_code, source_location, class.name, name, failures, and failure.message. These could be extracted eagerly in initialize instead of holding the full test object. This would allow GC to collect test instances immediately.
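A minimal sketch of the eager-extraction idea follows. `SlimTrace` and `FakeTest` are hypothetical; the real Trace takes more arguments and derives `source_location` differently, so this only shows the shape of the change: copy the fields listed above in `initialize` and keep no reference to the test itself.

```ruby
# Stand-in for a Minitest::Test instance, with a large fixtures payload.
FakeTest = Struct.new(:name, :result_code, :source_location, :failures, :fixtures)

# Hypothetical trace that copies what it needs and drops the test object.
class SlimTrace
  attr_reader :result_code, :source_location, :class_name, :name, :failure_messages

  def initialize(example)
    @result_code = example.result_code
    @source_location = example.source_location
    @class_name = example.class.name
    @name = example.name
    # Copy only the messages so the exception objects are not retained either.
    @failure_messages = example.failures.map(&:message)
    # No @example: nothing here keeps the test instance reachable.
  end
end

test = FakeTest.new(
  "test_total", "F", ["order_test.rb", 12],
  [RuntimeError.new("expected 3, got 4")],
  Array.new(1_000) { "row" }
)
trace = SlimTrace.new(test)
holds_test = trace.instance_variables.any? do |ivar|
  trace.instance_variable_get(ivar).equal?(test)
end
```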

5. Clear StatisticsReporter results

Override record in the Buildkite reporter to not call super, or clear results periodically. This prevents minitest's own accumulation of failed/skipped test result objects.
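The second option (call super, then clear) might look roughly like the sketch below. `BaseReporterSketch` is a stand-in that mimics only the retention line quoted earlier, not the real Minitest::StatisticsReporter, and `TrimmedReporterSketch` and `ResultSketch` are hypothetical names.

```ruby
# Stand-in for the base reporter, reduced to the counter and the retention line.
class BaseReporterSketch
  attr_reader :results, :count

  def initialize
    @results = []
    @count = 0
  end

  def record(result)
    @count += 1
    results << result if !result.passed? || result.skipped?
  end
end

class TrimmedReporterSketch < BaseReporterSketch
  # Keep the counters updated by super, then drop the retained result objects.
  def record(result)
    super
    results.clear
  end
end

ResultSketch = Struct.new(:ok, :skip) do
  def passed?
    ok
  end

  def skipped?
    skip
  end
end

reporter = TrimmedReporterSketch.new
[
  ResultSketch.new(true, false),    # pass
  ResultSketch.new(false, false),   # failure
  ResultSketch.new(false, true),    # skip
].each { |result| reporter.record(result) }
```

One caveat worth checking before adopting this: as far as I can tell, minitest derives its failure, error, and skip totals from `results` at report time, so clearing it is only safe if nothing reads those totals from this particular reporter.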
