Decrease retained object allocations in the reporter#253
Draft
camallen wants to merge 1 commit intobuildkite:mainfrom
Draft
Decrease retained object allocations in the reporter#253camallen wants to merge 1 commit intobuildkite:mainfrom
camallen wants to merge 1 commit intobuildkite:mainfrom
Conversation
…n object allocations
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Save on maintained object allocations in the reporter, the changes in this PR are based on the below.
Buildkite TestCollector Memory Analysis
Gem Details
How the Collector Works (Lifecycle per Test)
before_setup— Creates aTracerobject, stores it inThread.current[:_buildkite_tracer]sleep()call, and WebMock stub createsSpanobjects appended to the tracer's treeafter_teardown— Finalizes the tracer, converts it to aTraceobject, stores it inUploader.traces[source_location]batch_size(default 500), spawns a background thread to upload, then deletes traces from the hash (TE-5203 fork fix)Memory Leak Vectors
1. CRITICAL:
Minitest::StatisticsReporter#recordretains failed/skipped test results foreverminitest.rb:889:The Buildkite reporter extends
Minitest::StatisticsReporterand callssuperinrecord. Every failed or skipped test's full result object (including its entireMinitest::Testinstance) gets accumulated in the reporter'sresultsarray for the entire test run. The Buildkite reporter inherits this unnecessarily.2. CRITICAL:
Traceholds a reference to the fullMinitest::Testinstanceminitest_plugin/trace.rb:20-25:@exampleis the entire test object — including all instance variables set during the test (fixtures, stubs, mock objects, etc.). This trace lives inUploader.tracesuntil it's batch-uploaded and deleted. With a batch size of 500, up to 500 full test objects are retained at any time.3. HIGH: SQL query strings accumulated as span data
test_collector.rb:109-111:Every single SQL query executed during a test gets its full query string stored as a
Spanchild node. For a Rails app with fixtures and complex setup, a single test easily executes 50-200+ SQL queries. Each span object stores: section, start_at, end_at, detail hash (with the query string), and a children array.With 550+ fixture files loaded, this is significant.
4. HIGH: HTTP request URLs stored in span detail hashes
network.rb:12:Every HTTP request (including WebMock-stubbed ones) creates a span with the full URL stored.
5. MEDIUM: Upload threads accumulate in
@upload_threadsarraysession.rb:53:Completed Thread objects are never removed from this array. They're only killed at
close(). With defaultbatch_sizeof 500 and thousands of tests, this array grows with dead Thread references.6. MEDIUM: TE-5203 fix has a timing gap
The fork's fix deletes traces from
Uploader.tracesafter callingupload_data. But between storage and batch upload, up tobatch_size(500) traces live in memory. Each trace holds@example(the full test instance) and@history(the entire span tree with all SQL queries).The Compounding Effect
Why enabling the collector causes OOM while without it there's only gradual increase:
Without collector:
StatisticsReporteraccumulates failed/skipped results (gradual increase)With collector:
Tracerwith a span tree capturing every SQL query, HTTP call, and sleepTraceobject holds a strong reference to the entireMinitest::Testinstance (preventing GC of the test and everything it references)historyhash is a deep recursive structure that gets serialized to JSON (creating temporary copies)Estimated Memory Per Test
For this app:
Rough estimate: 100-250KB per test retained until batch flush.
At batch_size=500: 50-125MB peak before flush.
This is on top of the base memory. Tests that set up large objects, fixtures, or generate many SQL queries will be much larger.
Recommendations
Quick Wins (env vars / config changes)
1. Reduce batch size
Set
BUILDKITE_ANALYTICS_UPLOAD_BATCH_SIZE=50to flush more frequently and reduce peak memory.2. Filter short spans
Set
BUILDKITE_ANALYTICS_TRACE_MIN_MS=5to filter out spans under 5ms (eliminates most SQL spans from the trace tree).3. Disable tracing entirely
If you only need test timing data (not per-query traces), configure with
tracing_enabled: false:This skips all monkey-patching (Net::HTTP, Object#sleep, ActiveRecord subscriber) and no spans are created.
Gem Patches (require forking further)
4. Break the Trace -> Test reference
The biggest win would be patching the
Traceto not hold@example. It only needs it forresult_code,source_location,class.name,name,failures, andfailure.message. These could be extracted eagerly ininitializeinstead of holding the full test object. This would allow GC to collect test instances immediately.5. Clear StatisticsReporter results
Override
recordin the Buildkite reporter to not callsuper, or clearresultsperiodically. This prevents minitest's own accumulation of failed/skipped test result objects.