Conversation

kainosnoema (Contributor) commented Oct 27, 2025

We've seen high memory usage and GC pressure in our production deployment. One culprit seems to be func (*Request) execSelections, which can allocate many buffers that have to grow multiple times for large responses.

One important optimization we can make here is buffer pooling. This PR implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to reduce allocations, memory growth, and GC pressure during execution (a minimal sketch of the approach follows the change list below).

Changes:

  • Add internal/exec/pool.go with buffer and field map pooling
  • Update Execute(), execSelections(), execList() to use pools
  • Apply pooling to subscription handling
  • Tests and benchmarks
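For reference, a minimal sketch of the pooling approach (helper names here are illustrative, not necessarily the ones in internal/exec/pool.go):

package exec

import (
	"bytes"
	"sync"
)

// Buffers that grow beyond this are dropped rather than pooled, so a
// single huge response doesn't pin a large allocation indefinitely.
const maxBufferCap = 64 * 1024

var bufferPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// getBuffer returns an empty buffer from the pool.
func getBuffer() *bytes.Buffer {
	return bufferPool.Get().(*bytes.Buffer)
}

// putBuffer resets buf and returns it to the pool, discarding it if it
// has grown past maxBufferCap.
func putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > maxBufferCap {
		return
	}
	buf.Reset()
	bufferPool.Put(buf)
}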

Isolated Benchmarks

Summary:

  • ListQuery: 111µs → 78µs (29% faster), 124KB → 43KB (65% less memory)
  • ListWithNestedLists: 428µs → 346µs (19% faster), 637KB → 177KB (72% less memory)
  • Concurrent: 219µs → 95µs (56% faster), 659KB → 189KB (71% less memory)
# Without Pooling
BenchmarkMemory_SimpleQuery-12                    115488             10336 ns/op            9925 B/op        126 allocs/op
BenchmarkMemory_ListQuery-12                       10000            111123 ns/op          124602 B/op       2037 allocs/op
BenchmarkMemory_NestedQuery-12                     17557             61133 ns/op           80886 B/op       1074 allocs/op
BenchmarkMemory_ListWithNestedLists-12              2841            428631 ns/op          637248 B/op       9660 allocs/op
BenchmarkMemory_Concurrent-12                       5092            219808 ns/op          659525 B/op      10124 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12                123319              9433 ns/op            8691 B/op        106 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12                23680             42620 ns/op           80833 B/op       1016 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12          50025             24150 ns/op           37659 B/op        444 allocs/op

# With Pooling
BenchmarkMemory_SimpleQuery-12                    131162              9250 ns/op            7423 B/op        110 allocs/op
BenchmarkMemory_ListQuery-12                       16544             78654 ns/op           43204 B/op       1474 allocs/op
BenchmarkMemory_NestedQuery-12                     23664             59657 ns/op           36693 B/op        810 allocs/op
BenchmarkMemory_ListWithNestedLists-12              3644            346407 ns/op          177613 B/op       6696 allocs/op
BenchmarkMemory_Concurrent-12                      13980             95817 ns/op          189372 B/op       7055 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12                119922              9149 ns/op            6429 B/op         93 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12                39200             30693 ns/op           19197 B/op        655 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12          58494             20190 ns/op           12544 B/op        309 allocs/op

Real-world Benchmarks (private to our app, includes DB I/O)

# Without Pooling
BenchmarkMeQuery-12         	    4756	    708655 ns/op	  104785 B/op	    1377 allocs/op
BenchmarkThreadsQuery-12    	     252	  14639531 ns/op	 5990799 B/op	  101922 allocs/op

# With Pooling
BenchmarkMeQuery-12         	    4786	    693537 ns/op	   91532 B/op	    1351 allocs/op
BenchmarkThreadsQuery-12    	     312	  10358961 ns/op	 5519296 B/op	   98256 allocs/op

kainosnoema and others added 3 commits October 24, 2025 18:17
Implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to
reduce allocations and memory growth during GraphQL execution.

Key improvements:
- Buffer pool: 53-87% faster, 50-100% fewer allocations
- Field map pool: 53% faster, 83% less memory per operation
- GC pressure significantly reduced

Changes:
- Add internal/exec/pool.go with buffer and field map pooling
- Update Execute(), execSelections(), execList() to use pools
- Apply pooling to subscription handling
- Add comprehensive tests and benchmarks

Signed-off-by: Evan Owen <kainosnoema@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
Comment on lines 9 to 11
maxBufferCap = 64 * 1024
maxFieldMapSize = 128
newFieldMapSize = 16
Member

I wonder if these should be configurable through schema options, in case some projects have special requirements...

Contributor Author

It's a good question. I'm not sure at what level it should be configurable, and it's possible there's actually not much need to configure it; exposing it might cause more problems than it solves.

In any case, rather than storing these globally, we could maintain the buffer pools per schema and manage them there, which would make them easier to configure at that level if we want. Thoughts?
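Roughly what I have in mind (a sketch only; field and function names are hypothetical):

package resolvable

import (
	"bytes"
	"sync"
)

// Each parsed schema owns its own pool, so pooling behavior can be
// configured per schema instead of via package-level globals.
type Schema struct {
	bufferPool      sync.Pool
	maxPooledBufCap int
	// ... plus the existing schema fields
}

func newSchema(maxPooledBufCap int) *Schema {
	s := &Schema{maxPooledBufCap: maxPooledBufCap}
	s.bufferPool.New = func() interface{} { return new(bytes.Buffer) }
	return s
}

// putBuffer returns buf to this schema's pool, discarding it if it has
// grown past the configured cap.
func (s *Schema) putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > s.maxPooledBufCap {
		return
	}
	buf.Reset()
	s.bufferPool.Put(buf)
}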

Member

I love the idea of a buffer pool per schema. That would also make it easier to configure.

- Remove field map pooling; it was difficult to configure and of minimal benefit
- Move pool implementation to internal/exec/resolvable/pools.go
- Pool configuration passed to ApplyResolver, ensuring pools are always initialized
- Add MaxPooledBufferCap schema option to control buffer pool behavior
- Add documentation to README with usage guidance
This doesn't make a difference in practice, given that only a few
top-level fields are mapped.
kainosnoema (Contributor, Author) commented Nov 12, 2025

Ok @pavelnikolov, I've pushed the changes to move the pool to the parsed schema (not the AST). I also ended up removing field map pooling: it was trickier to manage that interface given the private internal/exec types, and it contributed very little to the overall alloc reduction compared to the buffer pooling. I also reduced the default max buffer size from 64KB to 8KB, since returns seem to diminish above that for most usage.

  • Pools now managed per-schema in internal/exec/resolvable/schema.go
  • Configuration passed to newSchema(), ensuring pools are always initialized
  • Added MaxPooledBufferCap(n) schema option to control buffer pool behavior (default: 8KB; usage sketched below)
  • Removed field map pooling (premature optimization; it doesn't help much in practice)
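Usage is just the standard schema setup plus the new option (a minimal sketch; only MaxPooledBufferCap is new in this PR):

package main

import graphql "github.com/graph-gophers/graphql-go"

const schemaString = `
	schema { query: Query }
	type Query { hello: String! }
`

type query struct{}

func (*query) Hello() string { return "Hello, world!" }

func main() {
	// MaxPooledBufferCap sets the maximum capacity of buffers retained
	// in the execution buffer pool (default: 8KB).
	schema := graphql.MustParseSchema(schemaString, &query{},
		graphql.MaxPooledBufferCap(16*1024))
	_ = schema
}
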
# MaxPooledBufferCap = 0
BenchmarkMemory_SimpleQuery-12            	  121188	      9297 ns/op	    7814 B/op	     124 allocs/op
BenchmarkMemory_ListQuery-12              	   14977	     80722 ns/op	   76599 B/op	    1888 allocs/op
BenchmarkMemory_NestedQuery-12            	   21205	     57341 ns/op	   55271 B/op	    1013 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3260	    372007 ns/op	  355102 B/op	    8762 allocs/op
BenchmarkMemory_Concurrent-12             	    6942	    166021 ns/op	  376719 B/op	    9223 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  128905	      8828 ns/op	    6564 B/op	     103 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   36316	     33391 ns/op	   33128 B/op	     866 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   52009	     22965 ns/op	   17160 B/op	     383 allocs/op
# MaxPooledBufferCap = 8 * 1024 (default)
BenchmarkMemory_SimpleQuery-12            	  131232	      9113 ns/op	    7126 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17350	     68485 ns/op	   42860 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   23960	     49852 ns/op	   31577 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3802	    309700 ns/op	  219748 B/op	    6714 allocs/op
BenchmarkMemory_Concurrent-12             	    7854	    129856 ns/op	  235649 B/op	    7082 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  140816	      8456 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   44174	     27355 ns/op	   19208 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   58017	     20831 ns/op	   12597 B/op	     313 allocs/op
# MaxPooledBufferCap = 16 * 1024
BenchmarkMemory_SimpleQuery-12            	  129972	      9046 ns/op	    7127 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17528	     68490 ns/op	   42857 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   24189	     49430 ns/op	   31572 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3910	    306600 ns/op	  177092 B/op	    6701 allocs/op
BenchmarkMemory_Concurrent-12             	   16738	     76312 ns/op	  188818 B/op	    7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  140565	      8480 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   44102	     27169 ns/op	   19208 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   57885	     20922 ns/op	   12597 B/op	     313 allocs/op
# MaxPooledBufferCap = 32 * 1024
BenchmarkMemory_SimpleQuery-12            	  131935	      9262 ns/op	    7126 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17450	     68805 ns/op	   42849 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   24218	     49551 ns/op	   31575 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3926	    307991 ns/op	  177105 B/op	    6701 allocs/op
BenchmarkMemory_Concurrent-12             	   16626	     71470 ns/op	  188937 B/op	    7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  142057	      8567 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   43944	     27141 ns/op	   19209 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   57326	     20924 ns/op	   12597 B/op	     313 allocs/op

kainosnoema changed the title from "Add memory pooling for buffers and field maps for improved memory usage and performance" to "Add memory pooling for buffers for improved memory usage and performance" on Nov 12, 2025
After more benchmarking, 8KB seems like a good default for most use cases.
Increasing to 16KB only incrementally improves performance for most common
responses, but increases memory usage for outliers.

Signed-off-by: Evan Owen <evan@gluegroups.com>
kainosnoema (Contributor, Author) commented Nov 12, 2025

@pavelnikolov after a bit more testing, BenchmarkMemory_Concurrent-12 consistently shows much better performance with MaxPooledBufferCap at 16KB vs 8KB (roughly 76µs vs 130µs per op in the benchmarks above). It feels like a common case in production, but I can't decide where the default should be here. Let me know what you think.

Clearly the optimal value is one where you can afford enough memory to keep buffers around that are big enough to hold the entire response without regularly re-allocating. My sense is that 16KB might actually be a better default, and it can be raised for deployments with large responses.

Update: I'm reminded that the GC will eventually release pooled buffers that aren't in use, so memory isn't retained permanently; the real issue is peak memory usage until the next GC when usage has high variance. I found the discussion on this issue insightful: it's possible to automatically optimize the threshold for discarding large buffers using local statistics, but that's probably beyond the scope of this change. That said, given their choice of a 64KB re-use threshold in the example, I'm thinking we could safely choose a default higher than 8KB, maybe even 32KB, and remain in a reasonable zone.

kainosnoema changed the title from "Add memory pooling for buffers for improved memory usage and performance" to "Add memory pooling for buffers for reduced allocs, GC pressure, and improved performance" on Nov 13, 2025
kainosnoema changed the title from "Add memory pooling for buffers for reduced allocs, GC pressure, and improved performance" to "Add buffer pooling for reduced allocs, GC pressure, and improved performance" on Nov 13, 2025