Conversation

kainosnoema (Contributor) commented Oct 27, 2025

We've seen high memory usage and GC pressure in our production deployment. One culprit seems to be func (*Request) execSelections, which can allocate many buffers that have to grow multiple times for large responses.

One important optimization we can make here is buffer pooling. This PR implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to reduce allocations, memory growth, and GC pressure during execution (a minimal sketch of the approach follows the change list below).

Changes:

  • Add internal/exec/pool.go with buffer and field map pooling
  • Update Execute(), execSelections(), execList() to use pools
  • Apply pooling to subscription handling
  • Tests and benchmarks
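For reference, a minimal sketch of the pooling approach (helper names here are illustrative, not necessarily the ones in internal/exec/pool.go):

package exec

import (
	"bytes"
	"sync"
)

// Buffers that grow beyond this are dropped rather than pooled, so a
// single huge response doesn't pin a large allocation indefinitely.
const maxBufferCap = 64 * 1024

var bufferPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// getBuffer returns an empty buffer from the pool.
func getBuffer() *bytes.Buffer {
	return bufferPool.Get().(*bytes.Buffer)
}

// putBuffer resets buf and returns it to the pool, discarding it if it
// has grown past maxBufferCap.
func putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > maxBufferCap {
		return
	}
	buf.Reset()
	bufferPool.Put(buf)
}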

Isolated Benchmarks

Summary:

  • ListQuery: 111µs → 78µs (29% faster), 124KB → 43KB (65% less memory)
  • ListWithNestedLists: 428µs → 346µs (19% faster), 637KB → 177KB (72% less memory)
  • Concurrent: 219µs → 95µs (56% faster), 659KB → 189KB (71% less memory)
# Without Pooling
BenchmarkMemory_SimpleQuery-12                    115488             10336 ns/op            9925 B/op        126 allocs/op
BenchmarkMemory_ListQuery-12                       10000            111123 ns/op          124602 B/op       2037 allocs/op
BenchmarkMemory_NestedQuery-12                     17557             61133 ns/op           80886 B/op       1074 allocs/op
BenchmarkMemory_ListWithNestedLists-12              2841            428631 ns/op          637248 B/op       9660 allocs/op
BenchmarkMemory_Concurrent-12                       5092            219808 ns/op          659525 B/op      10124 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12                123319              9433 ns/op            8691 B/op        106 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12                23680             42620 ns/op           80833 B/op       1016 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12          50025             24150 ns/op           37659 B/op        444 allocs/op

# With Pooling
BenchmarkMemory_SimpleQuery-12                    131162              9250 ns/op            7423 B/op        110 allocs/op
BenchmarkMemory_ListQuery-12                       16544             78654 ns/op           43204 B/op       1474 allocs/op
BenchmarkMemory_NestedQuery-12                     23664             59657 ns/op           36693 B/op        810 allocs/op
BenchmarkMemory_ListWithNestedLists-12              3644            346407 ns/op          177613 B/op       6696 allocs/op
BenchmarkMemory_Concurrent-12                      13980             95817 ns/op          189372 B/op       7055 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12                119922              9149 ns/op            6429 B/op         93 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12                39200             30693 ns/op           19197 B/op        655 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12          58494             20190 ns/op           12544 B/op        309 allocs/op

Real-world Benchmarks (private to our app, includes DB I/O)

# Without Pooling
BenchmarkMeQuery-12         	    4756	    708655 ns/op	  104785 B/op	    1377 allocs/op
BenchmarkThreadsQuery-12    	     252	  14639531 ns/op	 5990799 B/op	  101922 allocs/op

# With Pooling
BenchmarkMeQuery-12         	    4786	    693537 ns/op	   91532 B/op	    1351 allocs/op
BenchmarkThreadsQuery-12    	     312	  10358961 ns/op	 5519296 B/op	   98256 allocs/op

kainosnoema and others added 3 commits October 24, 2025 18:17
Implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to
reduce allocations and memory growth during GraphQL execution.

Key improvements:
- Buffer pool: 53-87% faster, 50-100% fewer allocations
- Field map pool: 53% faster, 83% less memory per operation
- GC pressure significantly reduced

Changes:
- Add internal/exec/pool.go with buffer and field map pooling
- Update Execute(), execSelections(), execList() to use pools
- Apply pooling to subscription handling
- Add comprehensive tests and benchmarks

Signed-off-by: Evan Owen <kainosnoema@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
Comment on lines 9 to 11
maxBufferCap = 64 * 1024
maxFieldMapSize = 128
newFieldMapSize = 16
Member

I wonder if these should be configurable through schema options, in case some projects have special requirements...

Contributor Author

It's a good question. I'm not sure at what level it should be configurable, and it's possible there's actually not much need to configure it; exposing it might cause more problems than it solves.

In any case, rather than storing these globally, we could maintain the buffer pools per schema and manage them there, which would make them easier to configure at that level if we want. Thoughts?
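Roughly what I have in mind (a sketch only; field and function names are hypothetical):

package resolvable

import (
	"bytes"
	"sync"
)

// Each parsed schema owns its own pool, so pooling behavior can be
// configured per schema instead of via package-level globals.
type Schema struct {
	bufferPool      sync.Pool
	maxPooledBufCap int
	// ... plus the existing schema fields
}

func newSchema(maxPooledBufCap int) *Schema {
	s := &Schema{maxPooledBufCap: maxPooledBufCap}
	s.bufferPool.New = func() interface{} { return new(bytes.Buffer) }
	return s
}

// putBuffer returns buf to this schema's pool, discarding it if it has
// grown past the configured cap.
func (s *Schema) putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > s.maxPooledBufCap {
		return
	}
	buf.Reset()
	s.bufferPool.Put(buf)
}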

Member

I love the idea of a buffer pool per schema. That would also make it easier to configure.

- Remove field map pooling; it was difficult to configure and of minimal benefit
- Move pool implementation to internal/exec/resolvable/pools.go
- Pool configuration passed to ApplyResolver, ensuring pools are always initialized
- Add MaxPooledBufferCap schema option to control buffer pool behavior
- Add documentation to README with usage guidance
This doesn't make a difference in practice, given that only a few
top-level fields are mapped.
kainosnoema (Contributor, Author) commented Nov 12, 2025

Ok @pavelnikolov, I've pushed the changes to move the pool to the parsed schema (not the AST). I also ended up removing field map pooling: it was trickier to manage that interface given the private internal/exec types, and it contributed very little to the overall alloc reduction compared to the buffer pooling. I also reduced the default max buffer size from 64KB to 8KB, since returns seem to diminish above that for most usage.

  • Pools now managed per-schema in internal/exec/resolvable/schema.go
  • Configuration passed to newSchema(), ensuring pools are always initialized
  • Added MaxPooledBufferCap(n) schema option to control buffer pool behavior (default: 8KB; usage sketched below)
  • Removed field map pooling (premature optimization; it doesn't help much in practice)
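Usage is just the standard schema setup plus the new option (a minimal sketch; only MaxPooledBufferCap is new in this PR):

package main

import graphql "github.com/graph-gophers/graphql-go"

const schemaString = `
	schema { query: Query }
	type Query { hello: String! }
`

type query struct{}

func (*query) Hello() string { return "Hello, world!" }

func main() {
	// MaxPooledBufferCap sets the maximum capacity of buffers retained
	// in the execution buffer pool (default: 8KB).
	schema := graphql.MustParseSchema(schemaString, &query{},
		graphql.MaxPooledBufferCap(16*1024))
	_ = schema
}
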
# MaxPooledBufferCap = 0
BenchmarkMemory_SimpleQuery-12            	  121188	      9297 ns/op	    7814 B/op	     124 allocs/op
BenchmarkMemory_ListQuery-12              	   14977	     80722 ns/op	   76599 B/op	    1888 allocs/op
BenchmarkMemory_NestedQuery-12            	   21205	     57341 ns/op	   55271 B/op	    1013 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3260	    372007 ns/op	  355102 B/op	    8762 allocs/op
BenchmarkMemory_Concurrent-12             	    6942	    166021 ns/op	  376719 B/op	    9223 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  128905	      8828 ns/op	    6564 B/op	     103 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   36316	     33391 ns/op	   33128 B/op	     866 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   52009	     22965 ns/op	   17160 B/op	     383 allocs/op
# MaxPooledBufferCap = 8 * 1024 (default)
BenchmarkMemory_SimpleQuery-12            	  131232	      9113 ns/op	    7126 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17350	     68485 ns/op	   42860 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   23960	     49852 ns/op	   31577 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3802	    309700 ns/op	  219748 B/op	    6714 allocs/op
BenchmarkMemory_Concurrent-12             	    7854	    129856 ns/op	  235649 B/op	    7082 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  140816	      8456 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   44174	     27355 ns/op	   19208 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   58017	     20831 ns/op	   12597 B/op	     313 allocs/op
# MaxPooledBufferCap = 16 * 1024
BenchmarkMemory_SimpleQuery-12            	  129972	      9046 ns/op	    7127 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17528	     68490 ns/op	   42857 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   24189	     49430 ns/op	   31572 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3910	    306600 ns/op	  177092 B/op	    6701 allocs/op
BenchmarkMemory_Concurrent-12             	   16738	     76312 ns/op	  188818 B/op	    7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  140565	      8480 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   44102	     27169 ns/op	   19208 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   57885	     20922 ns/op	   12597 B/op	     313 allocs/op
# MaxPooledBufferCap = 32 * 1024
BenchmarkMemory_SimpleQuery-12            	  131935	      9262 ns/op	    7126 B/op	     113 allocs/op
BenchmarkMemory_ListQuery-12              	   17450	     68805 ns/op	   42849 B/op	    1477 allocs/op
BenchmarkMemory_NestedQuery-12            	   24218	     49551 ns/op	   31575 B/op	     814 allocs/op
BenchmarkMemory_ListWithNestedLists-12    	    3926	    307991 ns/op	  177105 B/op	    6701 allocs/op
BenchmarkMemory_Concurrent-12             	   16626	     71470 ns/op	  188937 B/op	    7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12         	  142057	      8567 ns/op	    6117 B/op	      95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12        	   43944	     27141 ns/op	   19209 B/op	     657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12  	   57326	     20924 ns/op	   12597 B/op	     313 allocs/op

kainosnoema changed the title from "Add memory pooling for buffers and field maps for improved memory usage and performance" to "Add memory pooling for buffers for improved memory usage and performance" on Nov 12, 2025
After more benchmarking, 8KB seems like a good default for most use cases.
Increasing to 16KB only incrementally improves performance for most common
responses, but increases memory usage for outliers.

Signed-off-by: Evan Owen <evan@gluegroups.com>
kainosnoema (Contributor, Author) commented Nov 12, 2025

@pavelnikolov after a bit more testing, BenchmarkMemory_Concurrent-12 consistently shows much better performance with MaxPooledBufferCap at 16KB vs 8KB (roughly 76µs vs 130µs per op in the benchmarks above). It feels like a common case in production, but I can't decide where the default should be here. Let me know what you think.

Clearly the optimal value is one where you can afford enough memory to keep buffers around that are big enough to hold the entire response without regularly re-allocating. My sense is that 16KB might actually be a better default, and it can be raised for deployments with large responses.

Update: I'm reminded that the GC will eventually release pooled buffers that aren't in use, so memory isn't retained permanently; the real issue is peak memory usage until the next GC when usage has high variance. I found the discussion on this issue insightful: it's possible to automatically optimize the threshold for discarding large buffers using local statistics, but that's probably beyond the scope of this change. That said, given their choice of a 64KB re-use threshold in the example, I'm thinking we could safely choose a default higher than 8KB, maybe even 32KB, and remain in a reasonable zone.

kainosnoema changed the title from "Add memory pooling for buffers for improved memory usage and performance" to "Add memory pooling for buffers for reduced allocs, GC pressure, and improved performance" on Nov 13, 2025
kainosnoema changed the title from "Add memory pooling for buffers for reduced allocs, GC pressure, and improved performance" to "Add buffer pooling for reduced allocs, GC pressure, and improved performance" on Nov 13, 2025