Add buffer pooling for reduced allocs, GC pressure, and improved performance #691
Conversation
Implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to reduce allocations and memory growth during GraphQL execution.

Key improvements:
- Buffer pool: 53-87% faster, 50-100% fewer allocations
- Field map pool: 53% faster, 83% less memory per operation
- GC pressure significantly reduced

Changes:
- Add internal/exec/pool.go with buffer and field map pooling
- Update Execute(), execSelections(), execList() to use pools
- Apply pooling to subscription handling
- Add comprehensive tests and benchmarks

Signed-off-by: Evan Owen <kainosnoema@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
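For context, the buffer-pooling pattern this implements looks roughly like the sketch below. It is a minimal illustration of the sync.Pool approach described in the commit message, not the exact contents of internal/exec/pool.go; the maxBufferCap value mirrors the constant shown in the diff.

package exec

import (
	"bytes"
	"sync"
)

// Buffers that have grown beyond this capacity are not returned to the
// pool, so a single huge response does not pin memory indefinitely.
const maxBufferCap = 64 * 1024

var bufferPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// getBuffer returns an empty buffer from the pool (or a fresh one).
func getBuffer() *bytes.Buffer {
	return bufferPool.Get().(*bytes.Buffer)
}

// putBuffer resets the buffer and returns it to the pool, unless it has
// grown past maxBufferCap, in which case it is left for the GC.
func putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > maxBufferCap {
		return
	}
	buf.Reset()
	bufferPool.Put(buf)
}

The cap check on put is what the later MaxPooledBufferCap option controls: without it, one very large response would keep an oversized buffer alive in the pool.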
Force-pushed 27f6cc1 to ec053b1
internal/exec/pool.go (Outdated)
maxBufferCap = 64 * 1024
maxFieldMapSize = 128
newFieldMapSize = 16
I wonder if these should be configurable through schema options, in case some projects have special requirements...
It's a good question. I'm not sure at what level it should be configurable; it's possible there isn't much need to configure it at all, and exposing it might cause more problems than it solves.
In any case, rather than storing these globally, we could maintain the buffers per schema and manage them there, which would make it easier to configure at that level if we want. Thoughts?
I love the idea of a per-schema buffer pool. That also makes it easier to configure.
- Remove field map pooling, difficult to configure and minimal benefit
- Move pool implementation to internal/exec/resolvable/pools.go
- Pool configuration passed to ApplyResolver, ensuring pools always initialized
- Add MaxPooledBufferCap schema option to control buffer pool behavior
- Add documentation to README with usage guidance
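Assuming the new option is exposed alongside the existing schema options, usage would look something like the sketch below. Only the MaxPooledBufferCap name comes from this change; the exact signature shown is an assumption.

package main

import (
	graphql "github.com/graph-gophers/graphql-go"
)

// RootResolver is a placeholder resolver for illustration.
type RootResolver struct{}

func newSchema(schemaString string) *graphql.Schema {
	// MaxPooledBufferCap is the option added in this change; the exact
	// signature used here is assumed for illustration only.
	return graphql.MustParseSchema(schemaString, &RootResolver{},
		graphql.MaxPooledBufferCap(16*1024), // discard pooled buffers above 16 KB
	)
}

The benchmark output further down compares several values for this cap.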
Force-pushed 9e758aa to 776ce5e
This doesn't make a difference in practice, given that only a few top-level fields are mapped.
OK @pavelnikolov, I've pushed the changes to move the pool to the parsed schema (not the AST), and also ended up removing field map pooling since it was trickier to manage that interface given the private
# MaxPooledBufferCap = 0
BenchmarkMemory_SimpleQuery-12 121188 9297 ns/op 7814 B/op 124 allocs/op
BenchmarkMemory_ListQuery-12 14977 80722 ns/op 76599 B/op 1888 allocs/op
BenchmarkMemory_NestedQuery-12 21205 57341 ns/op 55271 B/op 1013 allocs/op
BenchmarkMemory_ListWithNestedLists-12 3260 372007 ns/op 355102 B/op 8762 allocs/op
BenchmarkMemory_Concurrent-12 6942 166021 ns/op 376719 B/op 9223 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12 128905 8828 ns/op 6564 B/op 103 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12 36316 33391 ns/op 33128 B/op 866 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12 52009 22965 ns/op 17160 B/op 383 allocs/op

# MaxPooledBufferCap = 8 * 1024 (default)
BenchmarkMemory_SimpleQuery-12 131232 9113 ns/op 7126 B/op 113 allocs/op
BenchmarkMemory_ListQuery-12 17350 68485 ns/op 42860 B/op 1477 allocs/op
BenchmarkMemory_NestedQuery-12 23960 49852 ns/op 31577 B/op 814 allocs/op
BenchmarkMemory_ListWithNestedLists-12 3802 309700 ns/op 219748 B/op 6714 allocs/op
BenchmarkMemory_Concurrent-12 7854 129856 ns/op 235649 B/op 7082 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12 140816 8456 ns/op 6117 B/op 95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12 44174 27355 ns/op 19208 B/op 657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12 58017 20831 ns/op 12597 B/op 313 allocs/op

# MaxPooledBufferCap = 16 * 1024
BenchmarkMemory_SimpleQuery-12 129972 9046 ns/op 7127 B/op 113 allocs/op
BenchmarkMemory_ListQuery-12 17528 68490 ns/op 42857 B/op 1477 allocs/op
BenchmarkMemory_NestedQuery-12 24189 49430 ns/op 31572 B/op 814 allocs/op
BenchmarkMemory_ListWithNestedLists-12 3910 306600 ns/op 177092 B/op 6701 allocs/op
BenchmarkMemory_Concurrent-12 16738 76312 ns/op 188818 B/op 7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12 140565 8480 ns/op 6117 B/op 95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12 44102 27169 ns/op 19208 B/op 657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12 57885 20922 ns/op 12597 B/op 313 allocs/op

# MaxPooledBufferCap = 32 * 1024
BenchmarkMemory_SimpleQuery-12 131935 9262 ns/op 7126 B/op 113 allocs/op
BenchmarkMemory_ListQuery-12 17450 68805 ns/op 42849 B/op 1477 allocs/op
BenchmarkMemory_NestedQuery-12 24218 49551 ns/op 31575 B/op 814 allocs/op
BenchmarkMemory_ListWithNestedLists-12 3926 307991 ns/op 177105 B/op 6701 allocs/op
BenchmarkMemory_Concurrent-12 16626 71470 ns/op 188937 B/op 7058 allocs/op
BenchmarkMemory_AllocationsPerOp/Single-12 142057 8567 ns/op 6117 B/op 95 allocs/op
BenchmarkMemory_AllocationsPerOp/List_10-12 43944 27141 ns/op 19209 B/op 657 allocs/op
BenchmarkMemory_AllocationsPerOp/Nested_Depth3-12 57326 20924 ns/op 12597 B/op 313 allocs/op
After more benchmarking, it seems like 8KB is a good default for most use cases. Increasing to 16KB only incrementally improves performance in most common responses, but increases memory usage for outliers.

Signed-off-by: Evan Owen <evan@gluegroups.com>
@pavelnikolov after a bit more testing, it seems clear that the optimal value is one where you can afford the memory to keep buffers around that are big enough to generate the entire response without reallocating regularly. My sense is that 16KB might actually be a better default, and it can be raised for deployments that have large responses.

Update: I'm reminded that the GC will eventually release buffers in the pool that aren't in use, so memory isn't retained permanently; the real issue is peak memory usage until the next GC when usage has high variance. I found the discussion on this issue insightful: it's possible to automatically optimize the threshold for discarding large buffers using local statistics, but that's probably beyond the scope of this change. That said, given their choice of always re-using buffers >= 64KB in the example, I'm thinking we could safely choose a default higher than 8KB, maybe even 32KB, and remain in a reasonable zone.
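For reference, the "local statistics" idea mentioned above amounts to tracking recent buffer sizes and only pooling buffers that stay within some multiple of that estimate, instead of using a fixed cap. A rough sketch, purely illustrative and explicitly out of scope for this PR, with all names hypothetical:

package exec

import "sync"

// adaptiveCap sketches the idea of a self-tuning discard threshold: keep an
// exponentially weighted average of observed buffer sizes and only return
// buffers to the pool while they stay within a small multiple of it.
type adaptiveCap struct {
	mu  sync.Mutex
	avg float64 // smoothed estimate of recent buffer sizes, in bytes
}

// observe records the size of a buffer after a response has been written.
func (a *adaptiveCap) observe(size int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	const alpha = 0.1 // smoothing factor
	a.avg = (1-alpha)*a.avg + alpha*float64(size)
}

// keep reports whether a buffer of the given capacity is worth pooling.
func (a *adaptiveCap) keep(capacity int) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return float64(capacity) <= 4*a.avg
}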
We've seen some high memory and GC pressure in our production deployment. One culprit seems to be func (*Request) execSelections, which can create a lot of buffers that have to grow multiple times in large responses.

One important optimization we can make here is buffer pooling. This PR implements sync.Pool for bytes.Buffer and map[string]*fieldToExec to reduce allocations, memory growth, and GC pressure during execution.

Changes:
- Add internal/exec/pool.go with buffer and field map pooling
- Update Execute(), execSelections(), execList() to use pools
- Apply pooling to subscription handling
- Add tests and benchmarks

Isolated Benchmarks

Summary:

Real-world Benchmarks (private to our app, includes db i/o)