treap, ffldb: improve pool recycling, add batched Delete #2513
Open
In this commit, we improve the node recycling mechanism in the immutable treap's batched Put operation and extend the same optimization to Delete.

The original recycling check called newTreap.get(node.key) for each intermediate node to determine whether it was still live in the latest treap. This involved O(depth * log n) key comparisons per batch iteration. We replace this with a pointer-set approach: put() now returns both the set of newly created nodes and the set of original source nodes that were cloned (the "replaced" set). The recycling loop then checks pointer identity via nodeInSet(), reducing the cost to O(depth^2) pointer comparisons (~400 ops for a typical depth of 20, versus ~8000 key comparisons previously).

Additionally, we document the MVCC snapshot safety argument directly in the Put() godoc. The key insight is that intermediate clones are always fresh pool allocations, never shared with any pre-existing snapshot. Only clones that were subsequently re-cloned (and thus replaced) are recycled, while clones still structurally shared with the new treap are left untouched.

We also extend Delete to accept variadic keys, mirroring Put's batched API. The internal delete() method returns the same (created, replaced) arrays, enabling the same inter-step recycling. Previously, Delete only accepted a single key and allocated cloned nodes from the pool without ever returning them, creating asymmetric pool drain during IBD when commitTx issued many individual Delete calls.

Finally, Put now guards against zero-length input with an early return.
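The pointer-set check described above can be illustrated with a minimal sketch. It is not the code from this PR: the `nodeSet` type, the `recyclable` helper, the `nodeInSet` signature, and the value chosen for `staticDepth` are assumptions made for illustration. Only the idea (a linear pointer-identity scan over a fixed-size array, used to pick out previously created nodes that were re-cloned in the next step) comes from the commit message.

```go
package main

import "fmt"

// staticDepth is an assumed upper bound on treap depth for this sketch.
const staticDepth = 128

// treapNode is a simplified stand-in for the real node type.
type treapNode struct {
	key, value  []byte
	left, right *treapNode
}

// nodeSet (hypothetical) is a fixed-size array plus a count, so tracking
// the nodes touched by one put/delete step needs no heap allocation.
type nodeSet struct {
	nodes [staticDepth]*treapNode
	n     int
}

// add records a node; nodes beyond the fixed capacity are ignored here.
func (s *nodeSet) add(node *treapNode) {
	if s.n < len(s.nodes) {
		s.nodes[s.n] = node
		s.n++
	}
}

// nodeInSet reports membership by pointer identity using a linear scan.
func nodeInSet(node *treapNode, s *nodeSet) bool {
	for i := 0; i < s.n; i++ {
		if s.nodes[i] == node {
			return true
		}
	}
	return false
}

// recyclable returns the nodes created by the previous step that the
// current step cloned again (and therefore replaced). Nodes that were not
// replaced are still shared with the new treap and must be left alone.
func recyclable(prevCreated, replaced *nodeSet) []*treapNode {
	var out []*treapNode
	for i := 0; i < prevCreated.n; i++ {
		if nodeInSet(prevCreated.nodes[i], replaced) {
			out = append(out, prevCreated.nodes[i])
		}
	}
	return out
}

func main() {
	a := &treapNode{key: []byte("a")}
	b := &treapNode{key: []byte("b")}
	c := &treapNode{key: []byte("c")}

	var prevCreated, replaced nodeSet
	prevCreated.add(a)
	prevCreated.add(b)
	replaced.add(b)
	replaced.add(c)

	// Only b was both created by the previous step and replaced by this one.
	for _, n := range recyclable(&prevCreated, &replaced) {
		fmt.Printf("recycle %s\n", n.key)
	}
}
```

With both sets bounded by the treap depth, the nested scan is at most depth × depth pointer comparisons per step, which is where the O(depth^2) figure comes from.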
In this commit, we update commitTx to collect deletion keys into slices and issue single batched Delete calls rather than calling Delete individually inside the ForEach loop. This allows the treap's inter-step node recycling to reclaim intermediate clones that were previously never returned to the pool and were left for the GC to collect.

We also fix the pendingRemove path to pass nil as the value in KVPairs rather than the actual value from the mutable treap. The remove treap only tracks keys for deletion and never uses the stored values, so retaining references to them was unnecessary.
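The collect-then-delete pattern can be sketched as follows. The `kvStore` type and `collectKeys` helper are hypothetical stand-ins, not types from ffldb or the treap package: the real treap is immutable and returns a new treap from Delete, whereas this stand-in mutates a map to keep the example short. The zero-length guard in `Delete` is likewise an assumption of the sketch.

```go
package main

import "fmt"

// kvStore is a hypothetical, mutable stand-in for the treap. It counts
// Delete calls so the effect of batching is visible.
type kvStore struct {
	data        map[string][]byte
	deleteCalls int
}

// Delete removes all given keys in one call, mirroring a variadic API.
func (s *kvStore) Delete(keys ...[]byte) {
	if len(keys) == 0 {
		return // assumed guard: nothing to do for empty input
	}
	s.deleteCalls++
	for _, k := range keys {
		delete(s.data, string(k))
	}
}

// ForEach visits every pair until fn returns false.
func (s *kvStore) ForEach(fn func(k, v []byte) bool) {
	for k, v := range s.data {
		if !fn([]byte(k), v) {
			return
		}
	}
}

// collectKeys gathers the keys of pending into a slice instead of acting
// on each key inside the iteration callback.
func collectKeys(pending *kvStore) [][]byte {
	keys := make([][]byte, 0, len(pending.data))
	pending.ForEach(func(k, _ []byte) bool {
		keys = append(keys, k)
		return true
	})
	return keys
}

func main() {
	cache := &kvStore{data: map[string][]byte{
		"a": []byte("1"), "b": []byte("2"), "c": []byte("3"),
	}}
	// The remove set only tracks keys, so its values are nil.
	pendingRemove := &kvStore{data: map[string][]byte{"a": nil, "c": nil}}

	// One batched call instead of one Delete per key.
	cache.Delete(collectKeys(pendingRemove)...)

	fmt.Println("remaining keys:", len(cache.data), "delete calls:", cache.deleteCalls)
}
```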
In this commit, we fix three test error messages that printed the wrong
expected value on failure. The Len check compared against (i+1)*keyCount
but the error message only printed i+1, which would have been misleading
if the assertion ever fired.
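A small sketch of the kind of fix described: compute the expected value once and use the same variable in both the comparison and the message, so the two cannot drift apart. The `checkLen` helper and the message wording are hypothetical; the actual tests in this PR may be structured differently.

```go
package main

import "fmt"

// checkLen is a hypothetical helper. The expected length is computed once
// and reused, so the error message always reports the value that was
// actually compared against.
func checkLen(got, i, keyCount int) error {
	want := (i + 1) * keyCount
	if got != want {
		return fmt.Errorf("Len #%d: unexpected length - got %d, want %d",
			i, got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkLen(30, 2, 10)) // lengths match, no error
	fmt.Println(checkLen(3, 2, 10))  // mismatch, message reports want 30
}
```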
We also add two new snapshot immutability tests to the existing
TestImmutableSnapshot function:
A single-element Put test verifies that inserting one key via
Put(KVPair{k,v}) does not corrupt a snapshot taken before the
insertion. This exercises the recycling path where prevCreated is empty
and no recycling should occur.
A small-batch (3 keys) test exercises the recycling logic at a
granularity where intermediate node recycling occurs between the 2nd
and 3rd insertions. After the batch Put, we verify that all 1000
original keys in the snapshot remain accessible with correct values,
confirming that recycling does not corrupt structurally shared nodes.
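The shape of such a snapshot check can be sketched with a much simpler structure. The example below uses a plain path-copying binary search tree rather than the treap, and it has no pool recycling, so it only illustrates the property being tested: a root captured before a small batch of insertions must still return every original key with its original value, and must not see the new keys. All names here are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a minimal persistent (path-copying) BST node, used here as a
// stand-in for the immutable treap.
type node struct {
	key, value  []byte
	left, right *node
}

// put returns a new root and copies only the nodes on the search path.
func put(n *node, key, value []byte) *node {
	if n == nil {
		return &node{key: key, value: value}
	}
	c := *n // clone; the original stays visible to older snapshots
	switch cmp := bytes.Compare(key, n.key); {
	case cmp < 0:
		c.left = put(n.left, key, value)
	case cmp > 0:
		c.right = put(n.right, key, value)
	default:
		c.value = value
	}
	return &c
}

// get returns the value stored under key, or nil if it is absent.
func get(n *node, key []byte) []byte {
	for n != nil {
		switch cmp := bytes.Compare(key, n.key); {
		case cmp < 0:
			n = n.left
		case cmp > 0:
			n = n.right
		default:
			return n.value
		}
	}
	return nil
}

func makeKey(i int) []byte { return []byte(fmt.Sprintf("key-%04d", i)) }

// snapshotIntact verifies that the first numKeys keys are all present in
// the snapshot with their original values.
func snapshotIntact(snapshot *node, numKeys int) bool {
	for i := 0; i < numKeys; i++ {
		if !bytes.Equal(get(snapshot, makeKey(i)), makeKey(i)) {
			return false
		}
	}
	return true
}

func main() {
	var root *node
	for i := 0; i < 1000; i++ {
		root = put(root, makeKey(i), makeKey(i))
	}
	snapshot := root // taken before the small batch

	for i := 1000; i < 1003; i++ { // small batch of 3 keys
		root = put(root, makeKey(i), makeKey(i))
	}

	fmt.Println("snapshot intact:", snapshotIntact(snapshot, 1000))
	fmt.Println("snapshot sees new key:", get(snapshot, makeKey(1000)) != nil)
}
```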
In this commit, we add four benchmark functions to measure the performance of the immutable treap's pool-based node recycling:
BenchmarkImmutablePutBatch exercises batched Put at batch sizes of 1, 10, 100, and 1000 keys using SHA-256 hashed keys to produce unordered insertion patterns.
BenchmarkImmutableDeleteBatch measures batched Delete at the same scales, first building a treap and then deleting all keys in one batch.
BenchmarkImmutablePutSequential simulates the IBD hot path by performing 10 rounds of 100-key batch inserts into a growing treap, exercising recycling across multiple Put calls.
BenchmarkImmutableMixedPutDelete mirrors the commitTx pattern in dbcache.go by inserting 500 keys then deleting 250, measuring the combined allocation overhead of interleaved batch operations.
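As a rough illustration of the benchmark setup, the sketch below derives keys from SHA-256 hashes of a counter so that insertion order is effectively unordered, then measures allocations at the four batch sizes. The `hashedKeys` and `benchmarkPutBatch` names are hypothetical, and a Go map stands in for the treap, so the numbers it prints say nothing about the treap itself. It uses `testing.Benchmark` so it can run as an ordinary program; in the repository these would normally be `Benchmark...` functions run via `go test -bench`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"testing"
)

// hashedKeys returns n deterministic keys, each the SHA-256 hash of its
// index, so that consecutive keys are not in sorted order.
func hashedKeys(n int) [][]byte {
	keys := make([][]byte, n)
	for i := range keys {
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], uint64(i))
		sum := sha256.Sum256(buf[:])
		keys[i] = sum[:]
	}
	return keys
}

// benchmarkPutBatch inserts one batch per iteration into a fresh store.
// The map is only a stand-in for the treap's batched Put.
func benchmarkPutBatch(b *testing.B, batchSize int) {
	keys := hashedKeys(batchSize)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		store := make(map[string][]byte, batchSize)
		for _, k := range keys {
			store[string(k)] = k
		}
	}
}

func main() {
	for _, size := range []int{1, 10, 100, 1000} {
		res := testing.Benchmark(func(b *testing.B) {
			benchmarkPutBatch(b, size)
		})
		fmt.Printf("batch=%d: %d allocs/op, %d B/op\n",
			size, res.AllocsPerOp(), res.AllocedBytesPerOp())
	}
}
```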
Change Description
In this PR, we improve the `sync.Pool`-based node recycling introduced in #2425 and extend the same optimization to `Delete` operations.

Faster Recycling Check in `Put()`

The original `Put()` recycling loop called `newTreap.get(node.key)` for each intermediate node to check if it was still live in the latest treap version. This worked correctly but cost O(depth × log n) key comparisons per batch iteration. We replace this with a pointer-set approach: `put()` now returns both the set of newly created nodes and the set of original source nodes that were cloned (the "replaced" set). The recycling loop checks pointer identity via `nodeInSet()`, which is just a linear scan of a `[staticDepth]*treapNode` array. This brings the cost down to O(depth²) pointer comparisons, i.e. ~400 pointer equality checks for a typical depth of 20, versus ~8000 key comparisons before.

We also document the MVCC snapshot safety argument directly in the `Put()` godoc. The core insight is that intermediate clones are always fresh pool allocations, never shared with any pre-existing snapshot. Only clones that were subsequently re-cloned (and thus replaced in the new treap) are candidates for recycling. Clones still structurally shared with the new treap are left alone.

To gain confidence in this safety argument, we built a Quint formal model (TLA+-inspired) that models the treap as a set of node IDs, non-deterministically applies batched puts, deletes, snapshot captures, and snapshot releases, then checks three invariants across 30,000 randomized traces: no recycled node appears in any snapshot, in the current treap, or anywhere live. All three invariants passed. The model confirmed that fresh clone IDs are always >= `nextId` at allocation time, so they can never collide with any pre-existing snapshot, which is exactly what allows us to use the simpler pointer-set check instead of a full treap traversal.

Batched `Delete()` with Pool Recycling

Previously, `Delete` accepted a single key and allocated cloned nodes from the pool via `cloneTreapNode` without ever returning them. During IBD, `commitTx` issued many individual `Delete` calls inside `ForEach` loops, creating a one-way pool drain where nodes were pulled from the pool but never recycled back.

We extend `Delete` to accept variadic keys (`Delete(keys ...[]byte)`), mirroring `Put`'s batched API. The internal `delete()` method returns the same `(created, replaced)` arrays, enabling the same inter-step recycling. The `commitTx` callsite in `dbcache.go` now collects deletion keys into slices and issues single batched calls. We also fix the `pendingRemove` path to pass `nil` as the value in `KVPair` rather than carrying the actual value, since the remove treap only tracks keys.

Test Fixes

We fix three test error messages that printed `i+1` as the expected value when the actual expected value was `(i+1)*keyCount`. We also add single-key and small-batch (3 keys) snapshot immutability tests that exercise the recycling logic at the granularity where intermediate node recycling actually occurs. The small-batch test additionally verifies that all 1000 original keys in the snapshot remain accessible with correct values after a `Put` with recycling.

Benchmarks
The batched Delete shows 30-57% reduction in bytes allocated and 29-48% fewer allocations. The mixed Put+Delete benchmark (which mirrors the `commitTx` pattern) shows a 13% reduction in both memory and allocations.

Put performance is roughly neutral, as expected: the pointer-set approach trades O(log n) key comparisons for O(depth) pointer comparisons, which are similar in practice for typical treap sizes.
See each commit message for a detailed description of the incremental changes.