
Conversation

Contributor

@samliok samliok commented Nov 7, 2025

Why this should be merged

We have noticed synchronized spikes in LevelDB compactions across our nodes. This can cause network blips if many nodes in the network overconsume disk I/O or CPU at the same time.

[Screenshot (2025-11-07): synchronized LevelDB compaction spikes across nodes]

How this works

Our nodes use the default LevelDB configuration, which may be why we are seeing synchrony in compactions. We address this by deterministically randomizing two LevelDB config parameters when LevelDB is set up. These parameters should provide enough variance to offset the compaction synchrony.

CompactionL0Trigger -> The default is 4, meaning compaction of level-0 tables starts once 4 L0 files have been created. We set this to a random value (seeded by the node ID) in the range 3-7, which adds variance to when L0 compactions are triggered.

CompactionTableSize -> The default is 2 MiB. It defines the target size for files produced by compaction. Increasing it makes compaction run less often, at the cost of more CPU and I/O per run. We randomize this value in the range 2 MiB - 4 MiB.
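A minimal sketch of the mapping described above, assuming a seed has already been derived from the node ID. The constant names and ranges come from this PR description; the actual implementation in database/leveldb/db.go may differ:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Bounds taken from the PR description; hypothetical names.
const (
	minCompactionL0Trigger = 3
	maxCompactionL0Trigger = 7
	minCompactionTableSize = 2 << 20 // 2 MiB
	maxCompactionTableSize = 4 << 20 // 4 MiB
)

// randomizeCompaction deterministically picks the two compaction
// parameters from a seed, so the same node always gets the same values.
func randomizeCompaction(seed int64) (l0Trigger, tableSize int) {
	r := rand.New(rand.NewSource(seed))
	l0Trigger = minCompactionL0Trigger +
		r.Intn(maxCompactionL0Trigger-minCompactionL0Trigger+1)
	tableSize = minCompactionTableSize +
		r.Intn(maxCompactionTableSize-minCompactionTableSize+1)
	return l0Trigger, tableSize
}

func main() {
	trigger, size := randomizeCompaction(42)
	fmt.Println(trigger, size)
}
```

Because the source is seeded deterministically, a node restarted with the same seed recomputes the same configuration.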

How this was tested

I spun up a mainnet node on an EC2 instance. It was initially created with the default LevelDB compaction config; I then restarted it with two different sets of configs: one with CompactionL0Trigger set to 3 and CompactionTableSize set to 4 MiB, and another with CompactionL0Trigger of 7 and CompactionTableSize of 3 MiB. The node appeared healthy after each restart.

Need to be documented in RELEASES.md?

Copilot AI review requested due to automatic review settings November 7, 2025 16:37
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds node-specific randomization of LevelDB compaction parameters to spread out compaction behavior across different nodes in the network. The implementation uses the node ID as a seed to generate deterministic but varied compaction settings.

  • Adds a nodeID parameter to the leveldb.New function signature
  • Introduces a randomizeCompaction function that derives compaction settings from the node ID
  • Updates all call sites to provide a node ID (using ids.EmptyNodeID for tests)

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

  • database/leveldb/db.go: Adds the randomizeCompaction function, updates New to accept a nodeID parameter, and updates documentation comments
  • database/factory/factory.go: Updates the New function signature to accept and pass through the nodeID parameter
  • database/leveldb/db_test.go: Updates test calls to New to pass ids.EmptyNodeID
  • node/node.go: Updates database initialization to pass the node ID
  • tests/reexecute/c/vm_reexecute_test.go: Updates test calls to New to pass ids.EmptyNodeID
  • vms/platformvm/validators/manager_benchmark_test.go: Updates the benchmark call to New to pass ids.EmptyNodeID
Comments suppressed due to low confidence (1)

database/leveldb/db.go:233

  • The randomized compaction parameters will be overwritten by any user-provided configuration. This may result in unexpected behavior if users are unaware that their config will override the randomization. Consider either: (1) only applying randomization when configBytes is empty, or (2) documenting this behavior clearly, or (3) only randomizing unset fields after unmarshaling.
	randomizeCompaction(nodeID, &parsedConfig)

	if len(configBytes) > 0 {
		if err := json.Unmarshal(configBytes, &parsedConfig); err != nil {
			return nil, fmt.Errorf("%w: %w", ErrInvalidConfig, err)
		}
	}
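Option (3) from the comment above could look roughly like the following sketch. The struct and field names are hypothetical stand-ins for the real parsed config; the idea is to unmarshal the user config first, then fill only zero-valued (unset) fields with the randomized defaults, so explicit user values always win without silently discarding the randomization:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified config; the real one lives in database/leveldb/db.go.
type compactionConfig struct {
	CompactionL0Trigger int `json:"compactionL0Trigger"`
	CompactionTableSize int `json:"compactionTableSize"`
}

// applyConfig unmarshals user config first, then fills only the fields the
// user left unset (zero) with randomized defaults.
func applyConfig(configBytes []byte, randomized compactionConfig) (compactionConfig, error) {
	var cfg compactionConfig
	if len(configBytes) > 0 {
		if err := json.Unmarshal(configBytes, &cfg); err != nil {
			return cfg, err
		}
	}
	if cfg.CompactionL0Trigger == 0 {
		cfg.CompactionL0Trigger = randomized.CompactionL0Trigger
	}
	if cfg.CompactionTableSize == 0 {
		cfg.CompactionTableSize = randomized.CompactionTableSize
	}
	return cfg, nil
}

func main() {
	randomized := compactionConfig{CompactionL0Trigger: 5, CompactionTableSize: 3 << 20}
	// User sets only the L0 trigger; the table size falls back to the
	// randomized default.
	cfg, err := applyConfig([]byte(`{"compactionL0Trigger": 4}`), randomized)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.CompactionL0Trigger, cfg.CompactionTableSize)
}
```

One caveat with this approach: it cannot distinguish "unset" from an explicit zero, so a field where zero is meaningful would need a pointer type instead.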

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Member

@rkuris rkuris left a comment


Approved with suggestions / nits

MetricUpdateFrequency: DefaultMetricUpdateFrequency,
}

randomizeCompaction(nodeID, &parsedConfig)
Member


may want to mention here that configuration will override the returns from here

bytes = paddedBytes
}
seed := binary.BigEndian.Uint64(bytes)
r := rand.New(rand.NewSource(int64(seed)))
Member


nit: I would have pulled out the creation of a random source from a NodeId into a separate, testable function
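Such a helper might look like this sketch, which mirrors the padding logic in the quoted snippet (the function name is hypothetical): pad short IDs to 8 bytes, read a big-endian uint64 seed, and return a seeded source that can be unit-tested on its own.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// newRandFromID turns an arbitrary-length ID into a deterministic
// *rand.Rand: pad to at least 8 bytes, then read a big-endian uint64 seed.
func newRandFromID(id []byte) *rand.Rand {
	bytes := id
	if len(bytes) < 8 {
		padded := make([]byte, 8)
		copy(padded, bytes)
		bytes = padded
	}
	seed := binary.BigEndian.Uint64(bytes)
	return rand.New(rand.NewSource(int64(seed)))
}

func main() {
	// Two sources built from the same ID produce the same stream.
	r1 := newRandFromID([]byte{1, 2, 3})
	r2 := newRandFromID([]byte{1, 2, 3})
	fmt.Println(r1.Int63() == r2.Int63())
}
```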

Contributor Author


added tests for this function + renamed

require.Equal(t, cfg1.CompactionL0Trigger, cfg2.CompactionL0Trigger)
require.Equal(t, cfg1.CompactionTableSize, cfg2.CompactionTableSize)
})
}
Member


nit: maybe a test that 100 of them don't all generate exactly the same configuration?
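The suggested check could be sketched as follows, using a stand-in randomizeCompaction with the ranges from the PR description (the real function lives in database/leveldb/db.go): generate configs for 100 distinct seeds and assert they are not all identical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Stand-in for the PR's randomization; comparable so it can be a map key.
type compaction struct {
	l0Trigger int
	tableSize int
}

func randomizeCompaction(seed int64) compaction {
	r := rand.New(rand.NewSource(seed))
	return compaction{
		l0Trigger: 3 + r.Intn(5),           // [3, 7]
		tableSize: 2<<20 + r.Intn(2<<20+1), // [2 MiB, 4 MiB]
	}
}

func main() {
	// Collect the distinct configurations produced by 100 seeds.
	seen := make(map[compaction]struct{})
	for seed := int64(0); seed < 100; seed++ {
		seen[randomizeCompaction(seed)] = struct{}{}
	}
	fmt.Println(len(seen) > 1)
}
```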

// should be added to a batch size per operation.
levelDBByteOverhead = 8

// minCompactionL0Trigger is the minimum value for CompactionL0Trigger.
Contributor


Have we observed that two of these parameters are needed to be changed at the same time?

Have we seen what happens when only one of them is changed?

Member


You just get less variance if you do this, but I think we're shooting for the maximum possible variance.

Contributor


I get that, but if (for example) variable A is responsible for 99% of the variance and variable B is responsible for 1% of it then it makes sense to only tune A.

Member


Based on the algorithm, it shouldn't be that skewed, unless something is wrong with the implementation. I'm not sure this is worth testing to verify though, since even if it is as bad as 99/1 there's not any harm in changing both.

@joshua-kim joshua-kim moved this to In Progress 🏗️ in avalanchego Nov 7, 2025
@samliok samliok self-assigned this Nov 7, 2025

// New returns a wrapped LevelDB object.
func New(file string, configBytes []byte, log logging.Logger, reg prometheus.Registerer) (database.Database, error) {
func New(nodeID ids.NodeID, file string, configBytes []byte, log logging.Logger, reg prometheus.Registerer) (database.Database, error) {
Contributor

@DracoLi DracoLi Nov 7, 2025


Does making this take a NodeID make sense?
db.leveldb is a general kv db but this makes db initialization dependent on having a NodeID when this package is used.

Member


If we don't have a NodeID then we don't care about being out of sync. Or at least we don't know what we're being out of sync with.

This lets us maintain the exact same configuration on a restart with the same NodeID so IMO this is the right approach.

Contributor

@DracoLi DracoLi Nov 7, 2025


should this just take something like a seed int? The factory can convert the NodeID to a seed when creating a leveldb. I think it's kind of weird needing to pass ids.EmptyNodeID when creating a leveldb.

Contributor Author


ended up removing the NodeID because I did find it weird requiring that to be part of the constructor. Plus I don't think we really care if the LevelDB config is randomized differently on restart (as long as config changes don't break the node, which they don't seem to).

Contributor


which seems like they don't

This is a requirement for merging anyways, as we are going to be deploying this on nodes that are already running a different config.

// minCompactionL0Trigger is the minimum value for CompactionL0Trigger.
minCompactionL0Trigger = 3

// maxCompactionL0Trigger is the maximum value for CompactionL0Trigger.
Contributor

@DracoLi DracoLi Nov 7, 2025


Have we tested how much compactions will be spread out if we randomize these two values? It's not clear to me that they will spread out enough to make a big impact.
If the total size per level stays the same, won't nodes hit a level 4 or 5 compaction at around the same time once the previous level fills up, due to similar db size? It should not be the exact same time thanks to the L0 trigger config, but since the L0 size is small, it might still be very close.

Member


I think we'd have to see it in production to know for sure, but this is the best we can do so it's worth a shot.

Member

rkuris commented Nov 7, 2025 via email

