Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 38 additions & 9 deletions database/leveldb/db.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import (
"encoding/json"
"fmt"
"math"
"math/rand"
"slices"
"sync"
"time"
Expand Down Expand Up @@ -60,6 +61,18 @@ const (
// levelDBByteOverhead is the number of bytes of constant overhead that
// should be added to a batch size per operation.
levelDBByteOverhead = 8

// minCompactionL0Trigger is the minimum value for CompactionL0Trigger.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we observed that two of these parameters are needed to be changed at the same time?

Have we seen what happens when only one of them is changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just get less variance if you do this, but I think we're shooting for the maximum possible variance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that, but if (for example) variable A is responsible for 99% of the variance and variable B is responsible for 1% of it then it makes sense to only tune A.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the algorithm, it shouldn't be that skewed, unless something is wrong with the implementation. I'm not sure this is worth testing to verify though, since even if it is as bad as 99/1 there's not any harm in changing both.

minCompactionL0Trigger = 3

// maxCompactionL0Trigger is the maximum value for CompactionL0Trigger.
Copy link
Contributor

@DracoLi DracoLi Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we tested how much compactions will be spread out if we are randomizing these two values? Its not clear to me if it will spread out enough to make a big impact.
If the total size per level is still the same, won't nodes hit a level 4 or 5 compaction at around the same time when previous level is full due to similar db size? It should not be the exact same time due to L0 trigger config but since L0 size is small, it might still be very close together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd have to see it in production to know for sure, but this is the best we can do so it's worth a shot.

maxCompactionL0Trigger = 7

// minCompactionTableSize is the minimum value for CompactionTableSize.
minCompactionTableSize = 2 * opt.MiB

// maxCompactionTableSize is the maximum value for CompactionTableSize.
maxCompactionTableSize = 4 * opt.MiB
)

var (
Expand Down Expand Up @@ -98,40 +111,40 @@ type config struct {
// BlockSize is the minimum uncompressed size in bytes of each 'sorted table'
// block.
//
// The default value is 4KiB.
// The default value is [opt.DefaultBlockSize].
BlockSize int `json:"blockSize"`
// CompactionExpandLimitFactor limits compaction size after expanded.
// This will be multiplied by table size limit at compaction target level.
//
// The default value is 25.
// The default value is [opt.DefaultCompactionExpandLimitFactor].
CompactionExpandLimitFactor int `json:"compactionExpandLimitFactor"`
// CompactionGPOverlapsFactor limits overlaps in grandparent (Level + 2)
// that a single 'sorted table' generates. This will be multiplied by
// table size limit at grandparent level.
//
// The default value is 10.
// The default value is [opt.DefaultCompactionGPOverlapsFactor].
CompactionGPOverlapsFactor int `json:"compactionGPOverlapsFactor"`
// CompactionL0Trigger defines number of 'sorted table' at level-0 that will
// trigger compaction.
//
// The default value is 4.
// The default value is a random value between [minCompactionL0Trigger] and [maxCompactionL0Trigger].
CompactionL0Trigger int `json:"compactionL0Trigger"`
// CompactionSourceLimitFactor limits compaction source size. This doesn't apply to
// level-0.
// This will be multiplied by table size limit at compaction target level.
//
// The default value is 1.
// The default value is [opt.DefaultCompactionSourceLimitFactor].
CompactionSourceLimitFactor int `json:"compactionSourceLimitFactor"`
// CompactionTableSize limits size of 'sorted table' that compaction generates.
// The limits for each level will be calculated as:
// CompactionTableSize * (CompactionTableSizeMultiplier ^ Level)
// The multiplier for each level can also fine-tuned using CompactionTableSizeMultiplierPerLevel.
//
// The default value is 2MiB.
// The default value is a random value between [minCompactionTableSize] and [maxCompactionTableSize].
CompactionTableSize int `json:"compactionTableSize"`
// CompactionTableSizeMultiplier defines multiplier for CompactionTableSize.
//
// The default value is 1.
// The default value is [opt.DefaultCompactionTableSizeMultiplier].
CompactionTableSizeMultiplier float64 `json:"compactionTableSizeMultiplier"`
// CompactionTableSizeMultiplierPerLevel defines per-level multiplier for
// CompactionTableSize.
Expand All @@ -145,11 +158,11 @@ type config struct {
// The multiplier for each level can also fine-tuned using
// CompactionTotalSizeMultiplierPerLevel.
//
// The default value is 10MiB.
// The default value is [opt.DefaultCompactionTotalSize].
CompactionTotalSize int `json:"compactionTotalSize"`
// CompactionTotalSizeMultiplier defines multiplier for CompactionTotalSize.
//
// The default value is 10.
// The default value is [opt.DefaultCompactionTotalSizeMultiplier].
CompactionTotalSizeMultiplier float64 `json:"compactionTotalSizeMultiplier"`
// DisableSeeksCompaction allows disabling 'seeks triggered compaction'.
// The purpose of 'seeks triggered compaction' is to optimize database so
Expand Down Expand Up @@ -185,6 +198,19 @@ type config struct {
MetricUpdateFrequency time.Duration `json:"metricUpdateFrequency"`
}

// randomizeCompactionParams sets compaction parameters in cfg to random values
// to spread out compaction behavior across different nodes.
//
// #nosec G404 // non-crypto randomness is fine here
func randomizeCompactionParams(cfg *config) {
const (
rangeCompactionL0Trigger = maxCompactionL0Trigger - minCompactionL0Trigger + 1
rangeCompactionTableSize = maxCompactionTableSize - minCompactionTableSize + 1
)
cfg.CompactionL0Trigger = rand.Intn(rangeCompactionL0Trigger) + minCompactionL0Trigger
cfg.CompactionTableSize = rand.Intn(rangeCompactionTableSize) + minCompactionTableSize
}

// New returns a wrapped LevelDB object.
func New(file string, configBytes []byte, log logging.Logger, reg prometheus.Registerer) (database.Database, error) {
parsedConfig := config{
Expand All @@ -196,6 +222,9 @@ func New(file string, configBytes []byte, log logging.Logger, reg prometheus.Reg
MaxManifestFileSize: DefaultMaxManifestFileSize,
MetricUpdateFrequency: DefaultMetricUpdateFrequency,
}

randomizeCompactionParams(&parsedConfig)

if len(configBytes) > 0 {
if err := json.Unmarshal(configBytes, &parsedConfig); err != nil {
return nil, fmt.Errorf("%w: %w", ErrInvalidConfig, err)
Expand Down
11 changes: 11 additions & 0 deletions database/leveldb/db_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,3 +73,14 @@ func BenchmarkInterface(b *testing.B) {
}
}
}

func TestRandomizeCompactionParams(t *testing.T) {
var cfg config
randomizeCompactionParams(&cfg)

require.GreaterOrEqual(t, cfg.CompactionL0Trigger, minCompactionL0Trigger)
require.LessOrEqual(t, cfg.CompactionL0Trigger, maxCompactionL0Trigger)

require.GreaterOrEqual(t, cfg.CompactionTableSize, minCompactionTableSize)
require.LessOrEqual(t, cfg.CompactionTableSize, maxCompactionTableSize)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe a test that 100 of them don't all generate exactly the same configuration?