Fix Issue#83, limit number of segments for high-dimensional key space #84

rkrenzler · 2025-08-06T08:53:10Z

Reduce the number of segments returned by choose_checkpoints. Note that even if each key dimension is split only once, a high number of key dimensions can still result in an excessive number of segments, which significantly lowers the performance of reladiff. For example, 20 key columns may lead to 2^20 = 1,048,576 segments.
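As a quick check of the arithmetic: k splits of one dimension give k+1 intervals, and the segments are the Cartesian product of the intervals of all key columns, so a single split per column already gives 2^n segments. The helper below is purely illustrative; segment_count is a hypothetical name, not a reladiff function.

```python
def segment_count(num_key_columns: int, splits_per_dimension: int = 1) -> int:
    """Hypothetical helper: number of segments when every key column is split equally."""
    # k splits -> k + 1 intervals per column; the segment counts multiply across columns.
    return (splits_per_dimension + 1) ** num_key_columns

for n in (1, 5, 10, 20):
    print(f"{n:>2} key columns -> {segment_count(n):,} segments")
```

With 20 key columns this prints 1,048,576 segments, matching the figure above.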


erezsh · 2025-08-07T08:57:37Z

Can you explain the intuition behind the new split_compound_key_space?

I understand the need to limit the number of segments, but I'm asking about the specific method.

rkrenzler · 2025-09-10T05:37:40Z

Remark: k splits lead to k+1 segments.
The current implementation estimates the number of splits for each dimension from the total number of splits for the whole key space in the function choose_checkpoints(...), using the formula count = int(count ** (1 / len(self.key_columns))) or 1.
The number of splits per dimension is stored in the variable count and is at least 1. split_compound_key_space then uses this count to split every key dimension. This means that even if we want to split the whole key space into only two segments, we get at least 2^n segments (every one of the n dimensions is split at least once), where n is the dimension of the key space.
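A minimal sketch of the old estimate, reusing the formula quoted from choose_checkpoints(...). The function names and the total_splits parameter are hypothetical; only the formula itself comes from the code. Because of the trailing `or 1`, every dimension gets at least one split, so even a request for a single split in a 3-column key space produces 8 segments.

```python
def old_splits_per_dimension(total_splits: int, num_key_columns: int) -> int:
    # Formula quoted from choose_checkpoints(...): n-th root of the total,
    # rounded down, but never less than 1 because of "or 1".
    return int(total_splits ** (1 / num_key_columns)) or 1

def old_segment_count(total_splits: int, num_key_columns: int) -> int:
    per_dim = old_splits_per_dimension(total_splits, num_key_columns)
    # Every dimension is split per_dim times -> per_dim + 1 intervals each.
    return (per_dim + 1) ** num_key_columns

# Asking for a single split (2 segments) in a 3-column key space:
print(old_splits_per_dimension(1, 3))  # 1
print(old_segment_count(1, 3))         # 8
```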

In the new implementation I do not try to split all dimensions from the very beginning. Instead, I split the dimensions one by one, starting with the first, each by the number estimated by choose_checkpoints(...). Each dimension is split with the function split_key_space(...). After each step I check whether further splitting is still needed; once enough segments are reached, splitting stops and the remaining dimensions are left unsplit. Because split_key_space() does not split a dimension if the difference between min and max is 1, at some point the first dimensions can no longer be split, and the algorithm moves on to split the following dimensions.
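The loop described above can be sketched as follows. This is a simplified illustration, not the code in this PR: split_range is a hypothetical stand-in for split_key_space(...) (it returns checkpoints including both bounds and never splits bounds that differ by 1), and max_segments is an assumed name for the segment budget.

```python
from typing import List, Sequence

def split_range(lo: int, hi: int, count: int) -> List[int]:
    """Hypothetical stand-in for split_key_space(): checkpoints of [lo, hi]
    with up to `count` interior points; bounds that differ by 1 are not split."""
    if hi - lo <= 1 or count < 1:
        return [lo, hi]
    count = min(count, hi - lo - 1)  # no more checkpoints than interior integers
    interior = [lo + (hi - lo) * i // (count + 1) for i in range(1, count + 1)]
    return [lo, *interior, hi]

def split_dimensions_one_by_one(
    min_key: Sequence[int], max_key: Sequence[int], count: int, max_segments: int
) -> List[List[int]]:
    """Split key dimensions left to right, stopping once max_segments is reached."""
    mesh: List[List[int]] = []
    segments = 1
    for lo, hi in zip(min_key, max_key):
        if segments >= max_segments:
            mesh.append([lo, hi])  # enough segments already: leave this dimension whole
            continue
        checkpoints = split_range(lo, hi, count)
        mesh.append(checkpoints)
        segments *= len(checkpoints) - 1  # k + 1 checkpoints bound k intervals
    return mesh

print(split_dimensions_one_by_one([1, 1, 1], [3, 3, 3], count=1, max_segments=2))
# [[1, 2, 3], [1, 3], [1, 3]]
print(split_dimensions_one_by_one([1, 1, 1], [2, 3, 3], count=1, max_segments=2))
# [[1, 2], [1, 2, 3], [1, 3]]
```

The second call shows the fallback: the first dimension [1,2] cannot be split, so the second dimension is split instead. Since the budget is checked before each dimension, a larger count can overshoot max_segments; the sketch makes no attempt to cap that.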

Example: split the 3-dimensional key space from [1,1,1] to [3,3,3] into at most two segments in the worst-case scenario.

Old implementation:

It splits every dimension once and creates the mesh [1,2,3], [1,2,3], [1,2,3], which splits the key space into 8 segments.
In the subsequent iteration it investigates [1,2], [1,2,3], [1,2,3] and [2,3], [1,2,3], [1,2,3] ([1,2] and [2,3] cannot be split any further by split_key_space), and so on. We can see it in layers:
([1,2,3], [1,2,3], [1,2,3])
([1,2], [1,2,3], [1,2,3]), ([2,3], [1,2,3], [1,2,3])
([1,2], [1,2], [1,2,3]), ([1,2], [2,3], [1,2,3]), ([2,3], [1,2], [1,2,3]), ([2,3], [2,3], [1,2,3])
([1,2], [1,2], [1,2]), ([1,2], [1,2], [2,3]), ([1,2], [2,3], [1,2]), ([1,2], [2,3], [2,3]), ([2,3], [1,2], [1,2]), ([2,3], [1,2], [2,3]) ...
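For comparison, the old mesh can be expanded into its segments with itertools.product. This sketch (mesh_segments is a hypothetical helper, not a reladiff function) confirms that splitting each of the three dimensions once produces 8 segments.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[int, int]

def mesh_segments(mesh: List[List[int]]) -> List[Tuple[Interval, ...]]:
    """Turn per-dimension checkpoints into the Cartesian product of intervals."""
    # Consecutive checkpoints [a, b, c] give the intervals (a, b) and (b, c).
    intervals = [list(zip(cps, cps[1:])) for cps in mesh]
    return list(product(*intervals))

# Old behaviour: every dimension of [1,1,1]..[3,3,3] is split once.
old_mesh = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
segments = mesh_segments(old_mesh)
print(len(segments))  # 8
print(segments[0])    # ((1, 2), (1, 2), (1, 2))
```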

New implementation:
It splits the keys into at most two segments for the whole space:
([1,2,3], [1,3], [1,3])  # at most 2 segments for the whole space, reached after the first split; stop splitting until the next call
([1,2], [1,2,3], [1,3]), ([2,3], [1,2,3], [1,3])  # 2 segments reached after the second split
([1,2], [1,2], [1,2,3]), ([1,2], [2,3], [1,2,3]), ([2,3], [1,2], [1,2,3]), ([2,3], [2,3], [1,2,3])
([1,2], [1,2], [1,2]), ([1,2], [1,2], [2,3]), ([1,2], [2,3], [1,2]), ([1,2], [2,3], [2,3]), ([2,3], [1,2], [1,2]), ([2,3], [1,2], [2,3]) ...
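The layers listed for the new implementation can be reproduced with a small breadth-first simulation, assuming each resulting segment is split again with the same budget of two segments. All helpers are hypothetical: split_range stands in for split_key_space(...), and new_mesh mimics the one-dimension-at-a-time splitting described earlier; neither is a reladiff function.

```python
from itertools import product
from typing import List, Tuple

Box = Tuple[Tuple[int, int], ...]  # one (lo, hi) pair per key dimension
Mesh = List[List[int]]             # checkpoints per key dimension

def split_range(lo: int, hi: int, count: int) -> List[int]:
    # Hypothetical stand-in for split_key_space(): bounds that differ by 1 stay unsplit.
    if hi - lo <= 1 or count < 1:
        return [lo, hi]
    count = min(count, hi - lo - 1)
    interior = [lo + (hi - lo) * i // (count + 1) for i in range(1, count + 1)]
    return [lo, *interior, hi]

def new_mesh(box: Box, count: int, max_segments: int) -> Mesh:
    # Split dimensions left to right; once max_segments is reached, leave the rest whole.
    mesh: Mesh = []
    segments = 1
    for lo, hi in box:
        cps = split_range(lo, hi, count) if segments < max_segments else [lo, hi]
        segments *= len(cps) - 1
        mesh.append(cps)
    return mesh

def children(mesh: Mesh) -> List[Box]:
    # Sub-boxes: Cartesian product of the intervals between consecutive checkpoints.
    return list(product(*[list(zip(c, c[1:])) for c in mesh]))

def layers(box: Box, count: int = 1, max_segments: int = 2) -> List[List[Mesh]]:
    """Meshes produced at each recursion depth, visited breadth-first."""
    result: List[List[Mesh]] = []
    frontier: List[Box] = [box]
    while frontier:
        # Keep only meshes where at least one dimension was actually split.
        meshes = [m for m in (new_mesh(b, count, max_segments) for b in frontier)
                  if any(len(c) > 2 for c in m)]
        if not meshes:
            break  # no remaining box can be split any further
        result.append(meshes)
        frontier = [child for m in meshes for child in children(m)]
    return result

for depth, meshes in enumerate(layers(((1, 3), (1, 3), (1, 3)))):
    print(depth, meshes)
```

Depth 0 yields [[1, 2, 3], [1, 3], [1, 3]], depth 1 the two meshes of the second line, and depth 2 the four meshes of the third line. After that only the eight unit boxes of the last line remain; none of them can be split, so the loop stops.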

rkrenzler mentioned this pull request Aug 6, 2025

Reladiff is too slow with large number of key-columns #83

Open
