KAFKA-20036 Handle LogCleaner segment overflow caused by compression level changes #21379
Open
m1a2st wants to merge 4 commits into apache:trunk from
chia7712
reviewed
Jan 31, 2026
| try { | ||
| // it's OK not to hold the Log's lock in this case, because this segment is only accessed by other threads | ||
| // after `Log.replaceSegments` (which acquires the lock) is called | ||
| dest.append(result.maxOffset(), retained); |
Member
Could you wrap only `dest.append` in the try-catch block to avoid catching unrelated errors?
|
|
||
| public SegmentOverflowException(LogSegment segment) { | ||
| super("Segment size would overflow during compaction for segment " + segment); | ||
| this.segment = segment; |
| log.name(), new Date(cleanableHorizonMs), new Date(legacyDeleteHorizonMs)); | ||
| CleanedTransactionMetadata transactionMetadata = new CleanedTransactionMetadata(); | ||
|
|
||
| double sizeRatio = 1.0; |
Member
Would something like this work better?
double sizeRatio = segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0);
if (sizeRatio != 1.0) {
    logger.info("Partition {} has overflow history. Reducing effective segment size to {}% for this round.",
        log.topicPartition(), sizeRatio * 100);
}
| cleanSegments(log, group, offsetMap, currentTime, stats, transactionMetadata, legacyDeleteHorizonMs, upperBoundOffset); | ||
| } | ||
|
|
||
| if (segmentOverflowPartitions.containsKey(log.topicPartition())) { |
Member
if (segmentOverflowPartitions.remove(log.topicPartition()) != null) {
    logger.info("Successfully cleaned log {} with degraded size (ratio: {}%). " +
        "Cleared overflow marker. Next cleaning will use normal size.",
        log.name(), sizeRatio * 100);
}
| currentTime | ||
| ); | ||
| } catch (SegmentOverflowException e) { | ||
| if (segmentOverflowPartitions.containsKey(log.topicPartition())) { |
Member
var previousRatio = segmentOverflowPartitions.put(log.topicPartition(),
segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0) * 0.9);
if (previousRatio == null) {
logger.warn("Segment overflow detected for partition {}: {}. " +
"Marked for degradation to 90% size in next cleaning round.",
log.topicPartition(), e.getMessage());
} else {
logger.warn("Repeated segment overflow for partition {}: {}. " +
"Further degrading to {}% size in next cleaning round.",
log.topicPartition(), e.getMessage(), previousRatio * 0.9 * 100);
}
We add a new map that records which topic partitions have experienced a
segment overflow during cleaning. After an overflow, the next time the
group is processed we scale the segment size by a factor of 0.9 to make
a repeat overflow less likely. If the partition still overflows, we
multiply the ratio by 0.9 again on each subsequent attempt until the
partition is successfully cleaned.
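
The bookkeeping described above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: the class `OverflowRatioTracker`, its method names, the use of a `String` partition key in place of `TopicPartition`, and the way the ratio is applied in `effectiveMaxSize` are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of the PR): tracks a per-partition size ratio
// that shrinks by a factor of 0.9 on every overflow and resets on success.
public class OverflowRatioTracker {
    private static final double DEGRADE_FACTOR = 0.9;

    // Partition name -> ratio to apply in the next cleaning round.
    // An absent key means "no overflow history", i.e. a ratio of 1.0.
    private final Map<String, Double> ratios = new ConcurrentHashMap<>();

    /** Ratio to use for the current cleaning round. */
    public double currentRatio(String partition) {
        return ratios.getOrDefault(partition, 1.0);
    }

    /** Record an overflow: first overflow stores 0.9, later ones multiply by 0.9. */
    public double recordOverflow(String partition) {
        return ratios.merge(partition, DEGRADE_FACTOR, (old, factor) -> old * factor);
    }

    /** Record a successful clean: clears the marker, returns true if one existed. */
    public boolean recordSuccess(String partition) {
        return ratios.remove(partition) != null;
    }

    /** Assumed way of applying the ratio to a configured maximum segment size. */
    public static int effectiveMaxSize(int maxSegmentBytes, double ratio) {
        return (int) (maxSegmentBytes * ratio);
    }
}
```

Using `Map.merge` keeps the read-modify-write of the ratio in a single call, which avoids the separate `getOrDefault` and `put` steps; whether that matters here depends on how many threads touch the map for the same partition.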