KAFKA-17676: Fix NPE when starting tasks after config topic compaction #21368

Open
ayalamark-afk wants to merge 1 commit into apache:trunk from ayalamark-afk:KAFKA-17676-clean
ayalamark-afk commented on Jan 29, 2026

Summary

Fix for KAFKA-17676 - NullPointerException when starting tasks after the connect-configs topic has been compacted.

Problem

When the connect-configs topic gets compacted, two scenarios can cause NPE crashes:

  1. Incomplete task configs due to compaction: Compaction can remove some task config records, leaving an incomplete set. connectorTaskCounts is still updated, so the worker later tries to start tasks that have no config and hits an NPE.

  2. Missing connector config: Connector config records can be removed by compaction even though the connector is still active. The fix for KAFKA-16838 assumes that if a connector config is missing, the connector was deleted, and ignores task configs. This causes an NPE when trying to start tasks:

java.lang.NullPointerException: Cannot invoke "java.util.Map.size()" because "inputMap" is null
at org.apache.kafka.connect.runtime.TaskConfig.
at org.apache.kafka.connect.runtime.Worker.startTask

Solution

Two fixes in processTasksCommitRecord:

Fix 1: Only update connectorTaskCounts when we actually apply task configs

  • Moves connectorTaskCounts.put() inside the else block, the branch where the task configs are actually applied
  • Prevents advertising task count without having configs for all tasks
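
As a rough illustration of Fix 1, the sketch below records a task count only in the branch that applies a complete set of task configs. It is a simplified stand-in, not the actual processTasksCommitRecord code: the class TaskCommitSketch, the method processTasksCommit, the deferred map, and the "connector-index" key format are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the config backing store state.
class TaskCommitSketch {
    final Map<String, Integer> connectorTaskCounts = new HashMap<>();
    final Map<String, Map<String, String>> taskConfigs = new HashMap<>();
    final Set<String> inconsistent = new HashSet<>();

    // deferred holds the task configs read since the last commit, keyed by task index.
    void processTasksCommit(String connector, int newTaskCount,
                            Map<Integer, Map<String, String>> deferred) {
        boolean complete = true;
        for (int i = 0; i < newTaskCount; i++) {
            if (!deferred.containsKey(i)) {
                complete = false;
                break;
            }
        }
        if (!complete) {
            // Some task configs were lost (for example to compaction):
            // flag the connector and leave the advertised task count untouched.
            inconsistent.add(connector);
        } else {
            for (int i = 0; i < newTaskCount; i++) {
                taskConfigs.put(connector + "-" + i, deferred.get(i));
            }
            // The count is updated only here, together with the configs it describes.
            connectorTaskCounts.put(connector, newTaskCount);
            inconsistent.remove(connector);
        }
    }
}
```

With this shape, a commit record that claims two tasks but arrives with only one task config leaves connectorTaskCounts unchanged, so no task is advertised without a config.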

Fix 2: Handle compacted connector configs gracefully

  • Distinguish a truly deleted connector (no deferred updates, not previously applied) from one whose config was compacted away
  • Recover from the previously applied config if it is available
  • Mark the connector as inconsistent if recovery is not possible (preserves the KAFKA-16838 fix behavior)
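
One possible reading of the three bullets above, written as a small decision function. This is a sketch under assumptions: the Outcome enum, the classify method, and its two boolean inputs are hypothetical and only mirror the wording of the bullets; the real patch may order or combine the checks differently.

```java
// Hypothetical decision helper mirroring the bullets above; not the code from the patch.
class MissingConnectorConfigSketch {
    enum Outcome { TREAT_AS_DELETED, RECOVER_FROM_APPLIED, MARK_INCONSISTENT }

    static Outcome classify(boolean hasDeferredTaskUpdates, boolean previouslyApplied) {
        if (!hasDeferredTaskUpdates && !previouslyApplied) {
            // Nothing pending and nothing applied before: the connector was really deleted.
            return Outcome.TREAT_AS_DELETED;
        }
        if (previouslyApplied) {
            // A config was applied earlier, so the record was likely lost to compaction.
            return Outcome.RECOVER_FROM_APPLIED;
        }
        // Task updates exist but there is nothing to recover from.
        return Outcome.MARK_INCONSISTENT;
    }
}
```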

Both fixes are needed to fully address NPE crashes when starting tasks after config topic compaction.

github-actions bot added the triage (PRs from the community), connect, and small (Small PRs) labels on Jan 29, 2026
ayalamark-afk force-pushed the KAFKA-17676-clean branch 3 times, most recently from 76cb8fe to e862341, on February 2, 2026 16:01
github-actions bot removed the small (Small PRs) label on Feb 2, 2026
…ecovery

This commit adds three fixes for the NPE/task failure issues caused when
task configs are lost due to connect-configs topic compaction:

Fix 1: Leader periodic check for inconsistent connectors (processInconsistentConnectors)
- Leader checks configState.inconsistentConnectors() on each tick
- Automatically triggers reconfigureConnectorTasksWithRetry() for any
  connector with incomplete task configs
- This proactively recovers connectors before tasks fail to start

Fix 2: Auto-recovery in startTask() when task config is missing
- Checks if taskConfig is null before attempting to start task
- If connector is running, calls requestTaskReconfiguration() to regenerate configs
- Throws ConnectException so task will be retried after configs are regenerated
- This provides fallback recovery if periodic check misses the issue
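
A minimal sketch of the guard described in Fix 2, assuming a simplified worker. StartTaskSketch and its fields are hypothetical; the list reconfigurationRequests stands in for the call to requestTaskReconfiguration(), and IllegalStateException stands in for the ConnectException named above so that the example has no Kafka dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free stand-in for the worker's startTask() guard.
class StartTaskSketch {
    // Records which connectors were asked to regenerate their task configs.
    final List<String> reconfigurationRequests = new ArrayList<>();

    boolean startTask(String connector, int task, Map<String, String> taskConfig,
                      boolean connectorRunning) {
        if (taskConfig == null) {
            if (connectorRunning) {
                // Ask the running connector to regenerate its task configs.
                reconfigurationRequests.add(connector);
            }
            // Fail this attempt with a clear error instead of an NPE further down.
            throw new IllegalStateException(
                "Task config missing for " + connector + "-" + task);
        }
        // The real worker would construct and start the task here.
        return true;
    }
}
```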

Fix 3: Proper cleanup in removeConnectorConfig()
- When connector is deleted, also tombstone all task configs and commit record
- Prevents orphaned task configs that could cause issues when connector is recreated
- Original code only tombstoned connector config and target state
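
The cleanup in Fix 3 can be pictured as building the full list of keys to tombstone. The helper below is a hypothetical sketch: the key prefixes ("connector-", "target-state-", "task-", "commit-") are written from memory of the config topic layout and should be treated as an assumption, and the ordering of the tombstones is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; key prefixes are assumed, not taken from the patch.
class TombstoneSketch {
    static List<String> keysToTombstone(String connector, int taskCount) {
        List<String> keys = new ArrayList<>();
        // Described above as what the original code already tombstoned.
        keys.add("connector-" + connector);
        keys.add("target-state-" + connector);
        // Added by this fix: every task config plus the commit record.
        for (int i = 0; i < taskCount; i++) {
            keys.add("task-" + connector + "-" + i);
        }
        keys.add("commit-" + connector);
        return keys;
    }
}
```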

These fixes ensure Kafka Connect can automatically recover from task config
loss due to topic compaction without manual intervention.
github-actions bot added the small (Small PRs) label on Feb 3, 2026
Labels

connect, small (Small PRs), triage (PRs from the community)