
Commit a409886

paulohtb6 and Feediver1 authored
shadowing: add shadow tasks (#1467)
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
1 parent 5d4dd65 commit a409886


13 files changed

+104
-17
lines changed


modules/get-started/pages/release-notes/redpanda.adoc

Lines changed: 15 additions & 0 deletions
@@ -21,6 +21,21 @@ Redpanda v25.3 introduces xref:deploy:redpanda/manual/disaster-recovery/shadowin
2121

2222
The shadow cluster operates in read-only mode while continuously receiving updates from the source cluster. During a disaster, you can fail over individual topics or an entire shadow link to make resources fully writable for production traffic. See xref:deploy:redpanda/manual/disaster-recovery/shadowing/failover-runbook.adoc[] for emergency procedures.
2323

24+
=== New rpk shadow commands
25+
26+
This release introduces new xref:reference:rpk/rpk-shadow/rpk-shadow.adoc[`rpk shadow`] commands for managing shadow links:
27+
28+
* xref:reference:rpk/rpk-shadow/rpk-shadow-config-generate.adoc[`rpk shadow config generate`] - Generate configuration files for shadow links
29+
* xref:reference:rpk/rpk-shadow/rpk-shadow-create.adoc[`rpk shadow create`] - Create new shadow links
30+
* xref:reference:rpk/rpk-shadow/rpk-shadow-update.adoc[`rpk shadow update`] - Update existing shadow link configurations
31+
* xref:reference:rpk/rpk-shadow/rpk-shadow-list.adoc[`rpk shadow list`] - List all shadow links
32+
* xref:reference:rpk/rpk-shadow/rpk-shadow-describe.adoc[`rpk shadow describe`] - View shadow link configuration details
33+
* xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] - Monitor shadow link replication status
34+
* xref:reference:rpk/rpk-shadow/rpk-shadow-failover.adoc[`rpk shadow failover`] - Perform emergency failover operations
35+
* xref:reference:rpk/rpk-shadow/rpk-shadow-delete.adoc[`rpk shadow delete`] - Delete shadow links
36+
37+
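Together, these commands let you manage the full shadow link lifecycle from the command line: generating configuration, creating and updating links, checking status, failing over, and deleting links.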
38+
2439
== Connected client monitoring
2540

2641
You can view details about Kafka client connections using `rpk` or the Admin API ListKafkaConnections endpoint. This allows you to view detailed information about active client connections on a cluster, and identify and troubleshoot problematic clients. For more information, see the xref:manage:cluster-maintenance/manage-throughput.adoc#view-connected-client-details[connected client details] example in the Manage Throughput guide.

modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc

Lines changed: 10 additions & 1 deletion
@@ -128,7 +128,16 @@ Name: <topic-name>, State: ACTIVE
128128
1 2345 2579 2568 11
129129
----
130130

131-
IMPORTANT: Note the replication lag to estimate potential data loss during failover.
131+
The partition information shows:
132+
* **SRC_LSO**: Source partition Last Stable Offset
133+
* **SRC_HWM**: Source partition High Watermark
134+
* **DST_HWM**: Shadow (destination) partition High Watermark
135+
* **Lag**: Offset difference between the source and shadow partitions, approximately the number of messages not yet replicated
136+
137+
[IMPORTANT]
138+
====
139+
Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:./setup.adoc#shadow-link-tasks[].
140+
====
132141

133142
[[initiate-failover]]
134143
=== Initiate failover

modules/manage/pages/disaster-recovery/shadowing/failover.adoc

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,8 @@ The output shows individual topic states and any issues encountered during the f
9797
* **`NOT_RUNNING`**: Task is not currently executing
9898
* **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster
9999

100+
For detailed information about shadow link tasks and their roles, see xref:./setup.adoc#shadow-link-tasks[].
101+
100102

101103
== Post-failover cluster behavior
102104

modules/manage/pages/disaster-recovery/shadowing/monitor.adoc

Lines changed: 5 additions & 3 deletions
@@ -52,8 +52,8 @@ The status output includes:
5252

5353
* **Shadow link state**: Overall operational state (`ACTIVE`)
5454
* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`)
55-
* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`)
56-
* **Lag information**: Replication lag per partition showing source vs shadow watermarks
55+
* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:./setup.adoc#shadow-link-tasks[].
56+
* **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM)
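The lag figures can be reproduced from raw offsets. The following Python sketch applies the formula given for the `redpanda_shadow_link_shadow_lag` metric (source partition LSO minus shadow partition HWM) to a list of partitions and reports the total and worst-case lag. The `PartitionOffsets` type and the sample values are hypothetical; they are not an `rpk` or Admin API output format.

```python
# Hypothetical sketch: per-partition shadow lag using the formula documented
# for redpanda_shadow_link_shadow_lag (source LSO minus shadow HWM).
# The input layout is an assumption, not an rpk or Admin API format.
from typing import NamedTuple


class PartitionOffsets(NamedTuple):
    partition: int
    src_lso: int  # source partition Last Stable Offset
    dst_hwm: int  # shadow partition High Watermark


def shadow_lag(p: PartitionOffsets) -> int:
    """Lag of one shadow partition: source LSO minus shadow HWM."""
    return p.src_lso - p.dst_hwm


def summarize(partitions: list[PartitionOffsets]) -> dict[str, int]:
    """Total and worst-case lag across partitions, a rough indicator of
    how many offsets have not yet been replicated."""
    lags = [shadow_lag(p) for p in partitions]
    return {"total": sum(lags), "max": max(lags, default=0)}


if __name__ == "__main__":
    sample = [PartitionOffsets(0, 1000, 990), PartitionOffsets(1, 500, 500)]
    print(summarize(sample))
```

Summing per-partition lag gives a coarse estimate of what a failover at that moment could leave behind; the per-partition maximum points to the partition that is furthest behind.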
5757

5858
[[shadow-link-metrics]]
5959
== Metrics
@@ -66,7 +66,7 @@ Shadowing provides comprehensive metrics to track replication performance and he
6666

6767
|`redpanda_shadow_link_shadow_lag`
6868
|Gauge
69-
|The lag of the shadow partition against the source partition, calculated as source partition LSO minus shadow partition HWM. Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
69+
|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
7070

7171
|`redpanda_shadow_link_total_bytes_fetched`
7272
|Count
@@ -119,4 +119,6 @@ Configure monitoring alerts for:
119119
* **Topic state changes**: When topics move to `FAULTED` state
120120
* **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states
121121
* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues
122+
123+
For more information about shadow link tasks and their states, see xref:./setup.adoc#shadow-link-tasks[].
122124
* **Throughput drops**: When bytes/records fetched drops significantly
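The task-related alert conditions above can be checked with a few lines of code. This Python sketch assumes you have already collected task states into a `{broker_id: {task_name: state}}` mapping, for example by parsing `rpk shadow status` output; that layout and the task names in the sample are hypothetical. Throughput alerts are not covered.

```python
# Hypothetical sketch of the task-state alert conditions listed above.
# The {broker_id: {task_name: state}} layout and task names are assumptions.

FAILURE_STATES = {"FAULTED", "NOT_RUNNING"}
CONNECTIVITY_STATES = {"LINK_UNAVAILABLE"}


def task_alerts(states: dict[int, dict[str, str]]) -> list[str]:
    """Return one alert message per task in a state that warrants attention."""
    alerts = []
    for broker, tasks in sorted(states.items()):
        for task, state in sorted(tasks.items()):
            if state in FAILURE_STATES:
                # Task failures: FAULTED or NOT_RUNNING
                alerts.append(f"task failure: broker {broker} {task} is {state}")
            elif state in CONNECTIVITY_STATES:
                # Source cluster connectivity issue
                alerts.append(f"link unavailable: broker {broker} {task}")
    return alerts


if __name__ == "__main__":
    sample = {
        0: {"source_topic_sync": "ACTIVE", "security_migrator": "FAULTED"},
        1: {"consumer_group_shadowing": "LINK_UNAVAILABLE"},
    }
    for line in task_alerts(sample):
        print(line)
```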

modules/manage/pages/disaster-recovery/shadowing/setup.adoc

Lines changed: 64 additions & 5 deletions
@@ -72,6 +72,63 @@ To set up Shadowing:
7272
* **Configure filters**: Define which topics, consumer groups, and ACLs should replicate by creating include/exclude patterns that match your disaster recovery requirements. See <<set-filters>>.
7373
* **Create a shadow link**: Establish the connection between clusters using `rpk`, the Admin API, or Redpanda Console with authentication and network settings. See <<create-a-shadow-link>>.
7474

75+
[#shadow-link-tasks]
== Shadow link tasks
76+
77+
Shadow linking operates through specialized tasks that handle different aspects of replication. Each task corresponds to a configuration section in your shadow link setup and runs continuously to maintain synchronization with the source cluster.
78+
79+
[#source-topic-sync-task]
80+
=== Source topic sync task
81+
82+
The **Source Topic Sync task** manages topic discovery and metadata synchronization. This task periodically queries the source cluster to discover available topics, applies your configured topic filters to determine which topics should become shadow topics, and synchronizes topic properties between clusters.
83+
84+
The task is controlled by the `topic_metadata_sync_options` configuration section, which includes:
85+
86+
* **Auto-creation filters**: Determines which source topics automatically become shadow topics
87+
* **Property synchronization**: Controls which topic properties replicate from source to shadow
88+
* **Starting offset**: Sets where new shadow topics begin replication (earliest, latest, or timestamp-based)
89+
* **Sync interval**: How frequently to check for new topics and property changes
90+
91+
When this task discovers a new topic that matches your filters, it creates the corresponding shadow topic and begins replication from your configured starting offset.
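A single discovery pass can be pictured as a filtered set difference. This is a minimal Python sketch, not Redpanda's implementation: `topics_to_create` is a hypothetical helper, and the predicate stands in for your configured auto-creation filters, whose exact matching rules this sketch does not model.

```python
# Hypothetical sketch of one Source Topic Sync discovery pass: find source
# topics that match the filters and do not yet have a shadow topic.
from typing import Callable


def topics_to_create(
    source_topics: set[str],
    shadow_topics: set[str],
    matches_filters: Callable[[str], bool],
) -> list[str]:
    """Return, in sorted order, the topics a sync pass would newly shadow."""
    return sorted(
        t for t in source_topics if t not in shadow_topics and matches_filters(t)
    )


if __name__ == "__main__":
    source = {"orders", "payments", "_internal_audit"}
    existing = {"orders"}
    # Hypothetical filter: shadow everything except topics starting with "_".
    print(topics_to_create(source, existing, lambda t: not t.startswith("_")))
```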
92+
93+
[#consumer-group-shadowing-task]
94+
=== Consumer group shadowing task
95+
96+
The **Consumer Group Shadowing task** replicates consumer group offsets and membership information from the source cluster. This ensures that consumer applications can resume processing from the correct position after failover.
97+
98+
The task is controlled by the `consumer_offset_sync_options` configuration section, which includes:
99+
100+
* **Group filters**: Determines which consumer groups have their offsets replicated
101+
* **Sync interval**: How frequently to synchronize consumer group offsets
102+
* **Offset clamping**: Automatically adjusts replicated offsets to valid ranges on the shadow cluster
103+
104+
This task runs on brokers that host the `__consumer_offsets` topic and continuously tracks consumer group coordinators to optimize offset synchronization.
105+
106+
[#security-migrator-task]
107+
=== Security migrator task
108+
109+
The **Security Migrator task** replicates security policies, primarily ACLs (Access Control Lists), from the source cluster to maintain consistent authorization across both environments.
110+
111+
The task is controlled by the `security_sync_options` configuration section, which includes:
112+
113+
* **ACL filters**: Determines which security policies replicate
114+
* **Sync interval**: How frequently to synchronize security settings
115+
116+
By default, all ACLs replicate to ensure your shadow cluster maintains the same security posture as your source cluster.
117+
118+
=== Task status and monitoring
119+
120+
Each task reports its status through the shadow link status API. Task states include:
121+
122+
* **`ACTIVE`**: Task is running normally and performing synchronization
123+
* **`PAUSED`**: Task has been manually paused through configuration
124+
* **`FAULTED`**: Task encountered an error and requires attention
125+
* **`NOT_RUNNING`**: Task is not currently executing
126+
* **`LINK_UNAVAILABLE`**: Task cannot communicate with the source cluster
127+
128+
You can pause individual tasks by setting the `paused` field to `true` in the corresponding configuration section. This allows you to selectively disable parts of the replication process without affecting the entire shadow link.
129+
130+
For information about monitoring task health and troubleshooting task issues, see xref:./monitor.adoc[Monitor Shadow Links].
131+
75132
== What gets replicated
76133

77134
Shadowing replicates your topic data with complete fidelity, preserving all message records with their original offsets, timestamps, headers, and metadata. The partition structure remains identical between source and shadow clusters, ensuring applications can resume processing from the exact same position after failover.
@@ -82,7 +139,7 @@ Partition count is always replicated to ensure the shadow topic matches the sour
82139

83140
=== Topic properties replication
84141

85-
For topic properties, Redpanda follows these replication rules:
142+
The <<source-topic-sync-task,Source Topic Sync task>> handles topic property replication. For topic properties, Redpanda follows these replication rules:
86143

87144
**Never replicated:**
88145

@@ -210,7 +267,7 @@ Redpanda system topics have the following specific filtering restrictions:
210267

211268
=== ACL filtering
212269

213-
By default all ACLs are replicated. This is recommended in order to ensure that your shadow cluster has the same permissions as your source cluster. ACL filters should be used with care:
270+
ACLs are replicated by the <<security-migrator-task,Security Migrator task>>. By default, all ACLs are replicated, which is recommended to ensure that your shadow cluster has the same permissions as your source cluster. Use ACL filters with care. To configure ACL filters:
214271

215272
[,yaml]
216273
----
@@ -239,7 +296,9 @@ acl_filters:
239296

240297
=== Consumer group filtering and behavior
241298

242-
Consumer group filters determine which consumer groups have their offsets replicated to the shadow cluster. By default, all consumer groups are replicated unless you specify filters.
299+
Consumer group filters determine which consumer groups have their offsets replicated to the shadow cluster by the <<consumer-group-shadowing-task,Consumer Group Shadowing task>>.
300+
301+
Offset replication operates selectively within each consumer group. Only committed offsets for active shadow topics are synchronized, even if the consumer group has offsets for additional topics that aren't being shadowed. For example, if the consumer group `app-consumers` has committed offsets for the `orders`, `payments`, and `inventory` topics, but only `orders` is an active shadow topic, then only the `orders` offsets are replicated to the shadow cluster.
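The `app-consumers` example can be expressed as a filter over committed offsets. In this Python sketch, offsets are modeled as a `{(topic, partition): offset}` mapping and `offsets_to_replicate` is a hypothetical helper; it illustrates the documented rule only and is not how Redpanda stores or transfers group offsets.

```python
# Hypothetical sketch of selective offset replication: only committed offsets
# for active shadow topics are carried over to the shadow cluster.


def offsets_to_replicate(
    committed: dict[tuple[str, int], int],
    active_shadow_topics: set[str],
) -> dict[tuple[str, int], int]:
    """Keep only the committed offsets whose topic is an active shadow topic."""
    return {
        (topic, partition): offset
        for (topic, partition), offset in committed.items()
        if topic in active_shadow_topics
    }


if __name__ == "__main__":
    app_consumers = {
        ("orders", 0): 42,
        ("payments", 0): 17,
        ("inventory", 0): 8,
    }
    print(offsets_to_replicate(app_consumers, {"orders"}))
```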
243302

244303
[,yaml]
245304
----
@@ -259,11 +318,11 @@ consumer_offset_sync_options:
259318

260319
**Avoid name conflicts:** If you plan to consume data from the shadow cluster, do not use the same consumer group names as those used on the source cluster. While this won't break shadow linking, it can impact your RPO/RTO because conflicting group names may interfere with offset replication and consumer resumption during disaster recovery.
261320

262-
**Offset clamping:** When Redpanda replicates consumer group offsets from the source cluster, offsets are automatically "clamped" during the commit process. If a replicated offset is above the high watermark (HWM) of the shadow partition, Redpanda clamps the offset to the shadow partition's HWM. This ensures offsets remain valid and prevents consumers from seeking beyond available data on the shadow cluster.
321+
**Offset clamping:** When Redpanda replicates consumer group offsets from the source cluster, offsets are automatically "clamped" during the commit process on the shadow cluster. If a committed offset from the source cluster is above the high watermark (HWM) of the corresponding shadow partition, Redpanda clamps the offset to the shadow partition's HWM before committing it to the shadow cluster. This ensures offsets remain valid and prevents consumers from seeking beyond available data on the shadow cluster.
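Clamping reduces to taking the smaller of two offsets. The following Python sketch uses a hypothetical `clamp_offset` helper to show the rule described above; it does not reflect Redpanda internals.

```python
# Hypothetical sketch of offset clamping: a committed offset replicated from
# the source is capped at the shadow partition's high watermark (HWM).


def clamp_offset(source_committed_offset: int, shadow_hwm: int) -> int:
    """Return the offset to commit on the shadow cluster."""
    # Offsets beyond the shadow HWM would point past data that exists on the
    # shadow cluster, so they are lowered to the HWM; others pass through.
    return min(source_committed_offset, shadow_hwm)


if __name__ == "__main__":
    print(clamp_offset(1_250, 1_200))  # clamped to the shadow HWM
    print(clamp_offset(900, 1_200))    # already valid, unchanged
```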
263322

264323
=== Starting offset for new shadow topics
265324

266-
When a shadow topic is created for the first time, you can control where replication begins on the source topic. This setting only applies to empty shadow partitions and is crucial for disaster recovery planning.
325+
When the <<source-topic-sync-task,Source Topic Sync task>> creates a shadow topic for the first time, you can control where replication begins on the source topic. This setting only applies to empty shadow partitions and is crucial for disaster recovery planning. Changing this configuration affects only new shadow topics; existing shadow topics continue replicating from their current position.
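The rule that the starting offset applies only to empty shadow partitions, with earliest, latest, or timestamp-based starting points, can be sketched as a small decision function. This is an illustration under stated assumptions: `start_offset` and its mode strings are hypothetical, not actual configuration values; a non-empty partition is assumed to resume at its current HWM; and the timestamp lookup assumes Kafka-style semantics (the first offset whose timestamp is at or after the target).

```python
# Hypothetical sketch of resolving a starting offset for one shadow partition.
# Mode strings and the resolution logic are assumptions for illustration.
from typing import Optional


def start_offset(
    shadow_is_empty: bool,
    shadow_hwm: int,
    src_log_start: int,
    src_hwm: int,
    mode: str,
    src_records: Optional[list[tuple[int, int]]] = None,  # (offset, timestamp_ms)
    timestamp_ms: Optional[int] = None,
) -> int:
    # The setting only applies to empty shadow partitions; a partition that
    # already holds data keeps replicating from its current position.
    if not shadow_is_empty:
        return shadow_hwm
    if mode == "earliest":
        return src_log_start
    if mode == "latest":
        return src_hwm
    if mode == "timestamp":
        if src_records is None or timestamp_ms is None:
            raise ValueError("timestamp mode needs records and a timestamp")
        # Assumed Kafka-style lookup: first offset with timestamp >= target.
        for offset, ts in src_records:
            if ts >= timestamp_ms:
                return offset
        return src_hwm  # no such record: start at the end of the log
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    records = [(100, 1_000), (101, 2_000), (102, 3_000)]
    print(start_offset(True, 0, 100, 103, "timestamp", records, 1_500))
    print(start_offset(False, 57, 100, 103, "latest"))
```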
267326

268327
[,yaml]
269328
----

modules/reference/pages/rpk/rpk-shadow/rpk-shadow-config-generate.adoc

Lines changed: 1 addition & 1 deletion
@@ -68,4 +68,4 @@ rpk shadow config generate --print-template -o shadow-link.yaml
6868
|-v, --verbose |- |Enable verbose logging.
6969
|===
7070

71-
// end::single-source[]
71+
// end::single-source[]

modules/reference/pages/rpk/rpk-shadow/rpk-shadow-create.adoc

Lines changed: 1 addition & 1 deletion
@@ -53,4 +53,4 @@ rpk shadow create -c shadow-link.yaml --no-confirm
5353
|-v, --verbose |- |Enable verbose logging.
5454
|===
5555

56-
// end::single-source[]
56+
// end::single-source[]

modules/reference/pages/rpk/rpk-shadow/rpk-shadow-delete.adoc

Lines changed: 1 addition & 1 deletion
@@ -60,4 +60,4 @@ rpk shadow delete my-shadow-link --force
6060
|-v, --verbose |- |Enable verbose logging.
6161
|===
6262

63-
// end::single-source[]
63+
// end::single-source[]

modules/reference/pages/rpk/rpk-shadow/rpk-shadow-describe.adoc

Lines changed: 1 addition & 1 deletion
@@ -73,4 +73,4 @@ rpk shadow describe my-shadow-link -c
7373
|-v, --verbose |- |Enable verbose logging.
7474
|===
7575

76-
// end::single-source[]
76+
// end::single-source[]

modules/reference/pages/rpk/rpk-shadow/rpk-shadow-failover.adoc

Lines changed: 1 addition & 1 deletion
@@ -64,4 +64,4 @@ rpk shadow failover my-shadow-link --all --no-confirm
6464
|-v, --verbose |- |Enable verbose logging.
6565
|===
6666

67-
// end::single-source[]
67+
// end::single-source[]
