Shadow tasks #1467
✅ Deploy Preview for redpanda-docs-preview ready!
Review skipped: auto reviews are disabled on base/target branches other than the default branch.

📝 Walkthrough
This PR updates documentation for the Redpanda 25.3 beta release. Changes include: bumping Antora version metadata from 25.2 to 25.3 with a prerelease flag, adding the beta branch to GitHub Actions workflow triggers, restructuring navigation to emphasize disaster recovery features, and significantly expanding the Shadowing documentation (a new cross-region replication feature for disaster recovery). Additionally, the Console Linux deployment documentation is expanded with OS package installation steps, the Admin API documentation is reorganized to include ConnectRPC endpoints available in v25.3+, the Schema Registry mode documentation is enhanced with READONLY/READWRITE/IMPORT modes, the Iceberg documentation transitions from AWS Glue to GCP BigLake integration, and comprehensive rpk shadow command reference pages are added. Multiple pages are reorganized or moved between module hierarchies with updated aliases and cross-references.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
The **Consumer Group Shadowing task** replicates consumer group offsets and membership information from the source cluster. This ensures that consumer applications can resume processing from the correct position after failover.

The task is controlled by the `consumer_offset_sync_options` configuration section, which includes:

* **Group filters**: Determines which consumer groups have their offsets replicated
* **Sync interval**: How frequently to synchronize consumer group offsets
* **Offset clamping**: Automatically adjusts replicated offsets to valid ranges on the shadow cluster

This task runs on brokers that host the `__consumer_offsets` topic and continuously tracks consumer group coordinators to optimize offset synchronization.
Not sure if this is too in the weeds, but it may be helpful to point out a couple of things:
- This only replicates committed offsets for active Shadow Topics. So if "group-a" contains "topic-1", "topic-2", and "topic-3", and "topic-1" is an active Shadow Topic, "topic-2" is a failed-over Shadow Topic, and "topic-3" isn't being Shadowed at all, then only committed offsets for "topic-1" in "group-a" are replicated.
- We clamp offsets upon commit on the Shadow Cluster: if the committed offset on the source cluster is above the HWM of the shadow partition, we clamp the offset to the HWM before we commit it to the Shadow Cluster.
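The two behaviors described in this comment (only active Shadow Topics get committed offsets replicated, and offsets are clamped to the shadow partition's HWM) can be sketched as a small function. This is a minimal illustration, not Redpanda's implementation; the `CommittedOffset` type, the function name, and treating a partition with no known HWM as empty are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommittedOffset:
    """A committed offset for one group/topic/partition (hypothetical type)."""
    group: str
    topic: str
    partition: int
    offset: int


def offsets_to_replicate(source_commits, shadow_topic_states, shadow_hwms):
    """Sketch of which source commits would be written to the shadow cluster.

    shadow_topic_states: topic -> shadow topic state, e.g. "ACTIVE" or "FAILED_OVER".
        Topics missing from the dict are not shadowed at all.
    shadow_hwms: (topic, partition) -> high watermark of the shadow partition.
    """
    replicated = []
    for commit in source_commits:
        # Only committed offsets for active Shadow Topics are replicated.
        if shadow_topic_states.get(commit.topic) != "ACTIVE":
            continue
        # Assumption: a partition with no known HWM is treated as empty (HWM 0).
        hwm = shadow_hwms.get((commit.topic, commit.partition), 0)
        # Clamp: never commit an offset beyond the shadow partition's HWM.
        replicated.append(
            CommittedOffset(commit.group, commit.topic, commit.partition, min(commit.offset, hwm))
        )
    return replicated


# The "group-a" example from the comment.
commits = [
    CommittedOffset("group-a", "topic-1", 0, 120),  # active Shadow Topic, ahead of the shadow HWM
    CommittedOffset("group-a", "topic-2", 0, 50),   # failed-over Shadow Topic
    CommittedOffset("group-a", "topic-3", 0, 10),   # not shadowed
]
states = {"topic-1": "ACTIVE", "topic-2": "FAILED_OVER"}
hwms = {("topic-1", 0): 100, ("topic-2", 0): 60}
# Only topic-1 is replicated, and its offset is clamped from 120 to the shadow HWM of 100.
print(offsets_to_replicate(commits, states, hwms))
```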
I think that's fine. Your example is great, so I'll steal that 🚀
Let me know if that works
ab5a8c5
* **ACL filters**: Determines which security policies replicate
* **Sync interval**: How frequently to synchronize security settings
* **Resource matching**: Controls which ACL resources (topics, groups, cluster) are included
Resource matching is part of the ACL filters
=== ACL filtering

By default all ACLs are replicated. This is recommended in order to ensure that your shadow cluster has the same permissions as your source cluster. ACL filters should be used with care:
By default all ACLs are replicated by the <<security-migrator-task,Security Migrator task>>. This is recommended in order to ensure that your shadow cluster has the same permissions as your source cluster. ACL filters should be used with care:
That's not accurate. There are no ACL filters by default, so no ACLs are synced OOTB.
Maybe you're basing this off of the example config that rpk produces?
I also thought this was the case. Based on the DR use case we wanted all ACLs synced by default.
=== Consumer group filtering and behavior

Consumer group filters determine which consumer groups have their offsets replicated to the shadow cluster. By default, all consumer groups are replicated unless you specify filters.
Consumer group filters determine which consumer groups have their offsets replicated to the shadow cluster by the <<consumer-group-shadowing-task,Consumer Group Shadowing task>>. By default, all consumer groups are replicated unless you specify filters.
Unless you're referring to the example that rpk produces, by default nothing is replicated.
=== Starting offset for new shadow topics

When a shadow topic is created for the first time, you can control where replication begins on the source topic. This setting only applies to empty shadow partitions and is crucial for disaster recovery planning.
When the <<source-topic-sync-task,Source Topic Sync task>> creates a shadow topic for the first time, you can control where replication begins on the source topic. This setting only applies to empty shadow partitions and is crucial for disaster recovery planning.
It may be worthwhile to mention that changing this config does not affect any current Shadow Topic but does affect any new Shadow Topics that get created.
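To make the point in this comment concrete (the starting-offset setting is only consulted for empty shadow partitions), here is a minimal sketch. The `"earliest"`/`"latest"` setting values and the assumption that an existing shadow partition resumes from its own HWM are illustrative only; they are not taken from the shadow link configuration reference.

```python
def replication_start(shadow_partition_empty: bool, shadow_hwm: int,
                      start_setting: str, src_log_start: int, src_hwm: int) -> int:
    """Pick where replication begins for one shadow partition (illustrative only)."""
    if not shadow_partition_empty:
        # Existing shadow partitions ignore the setting, so changing it later has
        # no effect on them (assumed here: they resume from their own HWM).
        return shadow_hwm
    # Only new, empty shadow partitions consult the starting-offset setting.
    if start_setting == "earliest":
        return src_log_start
    if start_setting == "latest":
        return src_hwm
    raise ValueError(f"unsupported start setting: {start_setting!r}")
```

For example, `replication_start(True, 0, "latest", 100, 500)` starts a brand-new shadow partition at the source HWM, while `replication_start(False, 300, "earliest", 100, 500)` keeps an existing partition at 300 no matter what the setting says.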
Feediver1 left a comment
lgtm--some minor suggestions
* **SRC_LSO**: Source partition Last Stable Offset
* **SRC_HWM**: Source partition High Watermark
* **DST_HWM**: Shadow (destination) partition High Watermark
* **SRC_LSO**: Source partition Last Stable Offset
* **SRC_HWM**: Source partition High Watermark
* **DST_HWM**: Shadow (destination) partition High Watermark
* **SRC_LSO**: Source partition last stable offset
* **SRC_HWM**: Source partition high watermark
* **DST_HWM**: Shadow (destination) partition high watermark
|`redpanda_shadow_link_shadow_lag`
|Gauge
|The lag of the shadow partition against the source partition, calculated as source partition LSO minus shadow partition HWM. Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
|The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
|The lag of the shadow partition against the source partition, calculated as source partition LSO (last stable offset) minus shadow partition HWM (high watermark). To understand the replication lag for each partition, monitor `shadow_link_name`, `topic`, and `partition`.
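As a worked example of the formula in this table row (source LSO minus shadow HWM), the sketch below computes lag keyed by the same labels the metric uses: `shadow_link_name`, `topic`, and `partition`. The `PartitionWatermarks` type, the link name, and the sample values are hypothetical; in practice you would read the `redpanda_shadow_link_shadow_lag` gauge rather than compute it yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionWatermarks:
    """Watermarks for one shadowed partition (hypothetical type)."""
    shadow_link_name: str
    topic: str
    partition: int
    src_lso: int  # source partition last stable offset
    dst_hwm: int  # shadow (destination) partition high watermark


def shadow_lag(w: PartitionWatermarks) -> int:
    # Same arithmetic as the metric description: source LSO minus shadow HWM.
    return w.src_lso - w.dst_hwm


def lag_by_partition(watermarks):
    """Key lag the way the metric is labeled: by link, topic, and partition."""
    return {(w.shadow_link_name, w.topic, w.partition): shadow_lag(w) for w in watermarks}


samples = [
    PartitionWatermarks("dr-link", "orders", 0, src_lso=1500, dst_hwm=1450),
    PartitionWatermarks("dr-link", "orders", 1, src_lso=900, dst_hwm=900),
]
print(lag_by_partition(samples))  # {('dr-link', 'orders', 0): 50, ('dr-link', 'orders', 1): 0}
```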
* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues

For more information about shadow link tasks and their states, see xref:setup.adoc#shadow-link-tasks[].
* **Throughput drops**: When bytes/records fetched drops significantly
* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues
For more information about shadow link tasks and their states, see xref:setup.adoc#shadow-link-tasks[].
* **Throughput drops**: When bytes/records fetched drops significantly
* **Link unavailability**: When tasks show `LINK_UNAVAILABLE` indicating source cluster connectivity issues
+
For more information about shadow link tasks and their states, see xref:setup.adoc#shadow-link-tasks[].
* **Throughput drops**: When bytes/records fetched drops significantly
And please check this link: it needs the full path!
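The two alert conditions in this suggestion (tasks reporting `LINK_UNAVAILABLE`, and a significant drop in fetched throughput) could be evaluated with logic like the following. This is a sketch only: the `TaskStatus` type, how task states and throughput figures are collected, and the 50% drop threshold are assumptions, since the text does not define what counts as a significant drop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskStatus:
    """Status of one shadow link task on one broker (hypothetical type)."""
    task: str
    broker_id: int
    state: str  # ACTIVE, FAULTED, NOT_RUNNING, or LINK_UNAVAILABLE


def shadow_link_alerts(tasks, prev_bytes_per_sec, curr_bytes_per_sec, max_drop=0.5):
    """Return alert messages for link unavailability and throughput drops.

    max_drop is a hypothetical threshold: alert when fetched throughput falls
    by more than this fraction between two measurement windows.
    """
    alerts = []
    for t in tasks:
        if t.state == "LINK_UNAVAILABLE":
            alerts.append(
                f"{t.task} on broker {t.broker_id}: LINK_UNAVAILABLE (check source cluster connectivity)"
            )
    if prev_bytes_per_sec > 0 and curr_bytes_per_sec < prev_bytes_per_sec * (1 - max_drop):
        alerts.append(
            f"fetched throughput dropped from {prev_bytes_per_sec:.0f} to {curr_bytes_per_sec:.0f} B/s"
        )
    return alerts


tasks = [
    TaskStatus("Source Topic Sync", 1, "ACTIVE"),
    TaskStatus("Consumer Group Shadowing", 2, "LINK_UNAVAILABLE"),
]
for alert in shadow_link_alerts(tasks, prev_bytes_per_sec=1000.0, curr_bytes_per_sec=200.0):
    print(alert)
```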
* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`)
* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`)
* **Lag information**: Replication lag per partition showing source vs shadow watermarks
* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:setup.adoc#shadow-link-tasks[].
Please check this link: it needs the full path
By default, all ACLs replicate to ensure your shadow cluster maintains the same security posture as your source cluster.

=== Task Status and Monitoring
=== Task Status and Monitoring
=== Task status and monitoring
1. **Exclude filters win**: If any EXCLUDE filter matches a resource, it is excluded regardless of INCLUDE filters
2. **Order matters for INCLUDE filters**: Among INCLUDE filters, the first match determines the result
3. **Default behavior**: Items that don't match any filter are excluded from replication
1. **Exclude filters win**: If any EXCLUDE filter matches a resource, it is excluded regardless of INCLUDE filters.
2. **Order matters for INCLUDE filters**: Among INCLUDE filters, the first match determines the result.
3. **Default behavior**: Items that don't match any filter are excluded from replication.
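The three precedence rules in this list can be read as a small evaluation function. The sketch below assumes each filter has a `name`, a pattern type of `LITERAL` or `PREFIX`, and a filter type of `INCLUDE` or `EXCLUDE`; apart from `name`, which appears in the YAML fragment in this PR, these field names and values are assumptions rather than the shadow link schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Filter:
    name: str
    pattern_type: str  # assumed values: "LITERAL" or "PREFIX"
    filter_type: str   # "INCLUDE" or "EXCLUDE"

    def matches(self, resource: str) -> bool:
        if self.pattern_type == "LITERAL":
            return resource == self.name
        if self.pattern_type == "PREFIX":
            return resource.startswith(self.name)
        raise ValueError(f"unknown pattern_type: {self.pattern_type!r}")


def deciding_filter(resource: str, filters: Sequence[Filter]) -> Optional[Filter]:
    """Return the INCLUDE filter that admits `resource`, or None if it is not replicated."""
    # Rule 1: if any EXCLUDE filter matches, the resource is excluded.
    if any(f.filter_type == "EXCLUDE" and f.matches(resource) for f in filters):
        return None
    # Rule 2: among INCLUDE filters, the first match in list order decides.
    for f in filters:
        if f.filter_type == "INCLUDE" and f.matches(resource):
            return f
    # Rule 3: resources that match no filter are excluded.
    return None


def is_replicated(resource: str, filters: Sequence[Filter]) -> bool:
    return deciding_filter(resource, filters) is not None


filters = [
    Filter("test-", "PREFIX", "INCLUDE"),
    # Excludes this group even though it also matches the "test-" INCLUDE prefix.
    Filter("test-consumer-group", "LITERAL", "EXCLUDE"),
    Filter("orders-", "PREFIX", "INCLUDE"),
]
for group in ["test-consumer-group", "test-loadgen", "orders-eu", "payments"]:
    print(group, is_replicated(group, filters))
```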
name: "test-consumer-group" # Exclude specific test groups
----

**Important consumer group considerations:**
**Important consumer group considerations**
====
This is an emergency procedure. For planned failover testing or day-to-day shadow link management, see xref:./failover.adoc[]. Ensure you have completed the disaster readiness checklist in xref:./overview.adoc#disaster-readiness-checklist[] before an emergency occurs.
@paulohtb6 This link doesn't render. Please change it to the full path.
* Topics should be in `ACTIVE` state (not `FAULTED`).
* Replication lag should be reasonable for your RPO requirements.

**Understanding replication lag:**
**Understanding replication lag**
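The readiness conditions in this hunk (topics in `ACTIVE` state, lag reasonable for your RPO) can be checked with a sketch like the one below. It also sums per-partition lag as a rough estimate of the data a failover would lose. The function, its inputs, and the per-partition lag budget are hypothetical; lag is measured in offsets, which only approximates a record count.

```python
def failover_readiness(topic_states, partition_lag, max_lag_per_partition):
    """Check pre-failover conditions and estimate how much data is at risk.

    topic_states: topic -> state, e.g. "ACTIVE" or "FAULTED".
    partition_lag: (topic, partition) -> lag in offsets (source LSO minus shadow HWM).
    max_lag_per_partition: hypothetical lag budget derived from your RPO.
    """
    problems = []
    for topic, state in sorted(topic_states.items()):
        if state != "ACTIVE":
            problems.append(f"topic {topic} is {state}, expected ACTIVE")
    for (topic, partition), lag in sorted(partition_lag.items()):
        if lag > max_lag_per_partition:
            problems.append(f"{topic}/{partition} lag {lag} exceeds budget {max_lag_per_partition}")
    # Offsets not yet replicated approximate what failing over right now would lose.
    at_risk = sum(max(lag, 0) for lag in partition_lag.values())
    return problems, at_risk


states = {"orders": "ACTIVE", "payments": "FAULTED"}
lag = {("orders", 0): 50, ("orders", 1): 0, ("payments", 0): 2000}
problems, at_risk = failover_readiness(states, lag, max_lag_per_partition=1000)
for p in problems:
    print(p)
print("offsets at risk:", at_risk)
```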
[IMPORTANT]
====
Note the replication lag to estimate potential data loss during failover. The `Tasks` section shows the health of shadow link replication tasks. For details about what each task does, see xref:setup.adoc#shadow-link-tasks[].
@paulohtb6 This link isn't rendering.
@paulohtb6: On the overview page, please make these limitations a list: https://deploy-preview-1467--redpanda-docs-preview.netlify.app/25.3/manage/disaster-recovery/shadowing/overview/#limitations

Also in overview, please change the description to: Learn how to use Shadowing with cross-region replication for disaster recovery.

This intro should include Console (along with rpk and the API).

Also in overview, please edit this line to: Shadow link names, configuration details, and networking documented
[TIP]
====
Use xref:get-started:config-rpk-profile.adoc[`rpk profile`] to save your cluster connection details and credentials for both source and shadow clusters, allowing you to easily switch between the two configurations.
Use xref:get-started:config-rpk-profile.adoc[`rpk profile`] to save your cluster connection details and credentials for both source and shadow clusters. This allows you to easily switch between the two configurations.
If you need to modify a shadow link configuration after creation, use the update command:

[,bash]
----
rpk shadow update <shadow-link-name>
----

For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-update.adoc[`rpk shadow update`].

This opens your default editor to modify the shadow link configuration. Only changed fields are updated on the server. The shadow link name cannot be changed - you must delete and recreate the link to rename it.
To modify a shadow link configuration after creation, run:

[,bash]
----
rpk shadow update <shadow-link-name>
----

This opens your default editor to modify the shadow link configuration. Only changed fields are updated on the server. The shadow link name cannot be changed: you must delete and re-create the link to rename it.

For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-update.adoc[`rpk shadow update`].
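The update behavior described in this suggestion (only changed fields are updated on the server, and the name is immutable) can be modeled as a field-level diff between the current and edited configuration. This is an illustrative sketch, not how `rpk` is implemented; apart from `consumer_offset_sync_options`, the configuration keys in the example are hypothetical.

```python
def changed_fields(current: dict, edited: dict, prefix: str = "") -> list:
    """Return dotted paths of fields whose values differ between two configs."""
    paths = []
    for key in sorted(set(current) | set(edited)):
        path = f"{prefix}{key}"
        old, new = current.get(key), edited.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested sections so only the leaf that changed is reported.
            paths.extend(changed_fields(old, new, prefix=f"{path}."))
        elif old != new:
            paths.append(path)
    return paths


def plan_shadow_link_update(current: dict, edited: dict) -> list:
    """Mimic the documented rules: only changed fields are sent, and the name is immutable."""
    if edited.get("name") != current.get("name"):
        raise ValueError("shadow link name cannot be changed; delete and re-create the link to rename it")
    return changed_fields(current, edited)


current = {
    "name": "dr-link",  # hypothetical link name
    "client_options": {"bootstrap_servers": ["source-0:9092"]},  # hypothetical keys
    "consumer_offset_sync_options": {"interval": "30s", "enabled": True},
}
edited = {
    "name": "dr-link",
    "client_options": {"bootstrap_servers": ["source-0:9092"]},
    "consumer_offset_sync_options": {"interval": "10s", "enabled": True},
}
print(plan_shadow_link_update(current, edited))  # ['consumer_offset_sync_options.interval']
```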
Configure network connectivity between your source and shadow clusters to enable shadow link replication. The shadow cluster initiates connections to the source cluster using a pull-based architecture.

For additional details about networking, see <<network-and-authentication>>.
See also: <<network-and-authentication>>
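Because the shadow cluster initiates connections to the source cluster, a basic reachability check belongs on the shadow side. The sketch below uses only Python's standard `socket` module to test TCP connectivity to source broker addresses; the addresses are placeholders, and a successful TCP connection does not validate TLS or SASL settings.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def unreachable_source_brokers(brokers):
    """Run from a shadow cluster host; returns the source broker addresses that fail."""
    return [f"{host}:{port}" for host, port in brokers if not can_reach(host, port)]


# Example (placeholder addresses, replace with your source brokers):
# unreachable_source_brokers([("source-broker-0.example.com", 9092), ("source-broker-1.example.com", 9092)])
```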
The starting offset only affects **new shadow topics**. Once a shadow topic exists and has data, changing this setting has no effect on that topic's replication.
====

=== Generate a sample configuration
@paulohtb6 should this heading include "for rpk" (since they don't need to do this if they're configuring via Console, right)? Or else add a note that says that this config file is not required if you configure via Console?
Actually, maybe at the top of this section, at Set filters, you could add a note explaining that rpk sets up these filters with a configuration file, but in Console you're guided through forms to set the configuration.
And even better, explain this at the very top, in Set up Shadowing/Configure filters.
* **Configure filters**: Define which topics, consumer groups, and ACLs should replicate by creating include/exclude patterns that match your disaster recovery requirements. See <<set-filters>>.
* **Create a shadow link**: Establish the connection between clusters using `rpk`, the Admin API, or Redpanda Console with authentication and network settings. See <<create-a-shadow-link>>.

== Shadow Link Tasks
Should this Tasks section come after the filters? In Console, they set filters, create the shadow link, and then see the tasks.
++ Filter configuration should come before task explanation. You can set up Shadowing without knowing anything about the tasks, the same is not true for filters. Plus just above we have "Configure Filters"
We should also move the rpk command for generating a sample config up towards the top. It's a helpful guide for the rest of the configuration.
treevon left a comment
Looks good. I think we should reorder the configure shadowing page to have the rpk shadow config generate command up top and filters before tasks. Other issue, noted by Michelle, is that the task reference links are not rendering.
f7700bc to 96d8bf3
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
ab5a8c5 to 1b4a73c
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
Description
Explain what tasks are and their relationship in Shadowing.
Review deadline: 20th Nov