Handle delete CDC events with _planetscale_operation field#150
Open
lahdirakram wants to merge 1 commit into planetscale:main from
Summary
This PR adds explicit handling for delete CDC events in the PlanetScale Airbyte source.
Today, the connector only emits row changes when the VStream row change contains an after image. In practice, that means delete events are silently dropped, because Vitess encodes deletes as `before != nil && after == nil`. This change makes delete handling explicit and surfaces the operation type on every emitted record through a top-level `_planetscale_operation` field.
Problem
The current behavior loses information for CDC consumers:
For downstream systems, this means the connector cannot faithfully represent table state over time. Any consumer that expects CDC semantics will drift, because rows removed in PlanetScale never produce a corresponding event.
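The gap described here, and the mapping this PR proposes to close it, can be sketched in Go. This is a minimal illustration, not the connector's actual code: it assumes a simplified `RowChange` whose `Before`/`After` images are plain maps (Vitess's own `binlogdata.RowChange` carries `*query.Row` values), and the names `Row`, `Record`, and `toRecord` are hypothetical. With `captureDeletes` set to false, deletes are skipped exactly as they are today.

```go
package main

import "fmt"

// Row is a hypothetical, simplified row image: column name -> value.
// Vitess's binlogdata.RowChange actually carries *query.Row values.
type Row map[string]any

// RowChange mirrors the before/after shape of a VStream row change.
type RowChange struct {
	Before Row
	After  Row
}

// Record is a hypothetical emitted record: the chosen row image plus
// the top-level _planetscale_operation field.
type Record map[string]any

// toRecord classifies a row change by which images are present and
// returns ok=false when nothing should be emitted.
func toRecord(rc RowChange, captureDeletes bool) (Record, bool) {
	var image Row
	var op string
	switch {
	case rc.Before == nil && rc.After != nil:
		image, op = rc.After, "insert"
	case rc.Before != nil && rc.After != nil:
		image, op = rc.After, "update"
	case rc.Before != nil && rc.After == nil:
		if !captureDeletes {
			return nil, false // capture_deletes=false: skip deletes, as before
		}
		image, op = rc.Before, "delete" // deletes only have a before image
	default:
		return nil, false // neither image present; nothing to emit
	}
	// Copy the image so the record does not alias the row change.
	rec := make(Record, len(image)+1)
	for k, v := range image {
		rec[k] = v
	}
	rec["_planetscale_operation"] = op
	return rec, true
}

func main() {
	del := RowChange{Before: Row{"id": 7}}
	if _, ok := toRecord(del, false); !ok {
		fmt.Println("delete dropped")
	}
	if rec, ok := toRecord(del, true); ok {
		fmt.Println(rec["_planetscale_operation"], rec["id"])
	}
}
```

Running `main` prints `delete dropped` followed by `delete 7`. Classifying on the presence of each image keeps the rule identical to how Vitess encodes the three operations, so no extra metadata is needed to decide the operation type.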
Proposed solution
This PR introduces two related changes:
- Add a top-level `_planetscale_operation` field to emitted records, with values `insert`, `update`, and `delete`.
- Add `capture_deletes` as a source config flag to control whether delete events are emitted.
Behavior:
- Inserts: emit the after image with `_planetscale_operation = "insert"`.
- Updates: emit the after image with `_planetscale_operation = "update"`.
- Deletes: emit the before image with `_planetscale_operation = "delete"` when `capture_deletes=true`.
Why a top-level field instead of `_planetscale_metadata.operation`
I originally considered putting the operation inside `_planetscale_metadata`, but I think that creates the wrong coupling.
Reasons:
- `AirbyteRecordMessage`
- `include_metadata` makes delete capture harder to use than necessary.
This keeps the design cleaner:
- `_planetscale_operation` communicates row semantics.
- `_planetscale_metadata` remains optional transport/replication metadata.
Why this is worth adding
This change improves correctness more than convenience.
Without delete emission, the connector is not producing a complete CDC stream. With this PR:
- `capture_deletes`
Backward compatibility
This is designed to minimize disruption:
- `capture_deletes`
- `_planetscale_metadata` remains optional
- `_planetscale_operation`
Testing
This PR adds coverage for:
- `capture_deletes=true`
- `capture_deletes=false`
- `insert`
Notes
I also added a targeted log message for captured delete rows to make runtime verification easier while validating this behavior.
If maintainers prefer a different field name or want delete capture enabled by default, I can adjust that, but I think the important part is to stop silently dropping delete events and provide an explicit CDC contract.
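To illustrate what an explicit CDC contract buys a downstream consumer, here is a minimal consumer-side sketch that replays records into an in-memory table. It assumes a single-column primary key named `id` and that delete records carry the deleted row's before image; `Record`, `Table`, and `apply` are hypothetical names, not part of the connector or of Airbyte.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical emitted record: column values plus the
// top-level _planetscale_operation field.
type Record map[string]any

// Table is an in-memory destination keyed by primary key value.
type Table map[any]map[string]any

// apply replays one record into the table, assuming "id" is the primary key.
func (t Table) apply(rec Record) error {
	key, ok := rec["id"]
	if !ok {
		return errors.New("record has no id column")
	}
	switch op := rec["_planetscale_operation"]; op {
	case "insert", "update":
		// Store the row without the operation field itself.
		row := make(map[string]any, len(rec))
		for k, v := range rec {
			if k != "_planetscale_operation" {
				row[k] = v
			}
		}
		t[key] = row
	case "delete":
		delete(t, key)
	default:
		return fmt.Errorf("unknown operation %v", op)
	}
	return nil
}

func main() {
	t := Table{}
	stream := []Record{
		{"id": 1, "name": "a", "_planetscale_operation": "insert"},
		{"id": 2, "name": "b", "_planetscale_operation": "insert"},
		{"id": 1, "name": "a2", "_planetscale_operation": "update"},
		{"id": 2, "name": "b", "_planetscale_operation": "delete"},
	}
	for _, rec := range stream {
		if err := t.apply(rec); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(len(t), t[1]["name"])
}
```

Running `main` prints `1 a2`: row 1 reflects the update and row 2 is gone. If the final delete record were never emitted, row 2 would remain in the destination even though it no longer exists in PlanetScale, which is the drift described under Problem.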