Conversation
Pull request overview
Adds a feature design document describing a proposed “data migrations” system for protocol classification, backfill processing, and API exposure in the wallet-backend.
Changes:
- Introduces a detailed design doc covering proposed schema, workflows (setup/live/backfill), and cursor tracking.
- Documents contract classification via WASM inspection and a `known_wasms` cache approach.
- Describes planned API surface changes for history enrichment and current-state gating by migration status.
> ```
> -- 'failed' - Migration failed
> ```
>
> **Migration Cursor Tracking** (via `ingest_store` table):
💡 Rather than requiring an end-ledger for protocol-migrate, we could also use ingest_store key/val pairs to store the first ledger at which live ingestion began applying the newly supported processors.
Yeah, it's true; this was actually part of the previous design, and it seemed like the consensus was for a more implicit approach where the operator controls the ranges. I do think we could have live ingestion write this value per protocol, and the migration could clean up the ingest store before exiting.
If we do this, the backfill migration should check the ingest store to ensure that live ingestion set an end-ledger for the protocol(s) before the migration starts, and fail early if not.
@aditya1702 what's your input here? I believe you were a fan of the implicit approach, but if we use the ingest store pattern we can be more explicit and clean up the row after the migration.
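To make the ingest-store variant concrete, here is a minimal Go sketch, assuming a plain key/value `ingest_store` table (with `key` as its primary key) reached through `database/sql`; the key name `protocol_%d_live_start_ledger` and both helper functions are hypothetical illustrations, not part of the design doc:

```go
package sketch

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"strconv"
)

// Live ingestion side: record the first ledger at which the new protocol's
// processors were applied. ON CONFLICT DO NOTHING keeps only the first write.
func recordLiveStartLedger(ctx context.Context, db *sql.DB, protocolID int64, ledger uint32) error {
	key := fmt.Sprintf("protocol_%d_live_start_ledger", protocolID)
	_, err := db.ExecContext(ctx,
		`INSERT INTO ingest_store (key, value) VALUES ($1, $2)
		 ON CONFLICT (key) DO NOTHING`,
		key, strconv.FormatUint(uint64(ledger), 10))
	return err
}

// Backfill side: derive the end ledger from the ingest store, failing early
// if live ingestion never recorded a start ledger for this protocol.
func backfillEndLedger(ctx context.Context, db *sql.DB, protocolID int64) (uint32, error) {
	key := fmt.Sprintf("protocol_%d_live_start_ledger", protocolID)
	var value string
	err := db.QueryRowContext(ctx,
		`SELECT value FROM ingest_store WHERE key = $1`, key).Scan(&value)
	if errors.Is(err, sql.ErrNoRows) {
		return 0, fmt.Errorf("no live start ledger recorded for protocol %d; refusing to backfill", protocolID)
	}
	if err != nil {
		return 0, err
	}
	ledger, err := strconv.ParseUint(value, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("corrupt ingest_store value %q: %w", value, err)
	}
	// Backfill covers everything strictly before live ingestion's first ledger.
	return uint32(ledger) - 1, nil
}
```

As suggested above, the migration could then delete the row before exiting.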
> ```
> │ (holds out-of-order results │
> │ until ready to commit)      │
> ```
We need to be careful not to get stuck here. Meaning, if the buffer is waiting to receive result N and is holding result N+1, we need to be sure N is coming, or we abort the whole process.
Yeah, this is a good point. The commit stages depend on each other, so the system should have some threshold of time before it tries to either re-process a batch or exit entirely. I can add more detail around this.
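For illustration, a rough Go sketch of an in-order commit loop with such a stall threshold; `batchResult`, the channel shape, and the abort-on-timeout behavior are all assumptions, not the doc's design:

```go
package sketch

import (
	"fmt"
	"time"
)

// batchResult is a hypothetical unit of backfill work, keyed by sequence.
type batchResult struct {
	Seq  uint64
	Data []byte
}

// commitInOrder drains results from a worker pool, buffering out-of-order
// batches and committing them strictly in sequence. If nothing arrives within
// stallTimeout while batch `next` is still missing, it aborts instead of
// holding N+1, N+2, ... forever.
func commitInOrder(results <-chan batchResult, next uint64, stallTimeout time.Duration, commit func(batchResult) error) error {
	pending := make(map[uint64]batchResult)
	timer := time.NewTimer(stallTimeout)
	defer timer.Stop()

	for {
		// Flush every batch that is now ready to commit, in order.
		for r, ok := pending[next]; ok; r, ok = pending[next] {
			if err := commit(r); err != nil {
				return err
			}
			delete(pending, next)
			next++
		}
		select {
		case r, ok := <-results:
			if !ok {
				if len(pending) > 0 {
					return fmt.Errorf("workers exited with %d uncommitted batches; next expected was %d", len(pending), next)
				}
				return nil
			}
			pending[r.Seq] = r
			// Traffic arrived; reset the stall timer (draining if it fired).
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(stallTimeout)
		case <-timer.C:
			// Batch `next` never showed up: abort (or re-enqueue it) rather
			// than buffering later results indefinitely.
			return fmt.Errorf("timed out waiting for batch %d", next)
		}
	}
}
```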
> ```graphql
> type OperationProtocol {
>   protocol: Protocol!
>   contractId: String!
> }
> ```
Why have contractId here?
If the client wants to know which OperationProtocol is the root invocation, it can use this field to match the contract ID that was invoked in the operation. This can be useful for displaying titles or a hierarchy for the call stack with richer details.
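A small Go sketch of that matching, with hypothetical `Operation`/`OperationProtocol` structs that merely mirror the schema above:

```go
package sketch

// Hypothetical shapes mirroring the GraphQL types above.
type OperationProtocol struct {
	Protocol   string
	ContractID string
}

type Operation struct {
	// Contract ID the operation invoked directly (the root of the call stack).
	InvokedContractID string
	Protocols         []OperationProtocol
}

// rootProtocol finds the OperationProtocol entry for the root invocation by
// matching contractId against the contract the operation invoked, e.g. to
// render a title above the rest of the call stack.
func rootProtocol(op Operation) (OperationProtocol, bool) {
	for _, p := range op.Protocols {
		if p.ContractID == op.InvokedContractID {
			return p, true
		}
	}
	return OperationProtocol{}, false
}
```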
…_status to differentiate between not started and in progress migrations
…steps in the ContractData branch, removes Balance branch
…istinction between uploads and upgrades/deployments
…col-setup in the "When Checkpoint Classification Runs" section
…dow, in order to discard state changes outside of retention.

1. Schema changes: `enabled` field removed, `display_name` removed, status default is `not_started`
2. Status values: all updated to the new naming scheme (`not_started`, `classification_in_progress`, `classification_success`, `backfilling_in_progress`, `backfilling_success`, `failed`)
3. protocol-setup: now uses the `--protocol-id` flag (opt-in); updated command examples and workflow
4. Classification section (line 125): updated to describe ContractCode validation and ContractData lookup
5. Checkpoint population diagram: removed the Balance branch; updated to show WASM hash storage in `known_wasms`
6. Live ingestion classification diagram: separated into ContractCode and ContractData paths with RPC fallback
7. Live State Production diagram: updated the classification box to mention ContractCode uploads and ContractData Instance changes
8. Backfill migration: added retention-aware processing throughout (flow diagram, workflow diagram, parallel processing)
9. Parallel backfill worker pool: added steps for retention window filtering
… relationship between classification and state production
…igration status in the API for protocols
@JakeUrban @aditya1702 after spending more time thinking about this, I realize there is an under-explored part of this design. Current state production will be specific to the protocol that produces the state, but it should fall into two categories: additive state changes and non-additive state changes. The way you track current state from the migration to the live ingestion state production will depend on the type of state produced.

For example:
- Non-additive state changes: …
- Additive state changes: …

Possible solutions:
- Option A: …
- Option B: …
- Option C: …

Option B seems like the most complete solution to me, but option A is simpler. It may be hard to assume that all protocols that produce additive state will have an interface to access the dependent state, but this seems true for the few that I've considered (SEP-41, SEP-56). I propose we go with option B.
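To illustrate the two categories in Go (hypothetical types, not from the design doc): a non-additive change carries the complete new value, so the latest change alone determines current state, while an additive change is a delta that can only be folded into current state when prior state is known:

```go
package sketch

// A non-additive change carries the complete new value; current state is just
// the latest change, so live ingestion never needs prior state to apply it.
type AdminChanged struct {
	ContractID string
	NewAdmin   string
}

// An additive change is a delta; current state is the fold of every change so
// far, so producing it requires knowing the previous state.
type BalanceDelta struct {
	ContractID string
	Holder     string
	Delta      int64
}

// currentBalance derives current state from the full history of additive
// changes (the property option B relies on), e.g. as produced by the backfill.
func currentBalance(history []BalanceDelta, holder string) int64 {
	var balance int64
	for _, c := range history {
		if c.Holder == holder {
			balance += c.Delta
		}
	}
	return balance
}
```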
@aristidesstaffieri boiling the problem down, is this an accurate description? Live ingestion may not be able to update current state without knowing the previous state. For example, if live ingestion observes a transfer of 5, it needs to know the previous balances of the sender and receiver in order to know the balances of both after the transfer. The problem is that live ingestion may not be able to get the previous state until the backfill migration has produced it.

That problem statement makes sense to me, but I don't understand the solutions you're proposing.

- Option A: I don't think we can assume protocols expose an interface for answering historical state queries like "what was my balance at ledger N".
- Option B: I think it's safe to assume that our historical state changes will have enough information to derive current state -- we can design our historical state changes schema with that as a requirement. But how does step 2 work exactly?
- Option C: I don't think live ingestion can write current state anywhere, because it won't know it, as explained in the problem statement.
ok after some offline discussion, here is the proposed solution to this problem.

We will remove the …

The live ingestion process will now do the following (per protocol): if the current state for the protocol has been written up to the last ledger before this one, produce new state changes and current state. The live ingestion process will keep an in-memory map of …

Proposed steps to change:

`protocol_{ID}_current_state_cursor` = last ledger for which current state was written

Both processes use this cursor. The key rule: …
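A minimal Go sketch of the contiguity check live ingestion would perform, assuming a hypothetical `CursorStore` view over the `protocol_{ID}_current_state_cursor` keys described above:

```go
package sketch

import (
	"context"
	"fmt"
)

// CursorStore is a hypothetical view over ingest_store for the
// protocol_{ID}_current_state_cursor keys shared by both processes.
type CursorStore interface {
	Get(ctx context.Context, protocolID int64) (uint32, error)
}

// maybeProduceCurrentState is what live ingestion would run per protocol, per
// ledger: it only produces current state when the cursor shows current state
// has been written up to the immediately preceding ledger; otherwise it
// produces state changes only and leaves current state to the migration.
func maybeProduceCurrentState(ctx context.Context, cursors CursorStore, protocolID int64, ledger uint32) (bool, error) {
	cursor, err := cursors.Get(ctx, protocolID)
	if err != nil {
		return false, fmt.Errorf("reading cursor for protocol %d: %w", protocolID, err)
	}
	// Contiguity check: current state already exists for every ledger < this one.
	return cursor == ledger-1, nil
}
```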
The CAS is a conditional update: …

The critical property: migration processes ALL ledgers including the overlap with live ingestion. It doesn't stop at live ingestion's start ledger. Both processes independently process …

Timeline: … Current state was produced for every single ledger. No gap.

What if live ingestion checks before migration commits? T=0s: Cursor=10004. Migration starts processing 10005. … Still no gap. Migration filled in ledger 10008's current state because it processes everything. The "race" just determines which process writes current state for a given ledger, but one …
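A sketch of what that cursor CAS could look like in Go, assuming the cursor lives in `ingest_store` under the key named above; the SQL and helper are illustrative, not the actual implementation:

```go
package sketch

import (
	"context"
	"database/sql"
	"fmt"
)

// advanceCursorCAS tries to advance protocol_{ID}_current_state_cursor from
// ledger-1 to ledger. The WHERE clause makes the update a compare-and-swap:
// it only lands if no one else advanced the cursor first, so whichever of
// live ingestion and the migration wins, current state for this ledger is
// written exactly once and the cursor never skips a ledger.
func advanceCursorCAS(ctx context.Context, db *sql.DB, protocolID int64, ledger uint32) (won bool, err error) {
	key := fmt.Sprintf("protocol_%d_current_state_cursor", protocolID)
	res, err := db.ExecContext(ctx,
		`UPDATE ingest_store SET value = $1 WHERE key = $2 AND value = $3`,
		fmt.Sprint(ledger), key, fmt.Sprint(ledger-1))
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	// n == 0 means the other process already claimed this ledger (or the
	// cursor isn't at ledger-1 yet); the caller skips the write.
	return n == 1, nil
}
```

In practice the current-state write and this cursor update would share a transaction, so losing the race means skipping the write entirely.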
Closes #468
What
Adds a design document for the data migrations feature.
Why
Document the feature for implementation and receive feedback from stakeholders.
Known limitations
N/A
Issue that this PR addresses
#468
Checklist
PR Structure
- … `all` if the changes are broad or impact many packages.

Thoroughness
Release