fix: restore a post-commit boundary in event processing#885
fix: restore a post-commit boundary in event processing#885diegomrsantos wants to merge 16 commits intosigp:unstablefrom
Conversation
|
The earlier review guide is outdated after the history rewrite. The easiest way to review this PR now is to follow the 4-commit stack in order and focus on a small set of functions. The core model is:
If any production path violates that ordering, it is suspicious. 1. Database-owned commit boundaryCommit:
Read these first:
What to check:
2. Per-event processing modelCommit:
Read these next:
Then read the handlers with the most important semantics:
What to check:
3. Sync semantics and hardeningCommit:
Read:
This commit also contains the benchmark-driven correctness fixes that made the branch actually match the intended semantics:
4. Tests and cleanupCommit:
Read this last. Highest-signal tests:
What to check:
Shortest useful pathIf you only want the highest-signal path through the diff, I would read these in order:
What to ignore on first passOn a first review, I would ignore:
The main review question for this PR is still: does every production event now cross one DB-owned boundary where the durable event change and exact sync progress commit together, with all in-memory publication strictly after commit? |
anchor/eth/src/event_processor.rs
Outdated
| }; | ||
| let fee_recipient = self | ||
| .db | ||
| .with_state(|state| state.fee_recipient_for_owner(&owner).unwrap_or(owner)); |
There was a problem hiding this comment.
ValidatorAdded does not need fee_recipient to persist the validator. The write path only needs cluster_id, owner, validator metadata, and shares; fee_recipient lives in the owners table and is joined later when materializing a full Cluster on reads. Reading it here was forcing a read-model concern into the write path and is what made the earlier design more confusing than necessary.
|
@claude review this PR |
This comment was marked as outdated.
This comment was marked as outdated.
|
@claude review this PR again |
|
Claude finished @diegomrsantos's task in 3m 1s —— View job Review: fix: restore a post-commit boundary in event processing (re-review)
Changes since last reviewFive commits landed since the previous review, directly addressing all three actionable items raised:
New test coverage: Invariant verificationRe-verified the core invariant — SQL + cursor commit together inside a transaction, in-memory
Every
IssuesNo new correctness issues found. The previous review's three actionable items have all been addressed with appropriate fixes and test coverage. One minor observation remains:
let metadata = state
.multi_state
.validator_metadata
.remove(validator_pubkey)
.expect("Data should have existed");This runs inside the Observations
SummaryThe previous review's actionable items have been addressed. The commit boundary invariant holds across all production paths. The defensive guards ( |
79ea6cd to
53c6cda
Compare
53c6cda to
e506639
Compare
| let operators_valid = self | ||
| .db | ||
| .with_state(|state| validate_operators(&operator_ids, &cluster_id, state)); | ||
| if let Err(err) = operators_valid { |
There was a problem hiding this comment.
@dknopik @shane-moore I reverted validate_operators back to ExecutionError::Database, but this call site is still where the ambiguity matters. If required operators are missing from committed state, we know the validator event is not safely applicable, but we do not know whether that comes from malformed history or local state inconsistency. This PR is not meant to redefine that boundary, so I kept the old classification and wanted to call out the ambiguity explicitly.
anchor/eth/src/event_processor.rs
Outdated
| "Failed to fetch validator metadata from database" | ||
| ); | ||
| return Err(ExecutionError::Database( | ||
| return Err(EventActionError::Fatal(ExecutionError::Database( |
There was a problem hiding this comment.
@dknopik @shane-moore I reverted these missing-metadata and missing-cluster branches back to DB-style failures. From this code we cannot prove the ValidatorRemoved event is invalid; all we know is that committed local state is missing the validator state the event expects. That could still be malformed history, but it could also be local inconsistency, so treating it as skippable InvalidEvent felt out of scope for this PR.
|
Closing in favor of the smaller replacement stack discussed on #880: The problem is valid, but this PR is too large for the boundary fix and it introduces event-level resume machinery that we do not want to take forward. The replacement path is a smaller series from |
Problem, Evidence, and Context
NetworkStatepublication, and downstream side effects across different layers.Change Overview
Risks, Trade-offs, and Mitigations
(block_number, transaction_index, log_index)informationValidation
cargo fmt --allcargo fmt --all --checkcargo clippy -p database -p eth --all-targets -- -D warningscargo test -p database -p eth --features database/test-utilsRollback
Blockers / Dependencies
Additional Info / Next Steps