fix: restore a post-commit boundary in event processing by diegomrsantos · Pull Request #885 · sigp/anchor

diegomrsantos · 2026-03-12T19:13:34Z

Problem, Evidence, and Context

Addresses #880 (Rework event processing around a single store boundary).
Related discussions: #735 (feat: restore in-memory state from disk on internal errors), #841 (fix: offload index sync store to blocking task).
Event processing currently mixes durable DB writes, progress tracking, in-memory NetworkState publication, and downstream side effects across different layers.
That makes it possible to reason about tentative state as if it were committed state, and it leaves sync progress too coarse to safely resume in the middle of a block.
This is worth doing now because the current ownership model has already produced multiple correctness and maintainability issues, and local mitigations were diverging instead of fixing the boundary.

Change Overview

Keep RPC log fetching batched, but process and commit logs one event at a time.
Add an exact processed-event cursor to metadata so sync can safely resume within a block.
Move the durable-write boundary into the database layer so persisted state, processed-event progress, and in-memory publication advance in the right order.
Update sync to resume from the finer-grained cursor and skip already committed logs from the same block.
Keep slashing protection registration as the one pre-commit safety precondition for validator activation.
Intentionally did not change event decoding, contract-facing semantics, or the existing test-only transaction helpers outside the production event/sync path.

Risks, Trade-offs, and Mitigations

Main trade-off: more SQLite transactions during sync, in exchange for a much simpler and safer commit/publication model.
Main data-plane risk: schema version increases to v4.
Main runtime risk: sync resume semantics change from block-level to event-level.
Mitigations:
- log fetching remains batched for RPC efficiency
- progress is now persisted with exact (block_number, transaction_index, log_index) information
- same-block resume behavior is covered by targeted tests
- the production event/sync path now publishes in-memory state only after commit

Validation

cargo fmt --all
cargo fmt --all --check
cargo clippy -p database -p eth --all-targets -- -D warnings
cargo test -p database -p eth --features database/test-utils
Added coverage for cursor persistence across restart and resuming within the same block without replaying already committed logs.

Rollback

Safe code rollback before merge: revert this PR.
Operational caveat: this migrates the DB schema to v4. Rolling back to an older binary that does not understand schema v4 is not safe on an already-upgraded DB.
If a rollback is needed after deploying a build with this change, use a compatible binary or restore the DB from a pre-migration backup.

Blockers / Dependencies

N/A

Additional Info / Next Steps

The production path now uses the new boundary, but some legacy transaction-taking database helpers still exist for tests and non-production utility paths. They can be cleaned up separately once this behavior change lands.

diegomrsantos · 2026-03-12T19:22:59Z

The earlier review guide is outdated after the history rewrite. The easiest way to review this PR now is to follow the 4-commit stack in order and focus on a small set of functions.

The core model is:

EventProcessor decides what one log means.
NetworkDatabase makes that event durable and persists the exact progress cursor in the same transaction.
Only after commit do we update NetworkState and notify watchers.
Once a whole fetched range succeeds, we collapse partial per-event progress back to a fully processed block.

If any production path violates that ordering, it is suspicious.
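
The ordering above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Store`, `commit_event`, and the `Vec` fields standing in for SQLite and `NetworkState` are hypothetical names chosen for the example. It only shows the shape of the rule: stage the change, commit it together with the exact cursor, and publish afterwards.

```rust
/// Hypothetical exact progress cursor: (block, transaction index, log index).
pub type Cursor = (u64, u64, u64);

/// Toy stand-in for the database plus the published in-memory view.
/// In the PR the durable side is SQLite and the published side is NetworkState.
#[derive(Debug, Default)]
pub struct Store {
    pub durable_events: Vec<String>,
    pub durable_cursor: Option<Cursor>,
    pub published_events: Vec<String>,
    pub published_cursor: Option<Cursor>,
}

impl Store {
    /// Apply one event: stage the change, "commit" it together with the cursor,
    /// and only then publish it to the in-memory view.
    pub fn commit_event(
        &mut self,
        cursor: Cursor,
        stage: impl FnOnce() -> Result<String, String>,
    ) -> Result<(), String> {
        // 1. Build the durable change. A failure here leaves everything untouched.
        let event = stage()?;

        // 2. Commit the event change and the exact cursor as one unit.
        self.durable_events.push(event.clone());
        self.durable_cursor = Some(cursor);

        // 3. Publish strictly after the commit step succeeded.
        self.published_events.push(event);
        self.published_cursor = Some(cursor);
        Ok(())
    }
}
```

With this shape, a failed event never becomes visible to readers of the published view, and the published cursor can never run ahead of the durable one.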

1. Database-owned commit boundary

Commit:

d4a6e60a9 refactor(database): add post-commit event progress boundary

Read these first:

anchor/database/src/lib.rs
- mark_event_processed: persists only the exact (block, tx, log) cursor for a committed or intentionally skipped event
- advance_processed_block: persists the coarse “whole block completed” boundary and clears the finer cursor
- commit_db_update: the core boundary of the PR; applies the SQL change, persists matching progress in the same transaction, commits, and only then publishes the corresponding in-memory state update
- apply_progress_to_tx / apply_progress_to_state: mirror the same progress model in SQLite and NetworkState
anchor/database/src/state.rs
- get_last_processed_event_from_db: treats the three cursor columns as one logical value and rejects mixed NULL/non-NULL rows as corruption
- next_block_to_fetch: defines resume semantics by preferring the partial cursor block when present so the same block can be re-fetched and already-committed logs skipped deterministically
anchor/database/src/operator_operations.rs
- commit_operator_added: commits operator insert + max_operator_id_seen + exact cursor together
- commit_seen_operator_id: commits only “we have seen operator id N” plus the exact cursor for malformed/skipped operator events
- commit_operator_removed: commits operator removal plus exact cursor
anchor/database/src/cluster_operations.rs
- commit_validator_added: commits nonce bump + validator/cluster/share insert + exact cursor together
- commit_owner_nonce: commits nonce bump + exact cursor for malformed/skipped ValidatorAdded
- commit_cluster_status: commits liquidation/reactivation status plus exact cursor
- commit_validator_removed: commits validator removal plus exact cursor
anchor/database/src/validator_operations.rs
- commit_fee_recipient_updated: commits fee-recipient update plus exact cursor
- set_validator_indices: commits index updates and then mirrors them into state
- update_graffiti: same post-commit boundary, but without sync-progress changes

What to check:

no shared state mutation before tx.commit()
durable event change and durable progress always commit together
NetworkState is only updated after commit
processed block progress is monotonic and clears any partial cursor
production writes go through the DB-owned commit boundary
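
To make the progress model concrete, here is a small sketch of the two state.rs behaviors described above: decoding the three cursor columns as one logical value, and choosing the resume block. The function names mirror the ones in the guide, but the signatures, the `EventCursor` type, and the assumption that `last_processed_block` means "last fully processed block" are illustrative guesses, not the actual implementation.

```rust
/// Hypothetical in-memory form of the exact cursor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EventCursor {
    pub block_number: u64,
    pub transaction_index: u64,
    pub log_index: u64,
}

/// Decode the three nullable cursor columns as one logical value.
/// All NULL means "no partial progress", all set means a cursor,
/// and any mixed combination is reported as corruption.
pub fn decode_cursor(
    block_number: Option<u64>,
    transaction_index: Option<u64>,
    log_index: Option<u64>,
) -> Result<Option<EventCursor>, String> {
    match (block_number, transaction_index, log_index) {
        (None, None, None) => Ok(None),
        (Some(block_number), Some(transaction_index), Some(log_index)) => {
            Ok(Some(EventCursor { block_number, transaction_index, log_index }))
        }
        _ => Err("cursor columns are partially NULL".to_string()),
    }
}

/// Pick the block to fetch next.
/// Assumption: `last_processed_block` is the last fully processed block.
/// A partial cursor wins, so its block is fetched again and the logs
/// at or before the cursor can be skipped.
pub fn next_block_to_fetch(last_processed_block: u64, cursor: Option<EventCursor>) -> u64 {
    match cursor {
        Some(cursor) => cursor.block_number,
        None => last_processed_block + 1,
    }
}
```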

2. Per-event processing model

Commit:

ba5fffa2d refactor(eth): process logs with per-event commits

Read these next:

anchor/eth/src/event_processor.rs
- process_logs: skips already-committed logs before processing the fetched range
- process_logs_inner: runs the range sequentially and only advances the full block boundary if the whole range succeeded
- process_single_log: derives the exact cursor for one log and routes it
- dispatch_log: only chooses the handler
- finish_processed_log: the key function where cursor ownership becomes explicit:
  - handler already committed cursor
  - handler succeeded and caller must advance cursor
  - handler was skippable and caller advances cursor
  - handler was fatal
- mark_event_processed: persists cursor-only progress when the caller owns advancement
- advance_processed_block_if_needed: collapses partial in-block progress back to a full block boundary only when appropriate
- skip_processed_logs / log_at_or_before_cursor / cursor_for_log: the replay/resume mechanics that make “re-fetch the same block, skip already-committed logs, continue” actually work
anchor/eth/src/util.rs
- validate_operators: validates operator-set structure and treats missing operators as invalid event data rather than internal DB failure

Then read the handlers with the most important semantics:

process_operator_added: duplicate operator ids are skippable, malformed/duplicate operator payloads can still persist max_operator_id_seen, and later valid operator ids should not be blocked by a bad earlier one
process_validator_added: malformed validator-add events still consume the owner nonce, duplicate validator-add events are skipped without letting the SQL unique constraint abort sync, slashing registration happens before the main DB commit and explicitly relies on idempotence, and fee recipient is no longer read before the write
process_validator_removed: missing or mismatched validator state is treated as invalid event data rather than a local DB failure
process_validator_exited: intentionally different from the others, because this is mostly a side-effect event, so it returns NeedsCursorAdvance and lets the caller mark progress afterwards

What to check:

there is no long-lived outer DB transaction anymore
one log is now the meaningful unit of progress
cursor ownership is explicit for every handler outcome
block progress is only advanced after the whole fetched range succeeds
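
The cursor-ownership rule in finish_processed_log can be pictured as a single match over the handler outcome. The sketch below is an assumption-laden simplification: it folds success and error cases into one hypothetical `HandlerOutcome` enum, whereas the PR discussion suggests they are split between a success value and error variants. The variant names are borrowed from the discussion where they appear and invented otherwise.

```rust
/// Hypothetical outcome of handling one log.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HandlerOutcome {
    /// The handler committed its change and the cursor in one transaction.
    CursorCommitted,
    /// The handler succeeded but did not touch the cursor (side-effect events).
    NeedsCursorAdvance,
    /// The event data was invalid and nothing was committed; safe to skip.
    Skippable(String),
    /// The event was invalid, but the handler already committed partial
    /// progress (for example a nonce bump) together with the cursor.
    SkippableCommitted(String),
    /// Processing must stop and the cursor must not move.
    Fatal(String),
}

/// Returns Ok(true) when the caller still has to persist cursor-only
/// progress, Ok(false) when the handler already did, and Err on fatal outcomes.
pub fn caller_must_advance_cursor(outcome: &HandlerOutcome) -> Result<bool, String> {
    match outcome {
        HandlerOutcome::CursorCommitted | HandlerOutcome::SkippableCommitted(_) => Ok(false),
        HandlerOutcome::NeedsCursorAdvance | HandlerOutcome::Skippable(_) => Ok(true),
        HandlerOutcome::Fatal(reason) => Err(reason.clone()),
    }
}
```

Keeping the decision in one exhaustive match means a new outcome cannot be added without the compiler forcing a choice about who owns the cursor.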

3. Sync semantics and hardening

Commit:

ffad62c65 fix(eth): harden per-event sync semantics

Read:

anchor/eth/src/sync.rs
- historical_sync: fetching is still overlapped, but processing is serialized behind one running_processor; each batch is processed in spawn_blocking(move || process_logs(...)), and after a round it re-reads the committed resume point from DB instead of assuming end_block + 1
- live_sync: computes start_block from the same next_block_to_fetch model, skips already-synced blocks correctly on reconnect/reorg, and immediately awaits the blocking event-processing work so there is backpressure
anchor/eth/src/event_processor.rs
- advance_processed_block_if_needed: refuses to regress the processed block boundary and only collapses to block progress when the whole range actually succeeded

This commit also contains the benchmark-driven correctness fixes that made the branch actually match the intended semantics:

duplicate ValidatorAdded
OperatorAdded malformed/duplicate handling preserving max_operator_id_seen
re-reading committed sync progress instead of assuming the outer loop state
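
As a rough illustration of the last point, the outer loop can ask the store where to resume on every round instead of carrying its own `end_block + 1`. The sketch below is synchronous and uses closures in place of the database and the event processor; `sync_until`, its parameters, and the stall guard are hypothetical and only meant to show the control flow.

```rust
/// Run batches up to `head`, re-reading the committed resume point each round.
/// `next_block_to_fetch` stands in for the DB lookup and `process_range`
/// for fetching and processing one batch of logs.
pub fn sync_until(
    head: u64,
    batch_size: u64,
    mut next_block_to_fetch: impl FnMut() -> u64,
    mut process_range: impl FnMut(u64, u64) -> Result<(), String>,
) -> Result<Vec<(u64, u64)>, String> {
    let batch_size = batch_size.max(1);
    let mut ranges = Vec::new();
    let mut previous_start: Option<u64> = None;
    loop {
        // The committed progress is the source of truth for the next range.
        let start = next_block_to_fetch();
        if start > head {
            return Ok(ranges);
        }
        // Guard added for this sketch: a successful round must move the
        // committed resume point forward, otherwise the loop would spin.
        if previous_start.map_or(false, |prev| start <= prev) {
            return Err(format!("no progress after processing from block {start}"));
        }
        let end = head.min(start.saturating_add(batch_size - 1));
        process_range(start, end)?;
        ranges.push((start, end));
        previous_start = Some(start);
    }
}
```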

4. Tests and cleanup

Commit:

e5066390f refactor(database): align tests with commit boundary

Read this last.

Highest-signal tests:

anchor/database/src/tests/state_tests.rs
- test_processed_event_cursor_after_restart: proves exact event cursor persists across restart and is cleared once the full block boundary is committed
anchor/eth/tests/integration.rs
- duplicate validator-added test: proves duplicate ValidatorAdded is skipped without re-queueing index sync and while still advancing nonce/progress
- malformed/duplicate operator-added tests: prove bad operator history no longer blocks later valid operator ids and still preserves max_operator_id_seen
- multi-event processing test: proves multiple events in one fetched range still produce the expected final block progress

What to check:

restart preserves the exact event cursor
replay in the same block skips already-committed logs
malformed or duplicate history no longer blocks later valid events
tests now use the same committed API shape as production instead of older tx-only helpers

Shortest useful path

If you only want the highest-signal path through the diff, I would read these in order:

anchor/database/src/lib.rs
anchor/database/src/state.rs
anchor/database/src/operator_operations.rs
anchor/database/src/cluster_operations.rs
anchor/database/src/validator_operations.rs
anchor/eth/src/event_processor.rs
anchor/eth/src/sync.rs
anchor/database/src/tests/state_tests.rs
anchor/eth/tests/integration.rs

What to ignore on first pass

On a first review, I would ignore:

most helper/test cleanup details
small parsing helper refactors
doc-only changes

The main review question for this PR is still:

does every production event now cross one DB-owned boundary where the durable event change and exact sync progress commit together, with all in-memory publication strictly after commit?

diegomrsantos · 2026-03-16T16:35:37Z

anchor/eth/src/event_processor.rs

-        };
+        let fee_recipient = self
+            .db
+            .with_state(|state| state.fee_recipient_for_owner(&owner).unwrap_or(owner));


ValidatorAdded does not need fee_recipient to persist the validator. The write path only needs cluster_id, owner, validator metadata, and shares; fee_recipient lives in the owners table and is joined later when materializing a full Cluster on reads. Reading it here was forcing a read-model concern into the write path and is what made the earlier design more confusing than necessary.

diegomrsantos · 2026-03-16T22:33:16Z

@claude review this PR

diegomrsantos · 2026-03-16T23:47:48Z

@claude review this PR again

claude-code-actions-sigp · 2026-03-16T23:48:09Z

Claude finished @diegomrsantos's task in 3m 1s

Review: fix: restore a post-commit boundary in event processing (re-review)

Read PR diff against origin/unstable
Review database core: lib.rs, schema.rs, state.rs, sql_operations.rs
Review database operations: cluster_operations.rs, operator_operations.rs, validator_operations.rs
Review event processor: event_processor.rs
Review sync changes: sync.rs, index_sync.rs, util.rs
Review test files
Verify commit boundary invariant across all paths
Post findings

Changes since last review

Five commits landed since the previous review, directly addressing all three actionable items raised:

e340195 + apply_progress_to_state in lib.rs:398-401, 405-408 — Added debug_assert! in both ProgressUpdate::Event and ProgressUpdate::Block arms, plus a debug_assert! in next_block_to_fetch (state.rs:419-422). This addresses the cursor ordering invariant concern.
a8e6d5f event_processor.rs:263-273 — advance_processed_block_if_needed now reads current_block and explicitly returns early if end_block < current_block. This addresses the block regression concern.
79ea6cd slashing.rs:26-30 + event_processor.rs:604-608 — Documented the idempotence requirement for register_validator in both the trait contract and the call site.
853a116 sync.rs:505-511 — Historical sync outer loop now re-reads committed progress via next_block_to_fetch() instead of assuming end_block + 1.
56dd8ab — Fixed test helper assertions to match the new API.

New test coverage: test_older_end_block_does_not_regress_progress proves the regression guard works.
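
The regression guard described for advance_processed_block_if_needed can be sketched as below. The `Progress` struct and the boolean return value are assumptions made for the example; the only behavior taken from the review is that an `end_block` older than the current processed block is ignored, and that advancing the block boundary clears the partial cursor.

```rust
/// Hypothetical progress state: the last fully processed block plus an
/// optional exact cursor (block, transaction index, log index).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Progress {
    pub processed_block: u64,
    pub partial_cursor: Option<(u64, u64, u64)>,
}

/// Collapse per-event progress to a block boundary, never moving backwards.
/// Returns true when the boundary was written.
pub fn advance_processed_block_if_needed(progress: &mut Progress, end_block: u64) -> bool {
    // Refuse to regress: an older range finishing late must not rewind progress.
    if end_block < progress.processed_block {
        return false;
    }
    progress.processed_block = end_block;
    // A fully processed block supersedes any finer-grained cursor.
    progress.partial_cursor = None;
    true
}
```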

Invariant verification

Re-verified the core invariant — SQL + cursor commit together inside a transaction, in-memory watch<NetworkState> updates only after commit, side effects after that — across all production paths.

commit_db_update (lib.rs:342-368) enforces the ordering:

apply_tx(&tx)  →  apply_progress_to_tx(progress, &tx)  →  tx.commit()  →  apply_state + apply_progress_to_state

Every commit_* wrapper flows through this single boundary. No production path mutates watched state before tx.commit().

| Handler | Commits via | Cursor ownership | In-memory after commit |
| --- | --- | --- | --- |
| `process_operator_added` | `commit_operator_added` / `commit_seen_operator_id` | `CursorCommitted` | Yes |
| `process_operator_removed` | `commit_operator_removed` | `CursorCommitted` | Yes |
| `process_validator_added` | `commit_validator_added` / `commit_owner_nonce` | `CursorCommitted` | Yes |
| `process_validator_removed` | `commit_validator_removed` | `CursorCommitted` | Yes |
| `process_cluster_liquidated` | `commit_cluster_status` | `CursorCommitted` | Yes |
| `process_cluster_reactivated` | `commit_cluster_status` | `CursorCommitted` | Yes |
| `process_fee_recipient_updated` | `commit_fee_recipient_updated` | `CursorCommitted` | Yes |
| `process_validator_exited` | caller via `NeedsCursorAdvance` | `mark_event_processed` | Correct (side-effect only) |
| unknown/missing topic | caller via `mark_event_processed` | explicit | Correct |

Issues

No new correctness issues found. The previous review's three actionable items have all been addressed with appropriate fixes and test coverage.

One minor observation remains:

apply_delete_validator_state uses expect on committed state

anchor/database/src/cluster_operations.rs:198-202

let metadata = state
    .multi_state
    .validator_metadata
    .remove(validator_pubkey)
    .expect("Data should have existed");

This runs inside the apply_state closure of commit_db_update, after tx.commit() has already succeeded. If the in-memory state diverges from the DB (a bug elsewhere), this expect will panic rather than returning an error. In practice this is fine — divergence between committed DB state and in-memory state would indicate a serious bug that warrants a crash. This is not new to this PR and is consistent with the existing pattern, so not actionable here.

Observations

The notify parameter usage is correct throughout: state-changing events (commit_operator_added, commit_validator_added, etc.) use true, while bookkeeping operations (commit_owner_nonce, commit_seen_operator_id, mark_event_processed, advance_processed_block) use false.
The SkippableCommitted error variant correctly prevents double cursor advancement in finish_processed_log — when a handler commits partial progress (nonce bump or max-seen-operator-id) before returning a skip error, the caller does not call mark_event_processed again.
skip_processed_logs uses lexicographic (block_number, transaction_index, log_index) comparison with <=, which correctly skips the cursor event itself on replay.
The historical_sync outer loop now re-reads committed progress (sync.rs:508-511) instead of start_block = end_block + 1, keeping it aligned with the DB-owned progress model.
Schema v4 migration (schema.rs:32-36) adds three nullable INTEGER columns with CHECK (>= 0) constraints. UPDATE_BLOCK_NUMBER (sql_operations.rs:111-117) correctly NULLs cursor columns when collapsing to a block boundary.
Test coverage is solid: test_processed_event_cursor_after_restart, test_resume_skips_already_processed_logs_in_same_block, and the new test_older_end_block_does_not_regress_progress together prove the three key properties of the cursor system.
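
The lexicographic `<=` comparison mentioned for skip_processed_logs can be illustrated with a derived ordering. In the sketch below, `LogPosition` is a hypothetical type, and the use of `partition_point` assumes the fetched logs are already sorted in chain order; the actual implementation may iterate differently.

```rust
/// Position of a log in the chain. Deriving `Ord` compares fields in
/// declaration order, which gives the lexicographic
/// (block_number, transaction_index, log_index) ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct LogPosition {
    pub block_number: u64,
    pub transaction_index: u64,
    pub log_index: u64,
}

/// Drop every log at or before the committed cursor.
/// Assumption: `logs` is sorted in chain order.
pub fn skip_processed_logs(logs: &[LogPosition], cursor: Option<LogPosition>) -> &[LogPosition] {
    match cursor {
        None => logs,
        Some(cursor) => {
            // `<=` also skips the cursor event itself, which was already committed.
            let first_new = logs.partition_point(|log| *log <= cursor);
            &logs[first_new..]
        }
    }
}
```

Using `<` instead of `<=` would replay the event the cursor points at, which is exactly the double-apply the cursor exists to prevent.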

Summary

The previous review's actionable items have been addressed. The commit boundary invariant holds across all production paths. The defensive guards (debug_assert! for cursor ordering, explicit regression check in advance_processed_block_if_needed) and documentation (slashing idempotence contract, next_block_to_fetch invariant) strengthen the model. No new issues found.

diegomrsantos · 2026-03-17T18:42:48Z

anchor/eth/src/event_processor.rs

+        let operators_valid = self
+            .db
+            .with_state(|state| validate_operators(&operator_ids, &cluster_id, state));
+        if let Err(err) = operators_valid {


@dknopik @shane-moore I reverted validate_operators back to ExecutionError::Database, but this call site is still where the ambiguity matters. If required operators are missing from committed state, we know the validator event is not safely applicable, but we do not know whether that comes from malformed history or local state inconsistency. This PR is not meant to redefine that boundary, so I kept the old classification and wanted to call out the ambiguity explicitly.

diegomrsantos · 2026-03-17T18:42:58Z

anchor/eth/src/event_processor.rs

                    "Failed to fetch validator metadata from database"
                );
-                return Err(ExecutionError::Database(
+                return Err(EventActionError::Fatal(ExecutionError::Database(


@dknopik @shane-moore I reverted these missing-metadata and missing-cluster branches back to DB-style failures. From this code we cannot prove the ValidatorRemoved event is invalid; all we know is that committed local state is missing the validator state the event expects. That could still be malformed history, but it could also be local inconsistency, so treating it as skippable InvalidEvent felt out of scope for this PR.

diegomrsantos · 2026-03-18T10:19:45Z

Closing in favor of the smaller replacement stack discussed on #880:
#880 (comment)

The problem is valid, but this PR is too large for the boundary fix and it introduces event-level resume machinery that we do not want to take forward. The replacement path is a smaller series from unstable: database prep first, then the block-scoped transaction fix, then cleanup.

dknopik mentioned this pull request Mar 13, 2026

fix: offload index sync store to blocking task #841

Open

diegomrsantos self-assigned this Mar 16, 2026

diegomrsantos commented Mar 16, 2026

diegomrsantos marked this pull request as ready for review March 16, 2026 21:51

diegomrsantos added 3 commits March 17, 2026 14:54

refactor(database): add post-commit event progress boundary

d4a6e60

refactor(eth): process logs with per-event commits

ba5fffa

fix(eth): harden per-event sync semantics

ffad62c

diegomrsantos force-pushed the fix/db-publication-boundary-simple branch from 79ea6cd to 53c6cda on March 17, 2026 14:04

refactor(database): align tests with commit boundary

e506639

diegomrsantos force-pushed the fix/db-publication-boundary-simple branch from 53c6cda to e506639 on March 17, 2026 14:52

diegomrsantos added 3 commits March 17, 2026 17:41

refactor(eth): inline validator exit cursor handling

bd9357e

fix(eth): keep missing operators as db inconsistency

b78bea6

fix(eth): keep missing validator state as db inconsistency

20784f6

diegomrsantos commented Mar 17, 2026

diegomrsantos added 9 commits March 17, 2026 20:01

docs(eth): clarify operator-added skip semantics

ab3fc9b

refactor(eth): avoid sqlite error string matching

0cf0017

refactor(database): remove unnecessary commit-path clones

887ac3d

style(database): fix borrowed index update lookup

9d005a4

refactor(eth): avoid cloning state lookups

6f5ae6e

test(eth): modularize event processor coverage

c9216bd

docs(eth): explain modular event processor tests

bcf2471

test(eth): cover remaining event handlers

297100c

test(database): cover remaining boundary helpers

021117d

diegomrsantos marked this pull request as draft March 18, 2026 07:58

diegomrsantos mentioned this pull request Mar 18, 2026

Rework event processing around a single store boundary #880

Open

diegomrsantos closed this Mar 18, 2026

diegomrsantos mentioned this pull request Mar 18, 2026

refactor(database): prepare tx-scoped block processing #898

Merged

diegomrsantos wants to merge 16 commits into sigp:unstable from diegomrsantos:fix/db-publication-boundary-simple
