Skip to content

fix(eth): process contract logs one block at a time#904

Open
diegomrsantos wants to merge 20 commits intosigp:unstablefrom
diegomrsantos:refactor/block-tx-pr2-boundary
Open

fix(eth): process contract logs one block at a time#904
diegomrsantos wants to merge 20 commits intosigp:unstablefrom
diegomrsantos:refactor/block-tx-pr2-boundary

Conversation

@diegomrsantos
Copy link
Member

Problem, Evidence, and Context (Required)

  • This fixes the behavior bug that remains after PR 1: later contract events in the same block need to see earlier writes through the active SQLite transaction before NetworkState is published.
  • This is worth doing now because PR #885 is being replaced by a smaller reviewed stack, and PR #898 already prepared the database layer for this boundary change.
  • Evidence: event_processor previously mixed tx writes with in-memory reads, which breaks same-block flows once publication is deferred. This PR adds focused same-block integration coverage for OperatorAdded + ValidatorAdded and ValidatorAdded + ValidatorRemoved.
  • Relevant links: #880, #898, replacement for the behavior-fix portion of #885.

Change Overview (Required)

  • Process fetched contract logs one block at a time inside event_processor, with one SQLite transaction per block.
  • Route same-block reads through tx-scoped database helpers instead of self.db.state().
  • Publish NetworkState and trigger index-sync / exit side effects only after the block transaction commits.
  • Add position-aware test helpers so integration tests can model real same-block ordering.
  • Intentionally did not change schema, sync resume machinery, or NetworkState consumers outside this boundary.

Risks, Trade-offs, and Mitigations (Required)

  • Main risk is changing the event-processing boundary in historical and live sync.
  • The diff stays scoped to event_processor, the small set of tx-scoped reads it needs, and targeted integration coverage.
  • Trade-off: fetched multi-block batches are now committed block-by-block instead of as one larger transaction. That is intentional so durable progress, state publication, and post-commit side effects align with block processing.

Validation (Required)

  • cargo check -p eth
  • cargo fmt --all --check
  • cargo clippy -p database -p eth --all-targets -- -D warnings
  • cargo test -p eth --tests

Rollback (Required for behavior or runtime changes; optional otherwise)

  • Safe revert after PR #898 if needed.
  • No schema or migration impact.
  • Reverts to the pre-PR2 batch-wide event-processing boundary.

Blockers / Dependencies (Optional)

  • Depends on merged PR #898.

Additional Info / Next Steps (Optional)

  • A follow-up cleanup PR can remove any dead helper or cursor-adjacent code left over from the replaced direction.

.operator_exists_tx(*operator_id, tx)
.map_err(|e| ExecutionError::Database(e.to_string()))?;
if !exists {
return Err(ExecutionError::Database(format!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns ExecutionError::Database but the query succeeded — the operator just isn't registered. Same pattern at event_processor.rs:573 (missing validator metadata) and event_processor.rs:591 (missing cluster). All three should be InvalidEvent.

Pre-existing and harmless today since all errors skip-and-continue, but #351 makes DatabaseAbortBatch — these would become sync-stalling errors on legitimate "data not yet present" conditions. Worth reclassifying in either this PR or #351 before #351 merges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the ambiguity is real, but I don’t think #904 should change that classification. In these paths we cannot actually prove whether the event is invalid or local committed state is unexpectedly missing data, so reclassifying to InvalidEvent here would be a behavior change, not just cleanup. That risk existed before this PR; #904 is only changing the block-scoped processing boundary, so I’d rather keep the existing semantics here and handle the classification question separately.

})?;

// Get the fee recipient if one has been stored, otherwise default to the owner address
let fee_recipient = match self.db.fee_recipient_for_owner(&owner, tx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (pre-existing): The _ arm conflates Ok(None) (no custom fee recipient — legitimate) with Err(...) (DB failure). Splitting into explicit arms with a warn! on the error path would make DB failures observable without blocking validator registration. Low probability but free insurance on a financial-correctness path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be a cleaner shape, but I’m leaving it out of #904.

This _ arm is pre-existing from unstable, and changing it here would be a small behavior/logging change on the validator-add path rather than part of the block-boundary work in this PR. If we touch it separately, I agree the better version is to split Ok(None) from Err(...) and log the DB-failure path explicitly.

}

#[tokio::test]
async fn test_database_transaction_rollback_on_error() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test does NOT test rollback — it tests that a valid event commits while a malformed event is skipped. With per-block transactions, the old "entire batch rolls back on error" behavior no longer exists. The test name is actively misleading.

Recommendation: Rename to test_malformed_events_skipped_without_affecting_valid_events.

i know this was untouched as part of this pr, but easy fix either here or new gh issue, whichever you think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f1d4f592f.

I renamed it to test_malformed_events_skipped_without_affecting_valid_events and added a short doc comment making the preserved semantics explicit. This was a pre-existing test name on unstable; #904 just made the old name more misleading.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants