Skip to content

feat: anchor-evm and anchor-service metrics#9

Open
m0ar wants to merge 3 commits intomainfrom
feat/anchor-metrics
Open

feat: anchor-evm and anchor-service metrics#9
m0ar wants to merge 3 commits intomainfrom
feat/anchor-metrics

Conversation

@m0ar
Copy link
Copy Markdown

@m0ar m0ar commented Feb 25, 2026

Summary by CodeRabbit

  • New Features
    • Added end-to-end metrics and observability across EVM anchoring and anchor service: transaction lifecycle, retries, success/failure counts, timing (total and confirmation), gas cost, wallet balance, error categorization, batch pipeline timings, deduplication, high-water mark, and service lifecycle events.
    • Exposed Prometheus-compatible metrics and a recorder interface for integration and export.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 25, 2026

📝 Walkthrough

Walkthrough

Adds Prometheus-based metrics and recording to anchor-evm and anchor-service: new metrics modules, workspace/prometheus deps, optional Metrics injection points, instrumentation across EVM transaction flow and AnchorService batch pipeline, and timing/gas fields on AnchorResult.

Changes

Cohort / File(s) Summary
Workspace deps
anchor-evm/Cargo.toml, anchor-service/Cargo.toml
Added ceramic-metrics and prometheus-client workspace/dependency entries.
EVM metrics & instrumentation
anchor-evm/src/metrics.rs, anchor-evm/src/evm_transaction_manager.rs
New Metrics, EvmEvent, label types and Recorder; EvmTransactionManager gains metrics: Option<Arc<Metrics>>, new_with_metrics, timing/gas fields added to AnchorResult, event recording (submission, retry, success, failure), and error categorization.
Anchor service metrics & instrumentation
anchor-service/src/metrics.rs, anchor-service/src/anchor_batch.rs
New Metrics, AnchorEvent, label types and Recorder; AnchorService gets metrics: Option<Arc<Metrics>>, new_with_metrics, emits lifecycle and batch events (start, empty, success, failure), and records per-stage durations/counts and high-water updates.
Crate re-exports
anchor-evm/src/lib.rs, anchor-service/src/lib.rs
Expose metrics modules and pub use metrics::Metrics; re-exports in both crates.
Tests & registration
anchor-evm/src/metrics.rs, anchor-service/src/metrics.rs
Unit tests added for metrics registration and Recorder event handling in both modules.

Sequence Diagram(s)

sequenceDiagram
    participant Service as AnchorService
    participant TM as EvmTransactionManager
    participant Node as EthereumNode
    participant Metrics as MetricsRegistry
    participant Store as EventStore

    rect rgba(200,200,255,0.5)
    Service->>Metrics: record(ServiceStarted)
    end

    rect rgba(200,255,200,0.5)
    Service->>Service: fetch & deduplicate requests
    Service->>Metrics: record(BatchStarted)
    end

    rect rgba(255,200,200,0.5)
    Service->>TM: submit_and_wait(batch_tx)
    TM->>Metrics: record(TransactionSubmitted)
    TM->>Node: send transaction
    Node-->>TM: tx_hash / error
    TM->>Node: poll confirmations
    TM->>Metrics: record(TransactionSucceeded / TransactionFailed / RetryAttempt)
    TM-->>Service: AnchorResult { total_duration, confirmation_duration, gas_cost_wei, ... }
    end

    rect rgba(200,255,200,0.5)
    Service->>Store: store time events
    Store-->>Service: ack
    Service->>Metrics: record(BatchSucceeded / HighWaterMarkUpdated)
    end

    rect rgba(200,200,255,0.5)
    Service->>Metrics: record(ServiceStopped)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰
I hopped through crates and Cargo trees,
Counting wei and gwei on gentle breeze,
Timed each hop and labeled each fall,
Metrics now cheer from root to call,
A rabbit’s tally for one and all. 🥕📊

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately and concisely summarizes the main change: adding metrics instrumentation to both anchor-evm and anchor-service packages.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/anchor-metrics

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
anchor-evm/src/metrics.rs (1)

255-311: Strengthen metrics tests with value assertions.

Lines 255-311 currently confirm only that record paths execute. Please assert observable counter/gauge/histogram outcomes to protect event-to-metric wiring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/metrics.rs` around lines 255 - 311, Tests currently only call
Metrics::register and Metrics.record for various EvmEvent variants but do not
assert any metric values; update each test (e.g.,
test_record_transaction_submitted, test_record_transaction_succeeded,
test_record_transaction_failed, test_record_retry_attempt,
test_record_wallet_balance) to query the Registry after invoking Metrics.record
and assert the expected metric changes — for example check the appropriate
counter increment for TransactionSubmitted/RetryAttempt, error-type labeled
counter for TransactionFailed, histogram buckets or observed value for
TransactionSucceeded durations and gas_cost_wei, and gauge value for
WalletBalanceUpdated — by using the same Registry instance returned to
Metrics::register to gather metrics and comparing the concrete values/labels to
the expected outcomes.
anchor-service/src/metrics.rs (1)

319-375: Add assertions for metric values in tests.

Lines 319-375 only verify that calls don’t panic. They won’t catch regressions like wrong counters/histograms being updated. Please assert emitted values (e.g., via encoded registry output) for each event path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/metrics.rs` around lines 319 - 375, Tests currently only
ensure methods don't panic; update each test to assert the registry actually
recorded the expected metrics after calling Metrics::register(...) and
metrics.record(...). For each test, call registry.gather() (or encode with
prometheus::TextEncoder) and assert the presence and expected values for the
relevant metric families: ServiceStarted/ServiceStopped should increment their
counters; BatchSucceeded should increment counters for
requests_received/requests_after_dedup and observe histogram values/duration
metrics (total_duration, fetch_duration, tree_build_duration, anchor_duration,
store_duration); BatchFailed should increment the failure counter with the
error_type label; BatchEmpty should increment its counter; HighWaterMarkUpdated
should set/update the gauge to the provided value. Use the Metrics::register
return and AnchorEvent variants (ServiceStarted, ServiceStopped, BatchSucceeded,
BatchFailed, BatchEmpty, HighWaterMarkUpdated) to locate and validate the exact
metric names/labels produced by your Metrics implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@anchor-evm/src/metrics.rs`:
- Around line 239-245: EvmEvent::WalletBalanceUpdated currently converts a u128
balance_wei to i64 with a plain as cast which can wrap for values > i64::MAX;
change the conversion to attempt a safe conversion (use i64::try_from or
equivalent) and clamp overflow to i64::MAX (or 0 for underflow) before calling
wallet_balance_gwei.get_or_create(&ChainLabels { chain_id: *chain_id
}).set(...); ensure you first divide balance_wei by 1_000_000_000 to get gwei,
then perform the try_from/clamp and pass the resulting i64 to set to avoid
negative wraparound.

In `@anchor-service/src/anchor_batch.rs`:
- Around line 179-190: The code currently calls
self.high_water_mark_store.high_water_mark().await.expect(...), which will panic
on transient DB failures; change this to handle the Result instead: call
self.high_water_mark_store.high_water_mark().await and match on
Ok(high_water_mark) to continue and record AnchorEvent::HighWaterMarkUpdated as
before, but on Err(e) record the failure metric (e.g. emit a BatchFailed or
similar failure label via self.metrics) and return Err(e) (or map it into the
function's error type) instead of panicking so the caller can handle the error
and metrics are emitted; update references to
high_water_mark_store.high_water_mark(), AnchorEvent::HighWaterMarkUpdated, and
the BatchFailed metric emission accordingly.

---

Nitpick comments:
In `@anchor-evm/src/metrics.rs`:
- Around line 255-311: Tests currently only call Metrics::register and
Metrics.record for various EvmEvent variants but do not assert any metric
values; update each test (e.g., test_record_transaction_submitted,
test_record_transaction_succeeded, test_record_transaction_failed,
test_record_retry_attempt, test_record_wallet_balance) to query the Registry
after invoking Metrics.record and assert the expected metric changes — for
example check the appropriate counter increment for
TransactionSubmitted/RetryAttempt, error-type labeled counter for
TransactionFailed, histogram buckets or observed value for TransactionSucceeded
durations and gas_cost_wei, and gauge value for WalletBalanceUpdated — by using
the same Registry instance returned to Metrics::register to gather metrics and
comparing the concrete values/labels to the expected outcomes.

In `@anchor-service/src/metrics.rs`:
- Around line 319-375: Tests currently only ensure methods don't panic; update
each test to assert the registry actually recorded the expected metrics after
calling Metrics::register(...) and metrics.record(...). For each test, call
registry.gather() (or encode with prometheus::TextEncoder) and assert the
presence and expected values for the relevant metric families:
ServiceStarted/ServiceStopped should increment their counters; BatchSucceeded
should increment counters for requests_received/requests_after_dedup and observe
histogram values/duration metrics (total_duration, fetch_duration,
tree_build_duration, anchor_duration, store_duration); BatchFailed should
increment the failure counter with the error_type label; BatchEmpty should
increment its counter; HighWaterMarkUpdated should set/update the gauge to the
provided value. Use the Metrics::register return and AnchorEvent variants
(ServiceStarted, ServiceStopped, BatchSucceeded, BatchFailed, BatchEmpty,
HighWaterMarkUpdated) to locate and validate the exact metric names/labels
produced by your Metrics implementation.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de597d6 and 34229b2.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • anchor-evm/Cargo.toml
  • anchor-evm/src/evm_transaction_manager.rs
  • anchor-evm/src/lib.rs
  • anchor-evm/src/metrics.rs
  • anchor-service/Cargo.toml
  • anchor-service/src/anchor_batch.rs
  • anchor-service/src/lib.rs
  • anchor-service/src/metrics.rs

Comment on lines +179 to +190
let high_water_mark = self
.high_water_mark_store
.high_water_mark()
.await
.expect("error getting high water mark from database");

// Update high water mark metric
if let Some(ref metrics) = self.metrics {
metrics.record(&AnchorEvent::HighWaterMarkUpdated {
value: high_water_mark,
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle high-water-mark read errors without panicking.

Line 183 uses .expect(...); a transient DB failure will crash batch processing and skip BatchFailed metrics emission. This path should return Err and record a failure label instead.

Proposed fix
-        let high_water_mark = self
-            .high_water_mark_store
-            .high_water_mark()
-            .await
-            .expect("error getting high water mark from database");
+        let high_water_mark = match self.high_water_mark_store.high_water_mark().await {
+            Ok(value) => value,
+            Err(e) => {
+                if let Some(ref metrics) = self.metrics {
+                    metrics.record(&AnchorEvent::BatchFailed {
+                        error_type: "high_water_mark_error".to_string(),
+                    });
+                }
+                return Err(e);
+            }
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/anchor_batch.rs` around lines 179 - 190, The code
currently calls self.high_water_mark_store.high_water_mark().await.expect(...),
which will panic on transient DB failures; change this to handle the Result
instead: call self.high_water_mark_store.high_water_mark().await and match on
Ok(high_water_mark) to continue and record AnchorEvent::HighWaterMarkUpdated as
before, but on Err(e) record the failure metric (e.g. emit a BatchFailed or
similar failure label via self.metrics) and return Err(e) (or map it into the
function's error type) instead of panicking so the caller can handle the error
and metrics are emitted; update references to
high_water_mark_store.high_water_mark(), AnchorEvent::HighWaterMarkUpdated, and
the BatchFailed metric emission accordingly.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
anchor-evm/src/evm_transaction_manager.rs (1)

219-237: ⚠️ Potential issue | 🟠 Major

Compute gas-cost baseline after pending transactions are cleared.

starting_balance is sampled before wait_for_pending_transactions(...). If old pending txs get mined during that wait, current gas-cost metrics include unrelated spend.

Proposed fix
-        // Log starting wallet balance
-        let starting_balance = provider
-            .get_balance(wallet_address)
-            .await
-            .map_err(|e| anyhow!("Failed to get wallet balance: {}", e))?;
-        info!("Starting wallet balance: {} wei", starting_balance);
-
-        // Update wallet balance metric
-        if let Some(ref metrics) = self.metrics {
-            // U256 to u128: saturate at u128::MAX (a very large balance)
-            let balance_u128: u128 = starting_balance.try_into().unwrap_or(u128::MAX);
-            metrics.record(&EvmEvent::WalletBalanceUpdated {
-                chain_id: self.config.chain_id,
-                balance_wei: balance_u128,
-            });
-        }
-
         // Wait for any pending transactions from previous runs to clear
         Self::wait_for_pending_transactions(&provider, wallet_address).await;
+
+        // Log starting wallet balance (after pending txs are cleared)
+        let starting_balance = provider
+            .get_balance(wallet_address)
+            .await
+            .map_err(|e| anyhow!("Failed to get wallet balance: {}", e))?;
+        info!("Starting wallet balance: {} wei", starting_balance);
+
+        if let Some(ref metrics) = self.metrics {
+            let balance_u128: u128 = starting_balance.try_into().unwrap_or(u128::MAX);
+            metrics.record(&EvmEvent::WalletBalanceUpdated {
+                chain_id: self.config.chain_id,
+                balance_wei: balance_u128,
+            });
+        }

Also applies to: 316-333

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/evm_transaction_manager.rs` around lines 219 - 237, The
starting_balance is sampled before pending transactions are awaited, so if prior
txs mine during Self::wait_for_pending_transactions(...) the recorded
WalletBalanceUpdated (EvmEvent::WalletBalanceUpdated) will include unrelated gas
spend; move the balance sampling and metrics.record block (the code that sets
starting_balance, converts to balance_u128, and calls metrics.record) to after
Self::wait_for_pending_transactions(&provider, wallet_address). Apply the same
change to the analogous block around lines 316-333 to ensure balances are always
taken after pending transactions clear.
♻️ Duplicate comments (2)
anchor-evm/src/metrics.rs (1)

255-257: ⚠️ Potential issue | 🟠 Major

Prevent signed wraparound when recording wallet balance gauge.

On Line 256, as i64 can wrap for large balances and emit negative gauge values. Use checked conversion and clamp before .set(...).

Proposed fix
-                // Convert wei to gwei (divide by 10^9) to fit in i64
-                let balance_gwei = (*balance_wei / 1_000_000_000) as i64;
+                // Convert wei to gwei (divide by 10^9) and clamp to i64 range
+                let balance_gwei_u128 = *balance_wei / 1_000_000_000;
+                let balance_gwei = i64::try_from(balance_gwei_u128).unwrap_or(i64::MAX);
                 self.wallet_balance_gwei
                     .get_or_create(&labels)
                     .set(balance_gwei);
#!/bin/bash
# Verify current conversion site and inspect surrounding code.
rg -n -C3 "WalletBalanceUpdated|balance_gwei|as i64" anchor-evm/src/metrics.rs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/metrics.rs` around lines 255 - 257, The current cast
"(*balance_wei / 1_000_000_000) as i64" can wrap and produce negative gauge
values; change the conversion to use a checked/try conversion from the unsigned
gwei value to i64 and clamp to the valid range before calling
self.wallet_balance_gwei.set(...). Locate the conversion site around
balance_gwei and the metric setter (references: balance_wei, balance_gwei,
self.wallet_balance_gwei, WalletBalanceUpdated) and replace the direct as i64
cast with a safe conversion (e.g., compute gwei as an unsigned integer, attempt
try_into::<i64>() or use i64::try_from, and on Err or overflow set to i64::MAX
or 0 as appropriate) then pass the clamped i64 to .set().
anchor-service/src/anchor_batch.rs (1)

179-183: Handle high-water-mark read errors without panicking.

.expect() will crash batch processing on a transient DB error, bypassing BatchFailed metric emission entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/anchor_batch.rs` around lines 179 - 183, Replace the
panic-causing expect on the high water mark read with proper error handling:
call self.high_water_mark_store.high_water_mark().await and match the Result; on
Err(e) log the error, increment/emit the BatchFailed metric and return/propagate
an Err from the enclosing function instead of panicking, and on Ok(val) continue
using val as high_water_mark. Update any function signature if necessary to
return a Result so the error can be propagated; reference the
high_water_mark_store.high_water_mark() call and the variable high_water_mark so
reviewers can locate the change.
🧹 Nitpick comments (4)
anchor-evm/src/metrics.rs (1)

269-325: Strengthen metrics tests with value assertions.

These tests currently only call record(...); they don't assert counters/gauges/histograms changed as expected. A recorder regression could pass unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/metrics.rs` around lines 269 - 325, Update each test that
calls Metrics::register and metrics.record(...) to assert the metric changes:
after recording an EvmEvent (TransactionSubmitted, TransactionSucceeded,
TransactionFailed, RetryAttempt, WalletBalanceUpdated), call registry.gather()
and locate the specific metric by name (e.g., transaction_submitted_total,
transaction_succeeded_duration_seconds histogram, transaction_failed_total with
label error_type, retry_attempts_total, wallet_balance_wei gauge) and assert the
sample value(s) reflect the expected increment or observed value; use the
Metrics::register return to know metric label keys (like chain_id) and assert
label matches chain_id=100 as well as expected histogram buckets/counts and
gauge value for balance_wei and scalar counters for
submissions/failures/retries. Ensure each test replaces the no-op record call
with these concrete assertions so regressions will fail.
anchor-service/src/metrics.rs (3)

83-86: requests_deduplicated counter name is semantically inverted.

The counter stores requests_after_dedup — the count of requests surviving deduplication — but the name requests_deduplicated implies the count of requests removed by deduplication. These are inverses. The doc comment is correct; the identifier is not.

♻️ Proposed rename
-    /// Requests remaining after deduplication.
-    requests_deduplicated: Counter,
+    /// Requests remaining after deduplication.
+    requests_after_dedup: Counter,

And in register! / record / Self {} construction:

-        register!(
-            requests_deduplicated,
-            "Anchor requests remaining after deduplication",
-            Counter::default(),
-            sub_registry
-        );
+        register!(
+            requests_after_dedup,
+            "Anchor requests remaining after deduplication",
+            Counter::default(),
+            sub_registry
+        );
-                self.requests_deduplicated.inc_by(*requests_after_dedup);
+                self.requests_after_dedup.inc_by(*requests_after_dedup);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/metrics.rs` around lines 83 - 86, The field name
requests_deduplicated is semantically inverted: it actually counts
requests_after_dedup (requests surviving deduplication). Rename the identifier
to requests_after_dedup (and update all usages) so it matches the doc comment;
update its declaration, the register! invocation, any record() calls, and the
Self { ... } construction to use requests_after_dedup consistently.

107-108: dedup_ratio semantics conflict with its name.

The histogram stores after_dedup / received (retention ratio: 1.0 = nothing removed). The name dedup_ratio typically implies how much was deduped (1 - retention), which is the opposite. A consumer building a dashboard alert on this metric would likely interpret it backwards.

Consider renaming to dedup_retention_ratio and updating the doc comment to say "Ratio of requests surviving deduplication (1.0 = no deduplication)" to make the direction explicit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/metrics.rs` around lines 107 - 108, The field and its docs
are misleading: rename the Histogram field dedup_ratio to dedup_retention_ratio
and update its doc comment to "Ratio of requests surviving deduplication (1.0 =
no deduplication)"; also update every usage/registration/metric label that
references dedup_ratio (e.g., where the Histogram is constructed, recorded to,
or exported) to use dedup_retention_ratio so callers and dashboards reflect the
retention semantics consistently.

313-374: Tests only verify no panic; they don't assert recorded values.

None of the tests inspect the resulting counter/gauge/histogram values after recording events. A regression in the record() impl (e.g., incrementing the wrong field) would pass undetected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/metrics.rs` around lines 313 - 374, Tests currently only
ensure Metrics::register and Metrics::record don't panic but never assert the
resulting metric values; update the tests that call Metrics::record (e.g.,
test_record_service_lifecycle, test_record_batch_succeeded,
test_record_batch_failed, test_record_batch_empty, test_record_high_water_mark)
to gather metrics from the Registry after each record and assert the expected
counter/gauge/histogram values were mutated correctly for the corresponding
AnchorEvent variant. Specifically, after calling Metrics::record on a given
event, use the Registry::gather() output to find metric families by their names
(the names defined in Metrics::register), then assert counts, gauge values, and
histogram buckets/sums match the expected increments or values for
ServiceStarted/ServiceStopped, BatchSucceeded fields (requests_received,
requests_after_dedup, durations), BatchFailed error counter, BatchEmpty counter,
and HighWaterMarkUpdated gauge; this will catch regressions where the wrong
metric is updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@anchor-evm/src/evm_transaction_manager.rs`:
- Around line 171-172: The transaction_duration metric is being started too
early by setting total_start = Instant::now() before provider setup and
preflight; change the timing so the Instant::now() is taken immediately before
the actual transaction submission (right after preflight checks and provider
readiness) and use that timestamp for measuring submission→confirmation; update
all other occurrences where transaction_duration is measured (the other
total_start usages) to the same pattern so the metric consistently reflects
submission-to-confirmation only, and rename or reuse the variable (e.g.,
transaction_start) near the submit_transaction / send_and_confirm code paths to
make intent clear.

In `@anchor-service/src/anchor_batch.rs`:
- Around line 234-292: process_next_batch duplicates the pipeline implemented in
anchor_batch; refactor process_next_batch to call anchor_batch (or extracted
sub-methods) instead of inlining build_merkle_tree, tx_manager.anchor_root, and
build_time_events, and ensure you map errors back to metrics by catching each
phase's Result and emitting metrics.record(&AnchorEvent::BatchFailed {
error_type: "<phase>_error".to_string() }) before returning the propagated
error; use the same returned values
(root_cid/proof/remote_merkle_nodes/chain_inclusion/time_events or the
refactored TimeEventBatch) to construct the TimeEventBatch/continue processing
so timing/metrics are preserved, referencing functions/ids: process_next_batch,
anchor_batch, build_merkle_tree, tx_manager.anchor_root, build_time_events,
TimeEventBatch, and AnchorEvent::BatchFailed.
- Around line 185-190: The pre-batch emission of
AnchorEvent::HighWaterMarkUpdated is incorrect because the high water mark is
only read, not modified; remove the metrics.record call that emits
HighWaterMarkUpdated from the pre-batch block (the one surrounding
metrics.record(&AnchorEvent::HighWaterMarkUpdated { value: high_water_mark }))
and rely on the existing post-store emission from store_time_events, or
alternatively replace it with a new read-specific event (e.g.,
AnchorEvent::HighWaterMarkRead) if you need to capture the starting offset;
update any metric name/enum usage accordingly so HighWaterMarkUpdated is only
emitted after store_time_events completes.

---

Outside diff comments:
In `@anchor-evm/src/evm_transaction_manager.rs`:
- Around line 219-237: The starting_balance is sampled before pending
transactions are awaited, so if prior txs mine during
Self::wait_for_pending_transactions(...) the recorded WalletBalanceUpdated
(EvmEvent::WalletBalanceUpdated) will include unrelated gas spend; move the
balance sampling and metrics.record block (the code that sets starting_balance,
converts to balance_u128, and calls metrics.record) to after
Self::wait_for_pending_transactions(&provider, wallet_address). Apply the same
change to the analogous block around lines 316-333 to ensure balances are always
taken after pending transactions clear.

---

Duplicate comments:
In `@anchor-evm/src/metrics.rs`:
- Around line 255-257: The current cast "(*balance_wei / 1_000_000_000) as i64"
can wrap and produce negative gauge values; change the conversion to use a
checked/try conversion from the unsigned gwei value to i64 and clamp to the
valid range before calling self.wallet_balance_gwei.set(...). Locate the
conversion site around balance_gwei and the metric setter (references:
balance_wei, balance_gwei, self.wallet_balance_gwei, WalletBalanceUpdated) and
replace the direct as i64 cast with a safe conversion (e.g., compute gwei as an
unsigned integer, attempt try_into::<i64>() or use i64::try_from, and on Err or
overflow set to i64::MAX or 0 as appropriate) then pass the clamped i64 to
.set().

In `@anchor-service/src/anchor_batch.rs`:
- Around line 179-183: Replace the panic-causing expect on the high water mark
read with proper error handling: call
self.high_water_mark_store.high_water_mark().await and match the Result; on
Err(e) log the error, increment/emit the BatchFailed metric and return/propagate
an Err from the enclosing function instead of panicking, and on Ok(val) continue
using val as high_water_mark. Update any function signature if necessary to
return a Result so the error can be propagated; reference the
high_water_mark_store.high_water_mark() call and the variable high_water_mark so
reviewers can locate the change.

---

Nitpick comments:
In `@anchor-evm/src/metrics.rs`:
- Around line 269-325: Update each test that calls Metrics::register and
metrics.record(...) to assert the metric changes: after recording an EvmEvent
(TransactionSubmitted, TransactionSucceeded, TransactionFailed, RetryAttempt,
WalletBalanceUpdated), call registry.gather() and locate the specific metric by
name (e.g., transaction_submitted_total, transaction_succeeded_duration_seconds
histogram, transaction_failed_total with label error_type, retry_attempts_total,
wallet_balance_wei gauge) and assert the sample value(s) reflect the expected
increment or observed value; use the Metrics::register return to know metric
label keys (like chain_id) and assert label matches chain_id=100 as well as
expected histogram buckets/counts and gauge value for balance_wei and scalar
counters for submissions/failures/retries. Ensure each test replaces the no-op
record call with these concrete assertions so regressions will fail.

In `@anchor-service/src/metrics.rs`:
- Around line 83-86: The field name requests_deduplicated is semantically
inverted: it actually counts requests_after_dedup (requests surviving
deduplication). Rename the identifier to requests_after_dedup (and update all
usages) so it matches the doc comment; update its declaration, the register!
invocation, any record() calls, and the Self { ... } construction to use
requests_after_dedup consistently.
- Around line 107-108: The field and its docs are misleading: rename the
Histogram field dedup_ratio to dedup_retention_ratio and update its doc comment
to "Ratio of requests surviving deduplication (1.0 = no deduplication)"; also
update every usage/registration/metric label that references dedup_ratio (e.g.,
where the Histogram is constructed, recorded to, or exported) to use
dedup_retention_ratio so callers and dashboards reflect the retention semantics
consistently.
- Around line 313-374: Tests currently only ensure Metrics::register and
Metrics::record don't panic but never assert the resulting metric values; update
the tests that call Metrics::record (e.g., test_record_service_lifecycle,
test_record_batch_succeeded, test_record_batch_failed, test_record_batch_empty,
test_record_high_water_mark) to gather metrics from the Registry after each
record and assert the expected counter/gauge/histogram values were mutated
correctly for the corresponding AnchorEvent variant. Specifically, after calling
Metrics::record on a given event, use the Registry::gather() output to find
metric families by their names (the names defined in Metrics::register), then
assert counts, gauge values, and histogram buckets/sums match the expected
increments or values for ServiceStarted/ServiceStopped, BatchSucceeded fields
(requests_received, requests_after_dedup, durations), BatchFailed error counter,
BatchEmpty counter, and HighWaterMarkUpdated gauge; this will catch regressions
where the wrong metric is updated.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34229b2 and 0ddf5c0.

📒 Files selected for processing (4)
  • anchor-evm/src/evm_transaction_manager.rs
  • anchor-evm/src/metrics.rs
  • anchor-service/src/anchor_batch.rs
  • anchor-service/src/metrics.rs

Comment on lines +234 to +292
// Build Merkle tree and anchor
let tree_build_start = Instant::now();
let MerkleTree {
root_cid,
nodes: local_merkle_nodes,
count,
} = match build_merkle_tree(&anchor_requests) {
Ok(tree) => tree,
Err(e) => {
if let Some(ref metrics) = self.metrics {
metrics.record(&AnchorEvent::BatchFailed {
error_type: "tree_build_error".to_string(),
});
}
Ok(())
return Err(e);
}
Err(e) => Err(e),
};
let tree_build_duration = tree_build_start.elapsed();

// Anchor the root via TransactionManager
let anchor_start = Instant::now();
let RootTimeEvent {
proof,
detached_time_event,
mut remote_merkle_nodes,
chain_inclusion,
} = match self.tx_manager.anchor_root(root_cid).await {
Ok(result) => result,
Err(e) => {
if let Some(ref metrics) = self.metrics {
metrics.record(&AnchorEvent::BatchFailed {
error_type: "anchor_error".to_string(),
});
}
return Err(e);
}
};
let anchor_duration = anchor_start.elapsed();

// Build time events
let time_events = match build_time_events(&anchor_requests, &detached_time_event, count) {
Ok(events) => events,
Err(e) => {
if let Some(ref metrics) = self.metrics {
metrics.record(&AnchorEvent::BatchFailed {
error_type: "time_event_build_error".to_string(),
});
}
return Err(e);
}
};

remote_merkle_nodes.extend(local_merkle_nodes);
let time_event_batch = TimeEventBatch {
merkle_nodes: remote_merkle_nodes,
proof,
raw_time_events: time_events,
chain_inclusion,
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

process_next_batch duplicates the entire anchor_batch pipeline.

The inlined logic at Lines 235–292 (build tree → anchor root → build time events → assemble TimeEventBatch) is identical to anchor_batch at Lines 332–351, now maintained in two places. A future change to one won't automatically apply to the other.

Refactor process_next_batch to delegate to anchor_batch, wrapping each step's ? with the appropriate BatchFailed emission:

♻️ Proposed refactor sketch
-        // Build Merkle tree and anchor
-        let tree_build_start = Instant::now();
-        let MerkleTree { ... } = match build_merkle_tree(&anchor_requests) { ... };
-        let tree_build_duration = tree_build_start.elapsed();
-        let anchor_start = Instant::now();
-        let RootTimeEvent { ... } = match self.tx_manager.anchor_root(root_cid).await { ... };
-        let anchor_duration = anchor_start.elapsed();
-        let time_events = match build_time_events(...) { ... };
-        remote_merkle_nodes.extend(local_merkle_nodes);
-        let time_event_batch = TimeEventBatch { ... };
+        let time_event_batch = match self.anchor_batch(&anchor_requests).await {
+            Ok(batch) => batch,
+            Err(e) => {
+                if let Some(ref metrics) = self.metrics {
+                    metrics.record(&AnchorEvent::BatchFailed {
+                        error_type: "anchor_error".to_string(),
+                    });
+                }
+                return Err(e);
+            }
+        };

Note: Per-phase timing (tree_build, anchor, store) currently tracked inline would need to be exposed by refactoring anchor_batch or split into sub-methods if granular durations are still required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/anchor_batch.rs` around lines 234 - 292,
process_next_batch duplicates the pipeline implemented in anchor_batch; refactor
process_next_batch to call anchor_batch (or extracted sub-methods) instead of
inlining build_merkle_tree, tx_manager.anchor_root, and build_time_events, and
ensure you map errors back to metrics by catching each phase's Result and
emitting metrics.record(&AnchorEvent::BatchFailed { error_type:
"<phase>_error".to_string() }) before returning the propagated error; use the
same returned values
(root_cid/proof/remote_merkle_nodes/chain_inclusion/time_events or the
refactored TimeEventBatch) to construct the TimeEventBatch/continue processing
so timing/metrics are preserved, referencing functions/ids: process_next_batch,
anchor_batch, build_merkle_tree, tx_manager.anchor_root, build_time_events,
TimeEventBatch, and AnchorEvent::BatchFailed.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
anchor-evm/src/evm_transaction_manager.rs (1)

242-243: ⚠️ Potential issue | 🟡 Minor

transaction_duration still includes retry/backoff time.

At Line 242, the timer starts before the retry loop, so values sent at Line 605-609 include delay and prior failed attempts. That still conflicts with the “submission → confirmation” semantics.

Please either (a) start timing immediately before each submission call, or (b) rename/document this as end-to-end operation duration.

Also applies to: 311-313, 398-399, 605-609

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/evm_transaction_manager.rs` around lines 242 - 243, The timer
total_start/transaction_duration currently begins before the retry loop, so
logged transaction_duration includes retry/backoff time; either move the
Instant::now() to just before each actual submission call (i.e., start timing
immediately before the call that sends/submits the transaction) so
transaction_duration measures submission→confirmation, or else rename/document
the variable to reflect end-to-end operation duration; update all occurrences
that use total_start/transaction_duration (the variables named total_start and
transaction_duration in evm_transaction_manager.rs and the places that log them
around the submission/confirmation flow) to follow the chosen approach for
consistency.
🧹 Nitpick comments (1)
anchor-evm/src/metrics.rs (1)

270-327: Add assertions to metric recording tests (currently smoke-only).

At Line 270-327, tests call record(...) but never validate counters/gauges/histograms changed as expected. Adding assertions here will catch silent regressions in metric wiring.

Proposed test hardening
 #[test]
 fn test_record_transaction_submitted() {
     let mut registry = Registry::default();
     let metrics = Metrics::register(&mut registry);

     metrics.record(&EvmEvent::TransactionSubmitted { chain_id: 100 });
+    let labels = ChainLabels { chain_id: 100 };
+    assert_eq!(metrics.transactions_total.get_or_create(&labels).get(), 1);
 }

 #[test]
 fn test_record_wallet_balance() {
     let mut registry = Registry::default();
     let metrics = Metrics::register(&mut registry);

     // 1 ETH in wei
     metrics.record(&EvmEvent::WalletBalanceUpdated {
         chain_id: 100,
         balance_wei: 1_000_000_000_000_000_000,
     });
+    let labels = ChainLabels { chain_id: 100 };
+    assert_eq!(metrics.wallet_balance_gwei.get_or_create(&labels).get(), 1_000_000_000);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-evm/src/metrics.rs` around lines 270 - 327, The tests currently call
Metrics::register and Metrics::record but don't assert metrics changed; add
assertions after each metrics.record(...) to validate the expected metric
families and values. For each test: call Registry::gather() (or
Registry::get_metric_family_by_name) to find the specific metric names produced
by Metrics (e.g., "transactions_submitted_total",
"transactions_succeeded_total", "transaction_duration_seconds",
"transaction_confirmation_seconds", "transaction_gas_cost_wei",
"transactions_failed_total", "retry_attempts_total", "wallet_balance_wei"),
assert the counter/gauge/histogram value incremented or set and that labels like
chain_id (and error_type for TransactionFailed) match the event, and for
TransactionSucceeded assert histogram buckets/sum/count reflect the provided
durations and gas_cost_wei; update tests for TransactionSubmitted,
TransactionSucceeded, TransactionFailed, RetryAttempt, and WalletBalanceUpdated
accordingly using the same Metrics::register and Metrics::record calls to locate
metrics via those metric names and label checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@anchor-service/src/anchor_batch.rs`:
- Around line 216-223: The dedup step builds anchor_requests from raw_requests
but does not guarantee the largest resume_token is at the tail, which can
regress the high-water-mark; after creating anchor_requests (the
Vec<AnchorRequest> produced from IndexMap::from_iter) sort it by each request's
resume_token (ascending) so the greatest resume_token is last before downstream
processing and before computing requests_after_dedup; locate the anchor_requests
Vec, call a stable sort (e.g. sort_by_key or sort_by on request.resume_token) to
enforce monotonicity and then continue with requests_after_dedup and subsequent
HWM update.

---

Duplicate comments:
In `@anchor-evm/src/evm_transaction_manager.rs`:
- Around line 242-243: The timer total_start/transaction_duration currently
begins before the retry loop, so logged transaction_duration includes
retry/backoff time; either move the Instant::now() to just before each actual
submission call (i.e., start timing immediately before the call that
sends/submits the transaction) so transaction_duration measures
submission→confirmation, or else rename/document the variable to reflect
end-to-end operation duration; update all occurrences that use
total_start/transaction_duration (the variables named total_start and
transaction_duration in evm_transaction_manager.rs and the places that log them
around the submission/confirmation flow) to follow the chosen approach for
consistency.

---

Nitpick comments:
In `@anchor-evm/src/metrics.rs`:
- Around line 270-327: The tests currently call Metrics::register and
Metrics::record but don't assert metrics changed; add assertions after each
metrics.record(...) to validate the expected metric families and values. For
each test: call Registry::gather() (or Registry::get_metric_family_by_name) to
find the specific metric names produced by Metrics (e.g.,
"transactions_submitted_total", "transactions_succeeded_total",
"transaction_duration_seconds", "transaction_confirmation_seconds",
"transaction_gas_cost_wei", "transactions_failed_total", "retry_attempts_total",
"wallet_balance_wei"), assert the counter/gauge/histogram value incremented or
set and that labels like chain_id (and error_type for TransactionFailed) match
the event, and for TransactionSucceeded assert histogram buckets/sum/count
reflect the provided durations and gas_cost_wei; update tests for
TransactionSubmitted, TransactionSucceeded, TransactionFailed, RetryAttempt, and
WalletBalanceUpdated accordingly using the same Metrics::register and
Metrics::record calls to locate metrics via those metric names and label checks.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ddf5c0 and 400dd8b.

📒 Files selected for processing (3)
  • anchor-evm/src/evm_transaction_manager.rs
  • anchor-evm/src/metrics.rs
  • anchor-service/src/anchor_batch.rs

Comment on lines +216 to +223
let anchor_requests: Vec<AnchorRequest> = IndexMap::<Cid, AnchorRequest>::from_iter(
raw_requests
.into_iter()
.map(|request| (request.id, request)),
)
.into_values()
.collect();
let requests_after_dedup = anchor_requests.len() as u64;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Ensure deduped requests preserve high-water-mark monotonicity.

At Line 216, dedup can leave anchor_requests in an order where the greatest resume_token is not last. Since downstream high-water-mark update is derived from tail position, this can regress HWM and cause reprocessing of already-anchored events.

🐛 Proposed fix
-        let anchor_requests: Vec<AnchorRequest> = IndexMap::<Cid, AnchorRequest>::from_iter(
+        let mut anchor_requests: Vec<AnchorRequest> = IndexMap::<Cid, AnchorRequest>::from_iter(
             raw_requests
                 .into_iter()
                 .map(|request| (request.id, request)),
         )
         .into_values()
         .collect();
+        anchor_requests.sort_by_key(|request| request.resume_token);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@anchor-service/src/anchor_batch.rs` around lines 216 - 223, The dedup step
builds anchor_requests from raw_requests but does not guarantee the largest
resume_token is at the tail, which can regress the high-water-mark; after
creating anchor_requests (the Vec<AnchorRequest> produced from
IndexMap::from_iter) sort it by each request's resume_token (ascending) so the
greatest resume_token is last before downstream processing and before computing
requests_after_dedup; locate the anchor_requests Vec, call a stable sort (e.g.
sort_by_key or sort_by on request.resume_token) to enforce monotonicity and then
continue with requests_after_dedup and subsequent HWM update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant