Skip to content

feat: proof of gossip#217

Open
camembera wants to merge 25 commits intomainfrom
feat/proof-of-gossip
Open

feat: proof of gossip#217
camembera wants to merge 25 commits intomainfrom
feat/proof-of-gossip

Conversation

@camembera
Copy link
Collaborator

@camembera camembera commented Feb 24, 2026

Proof of Gossip

A peer can claim to support eth/68 and accept your txs without relaying them. The only reliable signal is whether a canary transaction sent to that peer is eventually mined.

Configuration

Flag Default Description
--pog.private-key-file (none) Path to a file containing the PoG private key hex. If unset, PoG is disabled.
--pog.timeout 120 Seconds to wait for canary receipt before timing out and penalizing the peer.
--pog.reputation-penalty -25600 Reputation change applied on timeout (negative or positive is ok).

Testing updates

Added/updated tests for:

  • CLI defaults and parsing for --pog.private-key-file
  • Rejection of legacy --pog.private-key (explicit breaking behavior)
  • PoG key-file loading with:
    • valid 0x-prefixed key
    • valid non-prefixed key
    • surrounding whitespace/newlines
    • missing file error
    • empty file error
    • malformed key error

Validation run

Executed successfully:

  • cargo test args::tests --lib
  • cargo test proof_of_gossip::tests --lib
  • rustup run nightly cargo clippy --all-targets --all-features -- -D warnings
  • /Users/camembearbera/src/cargo-home/bin/dprint check
  • /Users/camembearbera/src/cargo-home/bin/cargo-machete
  • /Users/camembearbera/src/cargo-home/bin/cargo-deny check
  • RUSTDOCFLAGS="-D warnings" cargo doc --all --no-deps --document-private-items
  • PATH="/Users/camembearbera/src/cargo-home/bin:$PATH" cargo nextest run --locked (137 passed)

Note on make pr in this environment: the Makefile invokes cargo +nightly ..., while this shell's cargo frontend does not support +toolchain forwarding. The equivalent checks above were run directly and passed.

Summary by CodeRabbit

  • New Features

    • Integrated a Proof-of-Gossip (PoG) service for peer reputation, on‑chain verification, canary transactions, and network health checks.
    • Added CLI options to configure PoG: private key file, timeout, and reputation penalty.
    • Persistent peer state and metrics hooks for visibility.
  • Chores

    • Updated dependencies and developer tooling to support the new service and persistence layer.
  • Tests

    • Added comprehensive tests covering CLI parsing, PoG workflows, persistence, and concurrency.

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

Adds a Proof of Gossip (PoG) service: new dependencies, a Clap-based CLI args struct, public modules exported, integration into node startup to optionally instantiate and run the PoG service, and a large new PoG module implementing network/provider traits, persistence, signing, lifecycle logic, and tests.

Changes

Cohort / File(s) Summary
Dependency Additions
Cargo.toml
Added runtime deps: alloy-signer, rand, rusqlite (bundled); reth git crates pinned to v1.9.3 (reth-eth-wire-types, reth-metrics, reth-network, reth-network-api); added tempfile = "3" to dev-dependencies.
CLI Arguments
src/args.rs
New public BerachainArgs Clap struct with pog_private_key_file, pog_timeout, pog_reputation_penalty, default constants, and unit tests covering parsing and error cases.
Module Exports
src/lib.rs
Exported new public modules: args and proof_of_gossip.
Node Integration
src/main.rs
Replaced NoArgs with BerachainArgs; thread args into bootstrap; destructure node handle; conditionally construct/spawn new_pog_service using node network/provider/chain and datadir.
Proof of Gossip Service
src/proof_of_gossip.rs
Large new module (~+1326 lines): PogError/PogResult, NetworkOps & PogProvider traits and impls, ProofOfGossipService with persistent SQLite state, signer loading, canary tx creation, send/confirm/timeout/reconcile lifecycle, funding/backoff, metrics hooks, and comprehensive tests.

Sequence Diagram(s)

sequenceDiagram
    participant Node as "Berachain Node"
    participant PoG as "PoG Service"
    participant Network as "Network Handle"
    participant Provider as "Blockchain Provider"
    participant DB as "SQLite DB"
    participant Peer as "Network Peer"

    Node->>PoG: Initialize with network, provider, chain_id, datadir, args
    PoG->>DB: Load persisted peer state & failure counts
    PoG->>Provider: Query account nonce, balance, latest base fee
    PoG->>PoG: Load signer from private key

    loop Periodic Tick
        PoG->>Provider: Check syncing status
        alt Not syncing
            PoG->>Network: Get all peers
            Network-->>PoG: Return peer list
            PoG->>PoG: Create & sign canary tx
            PoG->>Network: Send canary to peer
            Network->>Peer: Deliver canary
            PoG->>DB: Persist active canary (tx_hash, peer_id, sent_at)
        end

        PoG->>Provider: Check for transaction receipt
        alt Receipt found
            PoG->>DB: Mark peer confirmed, persist result
            PoG->>Network: Adjust peer reputation (positive)
            PoG->>PoG: Clear active canary
        else No receipt / timeout
            PoG->>PoG: Mark timed-out canary
            PoG->>Network: Adjust peer reputation (negative)
            PoG->>DB: Increment failure count, persist
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • rezzmah
  • calbera

Poem

🐰 I hopped into the code with cheer,
I signed small canaries and sent them near,
The DB keeps count of every tiny test,
Peers earn or lose their badges at my behest,
Gossip flutters softly across the peer.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: proof of gossip' clearly and concisely summarizes the main feature addition in the PR - the implementation of a Proof of Gossip service.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/proof-of-gossip

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@camembera camembera changed the title Feat/proof of gossip feat: proof of gossip Feb 24, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
src/args.rs (1)

35-45: Hard-coded test key duplicates the secret-scan issue already flagged.

Please align with the fix suggested for the PoG tests to avoid repeated secret-scan hits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/args.rs` around lines 35 - 45, The test test_with_private_key in
src/args.rs uses a hard-coded real-looking private key string via
TestCli::parse_from and cli.args.pog_private_key, causing duplicate secret-scan
hits; change the test to avoid embedded secrets by replacing the literal with a
non-sensitive test value or a generated/fixture value: either read from a
test-only env var (e.g. TEST_POG_PRIVATE_KEY) or call a helper like
generate_test_private_key() that returns a benign, clearly non-secret hex
string, and update the assertion to compare against that same source instead of
the hard-coded key.
🧹 Nitpick comments (2)
src/args.rs (1)

5-13: Extract default values into named constants.

Keeps the CLI defaults self-documenting and aligns with the magic-number guideline.

Proposed change
 use clap::Args;
 
+const DEFAULT_POG_TIMEOUT_SECS: u64 = 120;
+const DEFAULT_POG_REPUTATION_PENALTY: i32 = -25600;
+
 #[derive(Debug, Clone, Default, Args)]
 #[command(next_help_heading = "Proof of Gossip")]
 pub struct BerachainArgs {
@@
-    #[arg(long = "pog.timeout", default_value_t = 120)]
+    #[arg(long = "pog.timeout", default_value_t = DEFAULT_POG_TIMEOUT_SECS)]
     pub pog_timeout: u64,
 
-    #[arg(long = "pog.reputation-penalty", default_value_t = -25600, allow_hyphen_values = true)]
+    #[arg(long = "pog.reputation-penalty", default_value_t = DEFAULT_POG_REPUTATION_PENALTY, allow_hyphen_values = true)]
     pub pog_reputation_penalty: i32,
 }
As per coding guidelines, "Extract magic numbers into documented constants".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/args.rs` around lines 5 - 13, Extract the literal defaults into clearly
named constants and reference those constants in the Arg attributes: introduce
documented constants (e.g., DEFAULT_POG_TIMEOUT and
DEFAULT_POG_REPUTATION_PENALTY) near the top of the module and replace the
default_value_t = 120 and default_value_t = -25600 in the BerachainArgs struct
(fields pog_timeout and pog_reputation_penalty) with those constants; ensure the
constants have descriptive names, appropriate visibility (pub if needed), and a
short doc comment so the CLI defaults are self-documenting and follow the
magic-number guideline.
src/proof_of_gossip.rs (1)

920-947: Extract test magic numbers into named constants.

This improves readability and aligns with the repo’s magic-number guideline.

Proposed change
 const ONE_BERA: U256 = U256::from_limbs([1_000_000_000_000_000_000, 0, 0, 0]);
+const TEST_CHAIN_ID: u64 = 80094;
+const TEST_TIMEOUT_SECS: u64 = 120;
+const TEST_REPUTATION_PENALTY: i32 = -25600;
@@
             signer,
-            chain_id: 80094,
+            chain_id: TEST_CHAIN_ID,
             db,
             confirmed_peers: HashSet::new(),
             failure_counts: HashMap::new(),
-            reputation_penalty: -25600,
+            reputation_penalty: TEST_REPUTATION_PENALTY,
             active: None,
             timed_out_canaries: HashMap::new(),
             nonce: 0,
-            timeout: Duration::from_secs(120),
+            timeout: Duration::from_secs(TEST_TIMEOUT_SECS),
             warned_syncing: false,
             funding_backoff: None,
             funding_backoff_secs: 0,
As per coding guidelines, "Extract magic numbers into documented constants".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 920 - 947, Extract the hard-coded
numeric literals in make_service into named test constants and use them when
constructing the ProofOfGossipService to improve readability; specifically
create constants for the chain id (80094 -> e.g. TEST_CHAIN_ID), reputation
penalty (-25600 -> e.g. TEST_REPUTATION_PENALTY), timeout seconds (120 -> e.g.
TEST_TIMEOUT_SECS) and funding_backoff_secs (0 -> e.g.
TEST_FUNDING_BACKOFF_SECS), then replace the inline literals in the make_service
function when assigning ProofOfGossipService.chain_id, .reputation_penalty,
.timeout (convert TEST_TIMEOUT_SECS into Duration::from_secs), and
.funding_backoff_secs; keep the constant names descriptive and document them
with a short comment near their declarations so other tests can reuse them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proof_of_gossip.rs`:
- Around line 545-574: The test test_canary_tx_construction currently embeds a
literal private_key string; replace that with a generated ephemeral key or test
fixture so no raw secret appears in source: create a temporary signer via a
secure test helper (e.g., generate a random/private-key via the library API or a
test-only PrivateKey::random()/from_seed helper) and use that signer in place of
the private_key literal before calling create_canary_tx; update references to
PrivateKeySigner, signer, and signer.address() accordingly so the test asserts
remain identical but no hard-coded key is committed.
- Line 6: Replace the generic eyre::Result alias with a PoG-specific error type:
define an enum (e.g., PogError) listing the possible PoG failure variants,
implement Display and std::error::Error for it, and add From conversions for any
underlying error types you currently bubble as eyre::Report so existing error
propagation still works; create a type alias PogResult<T> = Result<T, PogError>
and update functions in proof_of_gossip.rs that currently use eyre::Result to
return PogResult<T> (refer to the module-level use of eyre::Result and functions
like any public APIs in proof_of_gossip.rs to locate spots to change).
- Around line 777-778: Remove the banner comment "// --- Mock types for tick()
testing ---" from the test area in proof_of_gossip.rs; locate the comment text
and delete it so there are no inline comments in the tests per repo guidelines,
leaving the mock type declarations and test code (e.g., the mock types and any
test functions) unchanged.
- Around line 121-137: Remove the inline doc comments from the PoGMetrics struct
fields so the code complies with the "no comments" rule: edit the struct
definition for PoGMetrics and delete the leading triple-slash comments above
fields like canaries_sent_total, canary_confirmed_total, canary_timeout_total,
canary_late_confirmed_total, penalties_total, bans_total, and inflight_canaries;
leave the metrics attributes, field names, and types unchanged and, if
necessary, move any required rationale to external documentation instead of
in-code comments.

---

Duplicate comments:
In `@src/args.rs`:
- Around line 35-45: The test test_with_private_key in src/args.rs uses a
hard-coded real-looking private key string via TestCli::parse_from and
cli.args.pog_private_key, causing duplicate secret-scan hits; change the test to
avoid embedded secrets by replacing the literal with a non-sensitive test value
or a generated/fixture value: either read from a test-only env var (e.g.
TEST_POG_PRIVATE_KEY) or call a helper like generate_test_private_key() that
returns a benign, clearly non-secret hex string, and update the assertion to
compare against that same source instead of the hard-coded key.

---

Nitpick comments:
In `@src/args.rs`:
- Around line 5-13: Extract the literal defaults into clearly named constants
and reference those constants in the Arg attributes: introduce documented
constants (e.g., DEFAULT_POG_TIMEOUT and DEFAULT_POG_REPUTATION_PENALTY) near
the top of the module and replace the default_value_t = 120 and default_value_t
= -25600 in the BerachainArgs struct (fields pog_timeout and
pog_reputation_penalty) with those constants; ensure the constants have
descriptive names, appropriate visibility (pub if needed), and a short doc
comment so the CLI defaults are self-documenting and follow the magic-number
guideline.

In `@src/proof_of_gossip.rs`:
- Around line 920-947: Extract the hard-coded numeric literals in make_service
into named test constants and use them when constructing the
ProofOfGossipService to improve readability; specifically create constants for
the chain id (80094 -> e.g. TEST_CHAIN_ID), reputation penalty (-25600 -> e.g.
TEST_REPUTATION_PENALTY), timeout seconds (120 -> e.g. TEST_TIMEOUT_SECS) and
funding_backoff_secs (0 -> e.g. TEST_FUNDING_BACKOFF_SECS), then replace the
inline literals in the make_service function when assigning
ProofOfGossipService.chain_id, .reputation_penalty, .timeout (convert
TEST_TIMEOUT_SECS into Duration::from_secs), and .funding_backoff_secs; keep the
constant names descriptive and document them with a short comment near their
declarations so other tests can reuse them.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e37ffc and cfcb7d0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml
  • src/args.rs
  • src/lib.rs
  • src/main.rs
  • src/proof_of_gossip.rs

@camembera camembera marked this pull request as ready for review February 24, 2026 12:01
Copilot AI review requested due to automatic review settings February 24, 2026 12:01
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/proof_of_gossip.rs (1)

577-578: Test signer still derives from a well-known constant (private key = 1).

This was flagged in a prior review. While the dynamic format! avoids a raw literal, the key is still deterministic and trivially known. Consider using PrivateKeySigner::random() for tests that don't need a stable address, or document why a fixed key is required here (e.g., for reproducible recover_signer assertions).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 577 - 578, The test helper test_signer
currently derives a trivially-known private key via format!("0x{:064x}",
1u64).parse().unwrap(); replace this with a non-deterministic key for tests that
do not require a fixed address by calling PrivateKeySigner::random() (or another
secure generator) in test_signer, or if a stable key is required for assertions
(e.g., in recover_signer tests) add an explanatory comment above test_signer
documenting why the fixed key is necessary and reference the dependent
assertions (such as recover_signer) so reviewers understand the intent.
🧹 Nitpick comments (3)
src/args.rs (1)

9-10: Consider accepting the private key via environment variable.

Passing secrets as CLI flags exposes them in /proc/*/cmdline and process listings. Clap supports env = "POG_PRIVATE_KEY" to allow an env-var fallback, which is safer for production deployments.

Suggested change
-    #[arg(long = "pog.private-key")]
+    #[arg(long = "pog.private-key", env = "POG_PRIVATE_KEY")]
     pub pog_private_key: Option<String>,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/args.rs` around lines 9 - 10, The pog_private_key CLI flag exposes
secrets in process listings; update the clap attribute on the pog_private_key
field to accept an environment fallback by adding env = "POG_PRIVATE_KEY" to the
#[arg(...)] attribute so clap will read the value from the POG_PRIVATE_KEY
environment variable when the flag is not provided; modify the attribute on pub
pog_private_key: Option<String> (the #[arg(long = "pog.private-key")] line) to
include env = "POG_PRIVATE_KEY".
src/proof_of_gossip.rs (2)

38-57: PogError is a thin pass-through; consider adding domain variants.

Currently PogError only wraps eyre::Report, making it effectively equivalent to eyre::Result. Adding variants like Db(rusqlite::Error), Signer(alloy_signer::Error), Provider(String) would make error handling more actionable for callers and improve diagnostics. This can be deferred if the current granularity is sufficient for the initial launch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 38 - 57, PogError currently only wraps
eyre::Report; add domain-specific variants (e.g., Db(rusqlite::Error),
Signer(alloy_signer::Error), Provider(String), etc.) to make errors actionable,
update the PogResult alias to continue using PogError, and adjust the
IntoPogResult impl/into_pog conversion logic so callers converting other error
types map them into the appropriate PogError variant (or fallback to Report)
instead of always wrapping into a generic Report; ensure all places constructing
PogError (and pattern matches) are updated to handle the new variants.

59-65: NetworkOps trait returns impl Future — not object-safe.

Using impl Future in trait methods (via RPITIT) prevents dyn NetworkOps usage. This is fine as long as generics-only dispatch is intended, but it's worth confirming this won't limit future extensibility (e.g., if you need to store a Box<dyn NetworkOps>).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 59 - 65, The trait NetworkOps is not
object-safe because get_all_peers returns impl Future; change its signature to
return a boxed future instead so dyn NetworkOps can be used: replace the
get_all_peers return type with a Pin<Box<dyn Future<Output =
PogResult<Vec<PeerInfo>>> + Send>> (or use BoxFuture from futures crate) and
update any callers/implementations of get_all_peers to return that boxed future;
reference the NetworkOps trait and the get_all_peers method when applying the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proof_of_gossip.rs`:
- Around line 316-332: After calling reconcile_late_confirmations() inside
tick(), ensure you verify that the currently tracked canary (self.active) still
has a valid nonce by comparing its nonce against the latest on-chain nonce; if
the on-chain nonce has advanced beyond self.active.nonce, drop/discard
self.active (set self.active = None) and skip penalizing the peer or treating it
as a timed-out failure (i.e., return early or continue without changing peer
scores). Apply the same nonce-validity check in the analogous block around the
code at lines ~476-505 so any in-flight canary whose nonce was invalidated by a
late confirmation is quietly discarded without penalty.
- Line 285: The use of args.pog_reputation_penalty.abs() can panic for i32::MIN;
change the assignment of reputation_penalty to avoid abs() by computing the
negative magnitude safely: if args.pog_reputation_penalty > 0 set
reputation_penalty to -args.pog_reputation_penalty, otherwise keep
args.pog_reputation_penalty as-is (this preserves i32::MIN without calling abs).
Update the code that sets the reputation_penalty field (referencing
reputation_penalty and args.pog_reputation_penalty) to use this conditional
logic.

---

Duplicate comments:
In `@src/proof_of_gossip.rs`:
- Around line 577-578: The test helper test_signer currently derives a
trivially-known private key via format!("0x{:064x}", 1u64).parse().unwrap();
replace this with a non-deterministic key for tests that do not require a fixed
address by calling PrivateKeySigner::random() (or another secure generator) in
test_signer, or if a stable key is required for assertions (e.g., in
recover_signer tests) add an explanatory comment above test_signer documenting
why the fixed key is necessary and reference the dependent assertions (such as
recover_signer) so reviewers understand the intent.

---

Nitpick comments:
In `@src/args.rs`:
- Around line 9-10: The pog_private_key CLI flag exposes secrets in process
listings; update the clap attribute on the pog_private_key field to accept an
environment fallback by adding env = "POG_PRIVATE_KEY" to the #[arg(...)]
attribute so clap will read the value from the POG_PRIVATE_KEY environment
variable when the flag is not provided; modify the attribute on pub
pog_private_key: Option<String> (the #[arg(long = "pog.private-key")] line) to
include env = "POG_PRIVATE_KEY".

In `@src/proof_of_gossip.rs`:
- Around line 38-57: PogError currently only wraps eyre::Report; add
domain-specific variants (e.g., Db(rusqlite::Error),
Signer(alloy_signer::Error), Provider(String), etc.) to make errors actionable,
update the PogResult alias to continue using PogError, and adjust the
IntoPogResult impl/into_pog conversion logic so callers converting other error
types map them into the appropriate PogError variant (or fallback to Report)
instead of always wrapping into a generic Report; ensure all places constructing
PogError (and pattern matches) are updated to handle the new variants.
- Around line 59-65: The trait NetworkOps is not object-safe because
get_all_peers returns impl Future; change its signature to return a boxed future
instead so dyn NetworkOps can be used: replace the get_all_peers return type
with a Pin<Box<dyn Future<Output = PogResult<Vec<PeerInfo>>> + Send>> (or use
BoxFuture from futures crate) and update any callers/implementations of
get_all_peers to return that boxed future; reference the NetworkOps trait and
the get_all_peers method when applying the change.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e37ffc and b1e335f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml
  • src/args.rs
  • src/lib.rs
  • src/main.rs
  • src/proof_of_gossip.rs

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a Proof of Gossip (PoG) peer liveness monitoring system for the Berachain node. The feature validates that peers properly relay transactions by sending them "canary" transactions and monitoring whether they get mined on-chain. The implementation is opt-in via the --pog.private-key flag and integrates seamlessly with the node's lifecycle management.

Changes:

  • Adds background service that periodically tests untested peers by sending minimal EIP-1559 self-transfer transactions
  • Implements SQLite persistence for test results with automatic reload on restart to maintain peer state
  • Includes exponential backoff for underfunded wallets, sync gating, late confirmation reconciliation, and Prometheus metrics

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/proof_of_gossip.rs Core PoG service implementation with traits, database persistence, canary tx creation, and comprehensive tests
src/args.rs CLI argument definitions for PoG configuration (private key, timeout, reputation penalty)
src/main.rs Integration of PoG service into node startup with graceful shutdown support
src/lib.rs Module export for args and proof_of_gossip modules
Cargo.toml Added dependencies: rand, rusqlite, alloy-signer, reth-metrics, reth-network, reth-network-api, reth-eth-wire-types, tempfile
Cargo.lock Dependency lockfile updates

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace PoG inline key input with --pog.private-key-file and load signer material from disk with trim/validation errors. Add CLI and PoG tests that cover breaking-flag behavior plus valid and invalid key-file scenarios.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/proof_of_gossip.rs (3)

341-342: ⚠️ Potential issue | 🟠 Major

Late confirmation can silently invalidate the active canary's nonce, causing a false peer penalty — still unresolved.

When a timed-out canary A is later confirmed (mined), its nonce N is consumed on-chain. If canary B was already sent with the same nonce N (because refresh_nonce() read N before A was mined), B can never be included. B will time out and its peer will be penalized despite having relayed correctly.

reconcile_late_confirmations() at line 342 detects the late confirmation but doesn't check whether the active canary's nonce is now stale. Add a nonce-validity guard immediately after the reconcile call:

🛠️ Proposed fix
     async fn tick(&mut self) -> PogResult<()> {
         self.reconcile_late_confirmations()?;
 
+        if self.active.is_some() {
+            let on_chain_nonce = self
+                .provider
+                .account_nonce(&self.signer.address())?
+                .unwrap_or(0);
+            if on_chain_nonce > self.nonce {
+                info!(
+                    target: "bera_reth::pog",
+                    old_nonce = self.nonce,
+                    new_nonce = on_chain_nonce,
+                    "Nonce advanced by late confirmation; discarding in-flight canary without penalty"
+                );
+                self.active = None;
+                self.nonce = on_chain_nonce;
+                pog_metrics().inflight_canaries.set(0.0);
+            }
+        }
+
         if self.network.is_syncing() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 341 - 342, After calling
reconcile_late_confirmations() inside async fn tick(&mut self) -> PogResult<()>,
add a nonce-validity guard that checks whether the currently tracked active
canary's nonce is still equal to the latest on-chain nonce read by
refresh_nonce(); if they differ mark the active canary as stale/cancel it (or
skip sending/penalizing) and refresh the cached nonce to avoid penalizing peers
for an unfulfillable canary. Specifically: after reconcile_late_confirmations()
returns, call the same refresh_nonce() path used elsewhere, compare its returned
nonce to the active canary's nonce (the value held in the struct that tick uses
to decide timeouts/sends), and bail out or update state when they mismatch so
subsequent timeout handling does not penalize correctly-behaving peers.

602-604: ⚠️ Potential issue | 🟠 Major

Hardcoded private key in test_signer() still trips secret scanners — still unresolved.

format!("0x{:064x}", 1u64) produces a well-formed 32-byte private key that Gitleaks flagged as a "Generic API Key" at the previous location. The key has been extracted into test_signer(), but the scanner concern remains.

For tests that don't require a specific address (e.g., test_canary_tx_construction, make_service), replace with PrivateKeySigner::random(). The API is stable: PrivateKeySigner::random() creates a new random keypair. For the test_load_pog_signer_* tests that cross-compare addresses, keep test_signer() but use a key constant that is clearly non-secret (e.g., a const TEST_KEY_HEX: &str = "..." with a comment-free, obviously dummy value).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 602 - 604, Replace the hardcoded private
key generation in test_signer() that uses format!("0x{:064x}", 1u64) because
secret scanners still flag it; for tests that do not require a deterministic
address (e.g., test_canary_tx_construction, make_service) replace calls to
test_signer() with PrivateKeySigner::random(), and for the
test_load_pog_signer_* tests that must compare addresses keep using
test_signer() but change its implementation to parse a clearly non-secret
constant like const TEST_KEY_HEX: &str = "DEADBEEF..."; (an obviously dummy hex
string) and have test_signer() return TEST_KEY_HEX.parse().unwrap(); ensure the
dummy const is plainly non-secret and has no secret-like formatting so scanners
ignore it.

310-310: ⚠️ Potential issue | 🔴 Critical

i32::MIN.abs() panics in debug / wraps silently in release — still unresolved.

If --pog.reputation-penalty is passed as i32::MIN (-2147483648), .abs() overflows i32 in debug (panic) and silently wraps back to i32::MIN in release; the subsequent negation also overflows.

🛠️ Proposed fix
-            reputation_penalty: -(args.pog_reputation_penalty.abs()),
+            reputation_penalty: if args.pog_reputation_penalty > 0 {
+                -args.pog_reputation_penalty
+            } else {
+                args.pog_reputation_penalty
+            },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` at line 310, The expression reputation_penalty:
-(args.pog_reputation_penalty.abs()) can overflow for i32::MIN; replace the use
of .abs() on an i32 with a safe approach by promoting
args.pog_reputation_penalty to a wider integer (e.g., i64), take the absolute
value there, apply the negation, then cast back to i32 (or clamp if needed) so
that args.pog_reputation_penalty and the reputation_penalty field are computed
without risking i32 overflow.
🧹 Nitpick comments (1)
src/proof_of_gossip.rs (1)

429-431: Consider filtering eligible peers by eth_version to match the PoG objective.

The service targets peers that claim eth/68 support but don't relay, yet the eligible filter only excludes confirmed peers. Peers on earlier protocol versions (eth/66, eth/67) would be penalised for not forwarding transactions they're not expected to forward.

🛠️ Suggested change
+        use reth_eth_wire_types::EthVersion;
         let eligible: Vec<_> = all_peers
             .iter()
-            .filter(|p| !self.confirmed_peers.contains(&p.remote_id))
+            .filter(|p| {
+                p.eth_version >= EthVersion::Eth68
+                    && !self.confirmed_peers.contains(&p.remote_id)
+            })
             .collect();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 429 - 431, The eligible-peers filter
currently only excludes confirmed peers but should also restrict to peers that
advertise the target protocol (eth/68); update the iterator that builds eligible
(the let eligible: Vec<_> = all_peers.iter().filter(|p| ... ).collect();) to add
a check on the peer's protocol field (e.g., p.eth_version or p.protocol_version)
so it only includes peers that claim eth/68 (for example add && p.eth_version ==
68 or the appropriate predicate for supporting 68) in addition to
!self.confirmed_peers.contains(&p.remote_id).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/args.rs`:
- Around line 91-92: In the test_rejects_old_private_key_flag test, replace the
key-shaped formatted value currently created by let key = format!("0x{:064x}",
1u64); with a simple non-key-shaped literal (e.g., "not-a-key" or "invalidkey")
so clap will still reject the flag but no secret-shaped string is present;
update the call to TestCli::try_parse_from(["test", "--pog.private-key", &key])
accordingly (referencing TestCli::try_parse_from and
test_rejects_old_private_key_flag to locate the code).

In `@src/proof_of_gossip.rs`:
- Around line 462-463: The call to latest_base_fee() inside check_funding
currently uses unwrap_or(CANARY_PRIORITY_FEE_WEI) which silently hides errors
and can underestimate min_balance; change check_funding to handle the error
instead of defaulting: call self.provider.latest_base_fee() and if it Errs,
return an Err (or propagate the error) from check_funding (or at minimum log the
error and treat it as a fatal/unknown state), so base_fee and derived max_fee
are computed only from a successful value; update check_funding's
signature/return to propagate the provider error and reference
latest_base_fee(), base_fee, max_fee, CANARY_PRIORITY_FEE_WEI,
MAX_FEE_BUFFER_MULTIPLIER and min_balance when making the change.

---

Duplicate comments:
In `@src/proof_of_gossip.rs`:
- Around line 341-342: After calling reconcile_late_confirmations() inside async
fn tick(&mut self) -> PogResult<()>, add a nonce-validity guard that checks
whether the currently tracked active canary's nonce is still equal to the latest
on-chain nonce read by refresh_nonce(); if they differ mark the active canary as
stale/cancel it (or skip sending/penalizing) and refresh the cached nonce to
avoid penalizing peers for an unfulfillable canary. Specifically: after
reconcile_late_confirmations() returns, call the same refresh_nonce() path used
elsewhere, compare its returned nonce to the active canary's nonce (the value
held in the struct that tick uses to decide timeouts/sends), and bail out or
update state when they mismatch so subsequent timeout handling does not penalize
correctly-behaving peers.
- Around line 602-604: Replace the hardcoded private key generation in
test_signer() that uses format!("0x{:064x}", 1u64) because secret scanners still
flag it; for tests that do not require a deterministic address (e.g.,
test_canary_tx_construction, make_service) replace calls to test_signer() with
PrivateKeySigner::random(), and for the test_load_pog_signer_* tests that must
compare addresses keep using test_signer() but change its implementation to
parse a clearly non-secret constant like const TEST_KEY_HEX: &str =
"DEADBEEF..."; (an obviously dummy hex string) and have test_signer() return
TEST_KEY_HEX.parse().unwrap(); ensure the dummy const is plainly non-secret and
has no secret-like formatting so scanners ignore it.
- Line 310: The expression reputation_penalty:
-(args.pog_reputation_penalty.abs()) can overflow for i32::MIN; replace the use
of .abs() on an i32 with a safe approach by promoting
args.pog_reputation_penalty to a wider integer (e.g., i64), take the absolute
value there, apply the negation, then cast back to i32 (or clamp if needed) so
that args.pog_reputation_penalty and the reputation_penalty field are computed
without risking i32 overflow.

---

Nitpick comments:
In `@src/proof_of_gossip.rs`:
- Around line 429-431: The eligible-peers filter currently only excludes
confirmed peers but should also restrict to peers that advertise the target
protocol (eth/68); update the iterator that builds eligible (the let eligible:
Vec<_> = all_peers.iter().filter(|p| ... ).collect();) to add a check on the
peer's protocol field (e.g., p.eth_version or p.protocol_version) so it only
includes peers that claim eth/68 (for example add && p.eth_version == 68 or the
appropriate predicate for supporting 68) in addition to
!self.confirmed_peers.contains(&p.remote_id).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1e335f and 5f53b4f.

📒 Files selected for processing (2)
  • src/args.rs
  • src/proof_of_gossip.rs

Handle i32::MIN-safe penalty normalization, drop invalidated in-flight canaries when nonce advances, increment local nonce after sends, and align base-fee fallback behavior across funding and send paths with targeted regression tests.
Align test imports in src/args.rs with rustfmt output used by CI.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/proof_of_gossip.rs (2)

405-407: failure_count is tracked but never drives any policy decision.

The value is incremented, logged, and persisted — but reputation_penalty is always a flat constant regardless of count. Either use it (e.g., progressive penalty tiers) or simplify failure_counts to a HashSet<PeerId> since only presence matters currently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 405 - 407, The code updates and stores
failure_counts (see self.failure_counts, failure_count) but reputation_penalty
is a constant and never varies by count; either remove the unused counting and
replace failure_counts with a HashSet<PeerId> (only track presence when a peer
has failed) or make reputation_penalty scale with the stored count (e.g.,
compute penalty = base_penalty * failure_count or use tiered penalties) and
apply that computed value where reputation_penalty is currently used; update any
persistence and logging to reflect the simplified presence-only set or the new
scaled penalty calculation accordingly.

604-616: new_pog_service is a trivial pass-through of ProofOfGossipService::new.

Consider removing it and having callers use ProofOfGossipService::new directly to avoid two public entry points with identical semantics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proof_of_gossip.rs` around lines 604 - 616, The function new_pog_service
is a trivial passthrough to ProofOfGossipService::new and creates a redundant
public API; remove new_pog_service and update all callers to call
ProofOfGossipService::new(network, provider, chain_id, datadir, args) directly
(or if you must keep a compatibility shim, make new_pog_service private or add a
deprecation comment). Search for references to new_pog_service and replace them
with direct calls to ProofOfGossipService::new, then run cargo build/tests to
ensure no call sites remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proof_of_gossip.rs`:
- Around line 527-550: The function drop_invalidated_active_canary clears active
when provider.account_nonce(&self.signer.address())? returns a higher nonce, but
this also fires when the active canary's own tx was confirmed (so on_chain_nonce
> active.nonce) and causes confirmed state to be lost; change the logic in
drop_invalidated_active_canary to first check whether the canary's receipt
already exists (call the same receipt_exists(...) path used in tick or otherwise
consult the provider for the active.tx_hash) and if the receipt exists, do not
clear self.active (let tick() call persist_result("confirmed") and insert into
confirmed_peers); only clear and update self.nonce and metrics when the nonce
advanced for reasons other than the active canary’s own confirmation.
- Around line 1205-1221: The test currently leaves MockProvider nonce at 0 which
masks a bug in drop_invalidated_active_canary; after calling
MockProvider.add_receipt(tx_hash) advance the nonce (MockProvider.set_nonce(1))
so the behavior matches production (account_nonce() increments) and update the
assertion accordingly: after service.tick() expect service.active to be None but
that the peer was NOT recorded in service.confirmed_peers (i.e.,
assert!(!service.confirmed_peers.contains(&peer))). Ensure you reference
MockProvider.add_receipt, MockProvider.set_nonce, service.tick, service.active
and service.confirmed_peers when changing the test and keep
drop_invalidated_active_canary logic in mind.

---

Nitpick comments:
In `@src/proof_of_gossip.rs`:
- Around line 405-407: The code updates and stores failure_counts (see
self.failure_counts, failure_count) but reputation_penalty is a constant and
never varies by count; either remove the unused counting and replace
failure_counts with a HashSet<PeerId> (only track presence when a peer has
failed) or make reputation_penalty scale with the stored count (e.g., compute
penalty = base_penalty * failure_count or use tiered penalties) and apply that
computed value where reputation_penalty is currently used; update any
persistence and logging to reflect the simplified presence-only set or the new
scaled penalty calculation accordingly.
- Around line 604-616: The function new_pog_service is a trivial passthrough to
ProofOfGossipService::new and creates a redundant public API; remove
new_pog_service and update all callers to call
ProofOfGossipService::new(network, provider, chain_id, datadir, args) directly
(or if you must keep a compatibility shim, make new_pog_service private or add a
deprecation comment). Search for references to new_pog_service and replace them
with direct calls to ProofOfGossipService::new, then run cargo build/tests to
ensure no call sites remain.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f53b4f and 00c9f34.

📒 Files selected for processing (1)
  • src/proof_of_gossip.rs

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/args.rs (1)

90-91: ⚠️ Potential issue | 🟡 Minor

Avoid key-shaped literals in the legacy-flag rejection test.

Clap rejects --pog.private-key before parsing the value, so the key-shaped string is unnecessary and may trigger secret scanners. Use a non-key-shaped literal.
As per coding guidelines "Extract magic numbers into documented constants".

🛠️ Proposed fix
-        let key = format!("0x{:064x}", 1u64);
-        let parsed = TestCli::try_parse_from(["test", "--pog.private-key", &key]);
+        let parsed = TestCli::try_parse_from(["test", "--pog.private-key", "dummy"]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/args.rs` around lines 90 - 91, The test creates a key-shaped literal in
the variable key and passes it to TestCli::try_parse_from which is unnecessary
and may trigger secret scanners; change the test to use a clearly non-key-shaped
literal (e.g. "not-a-key-000") instead of the formatted hexadecimal, and extract
any repeated magic value into a documented constant (e.g.
TEST_PRIVATE_KEY_PLACEHOLDER) used when building the argument array before
calling TestCli::try_parse_from so the intent is clear and no real-looking key
appears in the test.
🧹 Nitpick comments (1)
src/args.rs (1)

4-5: Document the rationale for the default constants.

These defaults are magic numbers without documented rationale. Please add succinct “why” documentation for each default (e.g., chosen to match PoG canary SLA or protocol expectation). Use a doc attribute if you want to avoid inline comments.
As per coding guidelines "Extract magic numbers into documented constants" and "Write succinct documentation focusing on 'why' rather than 'what'".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/args.rs` around lines 4 - 5, The constants DEFAULT_POG_TIMEOUT_SECS and
DEFAULT_POG_REPUTATION_PENALTY are undocumented magic numbers; add succinct
rationale docs for each (using outer doc comments /// or #[doc = "..."])
explaining why these specific values were chosen (e.g., "120s chosen to match
PoG canary SLA/timeout margin" and "-25600 chosen to align with protocol
reputation scaling/penalty semantics") so readers understand the intent and
provenance of the defaults; attach the docs directly above the constant
declarations (DEFAULT_POG_TIMEOUT_SECS and DEFAULT_POG_REPUTATION_PENALTY).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/args.rs`:
- Around line 90-91: The test creates a key-shaped literal in the variable key
and passes it to TestCli::try_parse_from which is unnecessary and may trigger
secret scanners; change the test to use a clearly non-key-shaped literal (e.g.
"not-a-key-000") instead of the formatted hexadecimal, and extract any repeated
magic value into a documented constant (e.g. TEST_PRIVATE_KEY_PLACEHOLDER) used
when building the argument array before calling TestCli::try_parse_from so the
intent is clear and no real-looking key appears in the test.

---

Nitpick comments:
In `@src/args.rs`:
- Around line 4-5: The constants DEFAULT_POG_TIMEOUT_SECS and
DEFAULT_POG_REPUTATION_PENALTY are undocumented magic numbers; add succinct
rationale docs for each (using outer doc comments /// or #[doc = "..."])
explaining why these specific values were chosen (e.g., "120s chosen to match
PoG canary SLA/timeout margin" and "-25600 chosen to align with protocol
reputation scaling/penalty semantics") so readers understand the intent and
provenance of the defaults; attach the docs directly above the constant
declarations (DEFAULT_POG_TIMEOUT_SECS and DEFAULT_POG_REPUTATION_PENALTY).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00c9f34 and 6ca0435.

📒 Files selected for processing (1)
  • src/args.rs

Avoid dropping an active canary when nonce advances due to that canary's own confirmation, and update the confirmation test to model nonce advancement.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants