fix(syncer): include in-flight and pending-cache work in PendingCount#3162
fix(syncer): include in-flight and pending-cache work in PendingCount#3162
Conversation
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
📝 WalkthroughWalkthroughThe changes implement in-flight event tracking in the Syncer struct alongside pending events in the cache, adding a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Claude finished @alpe's task in 3m 16s —— View job Code Review
SummaryThis is a small, focused fix that correctly addresses an under-counting bug in Correctness
The
The original
The new formula
Observations1. Non-atomic snapshot — theoretical narrow race // syncer.go:409
return len(s.heightInCh) + int(s.inFlight.Load()) + s.cache.PendingEventsCount()The three reads are not taken under a single lock. There is a tiny window where an event has been dequeued from In practice, the risk is negligible (the window is nanoseconds and the polling interval is much longer), but worth documenting. If robustness is desired later, one option is to increment 2. Since 3. Missing test for There is no unit test covering the updated 4. PR description mentions a The PR body states: "runCatchup lost the local <= highestSeen bound and now scans upward until ErrHeightFromFuture, then marks head reached." However, // da_follower.go:351-355
if highest > 0 && local > highest {
// Caught up.
f.headReached.Store(true)
return
}This description appears to be leftover from an earlier iteration of the PR. Updating the PR description to match the actual changes would help reviewers. VerdictThe core fix is correct and safe. The changes are minimal, well-scoped, and address the stated race. The main actionable items are: (1) add a regression test for |
* main: chore: re-add replaces for development (#3161)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3162 +/- ##
==========================================
- Coverage 60.20% 60.19% -0.02%
==========================================
Files 115 115
Lines 11896 11902 +6
==========================================
+ Hits 7162 7164 +2
- Misses 3923 3927 +4
Partials 811 811
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
block/internal/cache/manager.go (1)
61-61: Add a one-line doc for the new exported interface method.Line 61 introduces a new exported API method; adding method-level docs keeps API docs complete and consistent.
Suggested tweak
// Pending events syncing coordination GetNextPendingEvent(blockHeight uint64) *common.DAHeightEvent SetPendingEvent(blockHeight uint64, event *common.DAHeightEvent) + // PendingEventsCount returns the number of pending events currently cached by height. PendingEventsCount() intAs per coding guidelines "Document exported types and functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/cache/manager.go` at line 61, The exported interface method PendingEventsCount on the manager interface is missing a one-line doc; add a concise method-level comment above PendingEventsCount() int describing what it returns (e.g., "PendingEventsCount returns the number of events currently queued for processing") so the exported API is documented and follows the project's "Document exported types and functions" guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@block/internal/syncing/syncer.go`:
- Around line 383-385: The check in waitForCatchup() that treats a single
PendingCount()==0 as terminal can be fooled by transient handoffs between the
buffered channel, inFlight (s.inFlight), and cache; update waitForCatchup() to
require stability before returning by sampling PendingCount() multiple times
(e.g., N consecutive zero samples spaced by the current poll interval) or add an
additional guard such as verifying inFlight==0 and the buffered channel is empty
after a short debounce window; reference PendingCount(), waitForCatchup(),
s.inFlight/Add and processHeightEvent() to locate where to implement the
repeated-sampling or extra-check logic so the syncer does not stop prematurely.
In `@CHANGELOG.md`:
- Around line 71-73: Fix the typo "accross" to "across" in the changelog entry
that begins "**BREAKING** Make pending events cache and tx cache fully
ephemeral..." (search for the exact word "accross" in that paragraph) so the
sentence reads "Persist across restart using store metadata.".
- Around line 316-318: Remove the stray orphan list marker before the
[Unreleased] link in CHANGELOG.md by deleting the solitary "-" line so the
Markdown does not render an empty bullet; locate the block containing the "-"
directly above the "[Unreleased]:
https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD" link and remove
that dash.
---
Nitpick comments:
In `@block/internal/cache/manager.go`:
- Line 61: The exported interface method PendingEventsCount on the manager
interface is missing a one-line doc; add a concise method-level comment above
PendingEventsCount() int describing what it returns (e.g., "PendingEventsCount
returns the number of events currently queued for processing") so the exported
API is documented and follows the project's "Document exported types and
functions" guideline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc298eea-ae1a-40f9-9451-c9882c0705d3
📒 Files selected for processing (4)
CHANGELOG.mdblock/internal/cache/generic_cache.goblock/internal/cache/manager.goblock/internal/syncing/syncer.go
| s.inFlight.Add(1) | ||
| s.processHeightEvent(ctx, &heightEvent) | ||
| s.inFlight.Add(-1) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate PendingCount definition and all call sites.
rg -nP --type=go '\bPendingCount\s*\(' -C 4
# Find direct zero checks on PendingCount.
rg -nP --type=go 'PendingCount\s*\(\)\s*(==|<=|<)\s*0|0\s*(==|>=|>)\s*PendingCount\s*\(\)' -C 6Repository: evstack/ev-node
Length of output: 1937
🏁 Script executed:
# Get more context around the failover check at line 318
rg -nP --type=go 'pipelineDrained|PendingCount' block/internal/syncing/syncer.go node/failover.go -A 10 -B 10Repository: evstack/ev-node
Length of output: 2608
🏁 Script executed:
# Check if there's a polling loop or repeated checks for this condition
rg -nP --type=go 'catchupTimeout|catchup.*loop' node/failover.go -C 8Repository: evstack/ev-node
Length of output: 2668
🏁 Script executed:
# Search for any debouncing or stability check patterns in the codebase
rg -nP --type=go 'stable|debounce|window|consistent|verify.*zero' -C 3Repository: evstack/ev-node
Length of output: 29878
🏁 Script executed:
# Check what function contains lines 318-324 and what calls it
rg -nP --type=go 'func.*catchup|return true, nil' node/failover.go -B 20 -A 2 | head -80Repository: evstack/ev-node
Length of output: 838
🏁 Script executed:
# Search for any test coverage or repeated checks for pipelineDrained
rg -nP --type=go 'pipelineDrained|PendingCount.*==.*0' test/ -C 3Repository: evstack/ev-node
Length of output: 41
🏁 Script executed:
# Get the full function signature and scope containing the catchup logic
rg -nP --type=go 'func.*failoverState.*catchup' node/failover.go -A 100 | head -120Repository: evstack/ev-node
Length of output: 41
🏁 Script executed:
# Check if there are any retry mechanisms or stability checks after the return
rg -nP --type=go 'return true' node/failover.go -B 5 -A 5Repository: evstack/ev-node
Length of output: 336
🏁 Script executed:
# Get the full function definition
rg -nP --type=go 'func \(f \*failoverState\).*\(' node/failover.go -A 150 | head -200Repository: evstack/ev-node
Length of output: 7050
🏁 Script executed:
# Check the exact function name and context
sed -n '260,330p' node/failover.goRepository: evstack/ev-node
Length of output: 2049
Verify that callers don't treat a single PendingCount()==0 sample as terminal.
Line 318-319 in waitForCatchup() immediately returns upon first successful check without verifying stable-zero conditions. Since PendingCount() aggregates three independent sources (buffered channel, atomic in-flight counter, cache), a transient zero is possible during event handoff—particularly when inFlight cycles between processing states. The polling loop (~100ms interval) provides natural debouncing but no explicit stability requirement. Once the check succeeds, the syncer stops (line 268) and block production begins; the decision cannot be retried. Confirm that either repeated stable-zero sampling or additional guards prevent premature pipeline drain declaration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@block/internal/syncing/syncer.go` around lines 383 - 385, The check in
waitForCatchup() that treats a single PendingCount()==0 as terminal can be
fooled by transient handoffs between the buffered channel, inFlight
(s.inFlight), and cache; update waitForCatchup() to require stability before
returning by sampling PendingCount() multiple times (e.g., N consecutive zero
samples spaced by the current poll interval) or add an additional guard such as
verifying inFlight==0 and the buffered channel is empty after a short debounce
window; reference PendingCount(), waitForCatchup(), s.inFlight/Add and
processHeightEvent() to locate where to implement the repeated-sampling or
extra-check logic so the syncer does not stop prematurely.
| - **BREAKING** Make pending events cache and tx cache fully ephemeral. Those will be re-fetched on restart. DA Inclusion | ||
| cache persists until cleared up after DA inclusion has been processed. Persist accross restart using store | ||
| metadata. ([#3047](https://github.com/evstack/ev-node/pull/3047)) |
There was a problem hiding this comment.
Fix typo in historical changelog text.
Line 72 uses “accross”; it should be “across”.
Proposed fix
- cache persists until cleared up after DA inclusion has been processed. Persist accross restart using store
+ cache persists until cleared up after DA inclusion has been processed. Persist across restart using store🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` around lines 71 - 73, Fix the typo "accross" to "across" in the
changelog entry that begins "**BREAKING** Make pending events cache and tx cache
fully ephemeral..." (search for the exact word "accross" in that paragraph) so
the sentence reads "Persist across restart using store metadata.".
| - | ||
|
|
||
| [Unreleased]: https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD |
There was a problem hiding this comment.
Remove stray empty bullet before the [Unreleased] link.
Line 316 leaves an orphan -, which renders as an empty bullet in markdown.
Proposed fix
--
-
[Unreleased]: https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - | |
| [Unreleased]: https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD | |
| [Unreleased]: https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` around lines 316 - 318, Remove the stray orphan list marker
before the [Unreleased] link in CHANGELOG.md by deleting the solitary "-" line
so the Markdown does not render an empty bullet; locate the block containing the
"-" directly above the "[Unreleased]:
https://github.com/evstack/ev-node/compare/v1.0.0-beta.1...HEAD" link and remove
that dash.
Overview
Fixes a race on startup
runCatchup lost the local <= highestSeen bound and now scans upward until ErrHeightFromFuture, then marks head reached.
PendingCount() only returns len(heightInCh), so it ignores in-flight work (and pending-cache backlog), letting failover
stop too early.
Summary by CodeRabbit