@ziggie1984 ziggie1984 commented Oct 29, 2025

We need to be careful with goroutines that run in a loop: the channelGraphSyncer ended up in an endless loop for me, producing a lot of these log lines:

2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync
2025-10-29 17:07:27.349 [ERR] DISC: Unable to gen chan range query: unable to fetch highest chan ID: context canceled
2025-10-29 17:07:27.349 [DBG] DISC: GossipSyncer(037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3): state=syncingChans, type=PinnedSync

@ziggie1984 ziggie1984 force-pushed the fix-ctx-endless-loop branch from 69190dd to e302fd0 Compare October 29, 2025 16:14
@ziggie1984 ziggie1984 self-assigned this Oct 29, 2025
@ziggie1984 ziggie1984 added this to the v0.20.0 milestone Oct 29, 2025
Make sure we check for context cancellation in the ApplyGossipFilter
method.
@saubyk saubyk moved this to In progress in lnd v0.20 Oct 29, 2025
Roasbeef added a commit to Roasbeef/lnd that referenced this pull request Oct 29, 2025
This commit fixes a critical bug where the channelGraphSyncer goroutine
would enter an endless loop when context cancellation or peer disconnect
errors occurred during the syncingChans or queryNewChannels states.

The root cause was that state handler functions (handleSyncingChans and
synchronizeChanIDs) did not return errors to the main goroutine loop.
When these functions encountered fatal errors like context cancellation,
they would log the error and return early without changing the syncer's
state. This caused the main loop to immediately re-enter the same state
handler, encounter the same error, and loop indefinitely while spamming
error logs.

The fix makes error handling explicit by having state handlers return
errors. The main channelGraphSyncer loop now checks these errors and exits
cleanly when fatal errors occur. We return any error (not just context
cancellation) because fatal errors can manifest in multiple forms: context.Canceled,
ErrGossipSyncerExiting from the rate limiter, lnpeer.ErrPeerExiting from
Brontide, or network errors like connection closed. This approach matches
the error handling pattern already used in other goroutines like replyHandler.

Changes:
- handleSyncingChans now returns an error instead of returning nothing
- synchronizeChanIDs now returns (bool, error) instead of just bool
- channelGraphSyncer main loop checks errors and exits with debug logging
- Function signatures updated to document error return semantics

This fix addresses the issue more robustly than PR lightningnetwork#10329's defensive
loop-level context check, as it fixes the root cause in the error
handling architecture rather than treating the symptom.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ziggie1984
Collaborator Author

closing in favour of #10330

@ziggie1984 ziggie1984 closed this Nov 1, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in lnd v0.20 Nov 1, 2025