fix: offload index sync store to blocking task #841
dknopik wants to merge 5 commits into sigp:unstable
Conversation
anchor/eth/src/index_sync.rs
Outdated
```rust
let space = MAX_BATCH_SIZE - batch.len();
if space > 0 {
    // Only do this if we have any space remaining and there are no store tasks that might wait
    // to write missing indices. If the count is 1, only we hold the Arc (no other tasks).
```
Nit (non-blocking): Arc::strong_count uses Relaxed ordering and is inherently subject to TOCTOU — the count can change between the read and the branch. In this case the consequences are benign (an extra or skipped DB sweep), so it's not a correctness issue.
Still, an AtomicUsize would express the intent more clearly and avoid future confusion:
```rust
let pending_stores = Arc::new(AtomicUsize::new(0));
// ...
if space > 0 && pending_stores.load(Ordering::Acquire) == 0 {
```

Up to you whether it's worth changing; the current code works fine in practice.
I think the ordering makes little difference here, if I understand correctly.
I understand the argument to use an AtomicUsize for clarity, though. What do you prefer?
I believe AtomicUsize would be better
```rust
// `set_validator_indices` may block as it start a database transaction and updates the
// in memory database. We do not want to do that on the async runtime, so we
// spawn a blocking task.
```
Nit: The comment explains what but not why; a reader unfamiliar with the PR context wouldn't know why blocking matters here. Consider something like:

```rust
// `set_validator_indices` performs synchronous SQLite I/O and takes a write lock
// on in-memory state, which would block the Tokio worker thread and starve other
// async tasks (e.g., beacon node health checks). Offload to the blocking thread pool.
```

Also, minor typo: "it start" → "it starts".
diegomrsantos left a comment
Thanks for handling this, great catch. Mostly good, only two small nits.
Thanks for the review, I addressed it. I do not think the Ordering matters at all: per https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html, stronger orderings only matter for synchronizing other memory accesses, and we do not really touch any other shared memory beyond the counter itself.
anchor/eth/src/index_sync.rs
Outdated
```diff
-if space > 0 {
+// Only do this if we have any space remaining and there are no store tasks that might wait
+// to write missing indices. If the count is 1, only we hold the Arc (no other tasks).
+if space > 0 && waiting_store_tasks.load(Ordering::Relaxed) == 1 {
```
waiting_store_tasks starts at 0, increments before spawning each blocking write, and decrements when it finishes.
So should this condition check == 0 (not == 1) if we only want DB fallback when no store writes are currently in flight?
Ahh, damn, good catch, forgot to change this when editing earlier
Thanks for the follow-up. This is a good direction, and offloading `set_validator_indices` is the right call. If you're open to a small readability pass, I think these changes would help:

The core idea is solid; this will make the intent much clearer and safer for future edits.
Thanks! Addressed everything except the predicate extraction. I skipped the predicate because the check is only used in one spot and reads clearly inline. Let me know if you'd still prefer it extracted.
I think this PR is a good tactical fix for the immediate issue, but I also think it is important to frame it as a workaround, not the real fix. What this PR fixes is clear: the index syncer is async, but it was still doing blocking SQLite work on the runtime thread. Moving that store step into `spawn_blocking` fixes that symptom. However, I think this is the same underlying design problem that came up in #735, just showing up in a different way. In #735, the problem shows up as a correctness issue:
Here, the same problem shows up as an execution-model issue:
In both cases, the root issue is that there is no single store boundary that owns:
So I view this PR as a reasonable stopgap, but not the architectural fix. The real fix, as discussed in #735, is to move toward a model where:
That would solve this class of problem at the right layer instead of requiring local mitigations like this one. So from my side: this looks like a sensible short-term fix, but I would not want us to treat it as the end-state design.
Sure. :) Everything we work on has potential to be improved. Nothing I ever propose is implicitly supposed to be final. We improve the code incrementally.
I agree. Is there anything else you want addressed for this workaround PR?
Thanks for working on this. After looking at the current shape of the workaround, I think we should close this PR and solve the problem properly instead. The easiest way to see the issue is to compare the control flow before and after this PR. Before this PR:
That was bad because the SQLite write could block the async runtime thread. But it also had one useful property: natural backpressure. The syncer could not get ahead of the database. After this PR, the flow becomes:
That fixes the local runtime-blocking symptom, but it removes the old backpressure. The problem is that the database is still single-writer in practice. So during historical sync, we can now do something like this:
But only one writer can actually run against SQLite. The rest just sit and wait. If we produce batches faster than SQLite can drain them, the queue grows. At that point we have traded:
That is not just a performance detail. It changes the failure mode. Late tasks can now wait a long time for the connection and potentially fail under load. There is a second issue as well: the new `Arc`-based tracking of waiting store operations is another local heuristic layered on top of the flow. So my concern is not that the PR identified the wrong symptom. The symptom is real: blocking SQLite work should not run on the async runtime. My concern is that this workaround is now getting complex enough that it is changing write ordering, backpressure, and retry/progress behavior locally inside `index_sync.rs`. For that reason, I think we should close this PR rather than keep refining the workaround. I'm already working on the proper fix: a dedicated store owner / centralized write boundary that preserves serialization and backpressure explicitly, instead of trying to recover them with local heuristics inside `index_sync.rs`. That direction addresses the runtime-blocking issue here and also moves us toward the broader persistence/cache-coherence model we discussed in #735. Given that, I do not think it is worth landing this workaround in its current form.
We can also immediately await the offloaded task, if that is what you prefer; that seems to be what you are doing in #885. We should regain backpressure that way. Let me know what you think. The clear advantage of merging this is that we have the issue fixed for now. I feel larger refactors might be contentious, and I want to avoid being blocked on that as the Boole release approaches. This also aligns with our desire to have smaller PRs. The effort to merge this "workaround" is minuscule, now that it exists. We are on track to spend a lot of time discussing whether to merge something that we both consider to be at least an improvement over the current situation. @shane-moore what is your opinion?
when reading the beginning of the pr description about tokio runtime being blocked, my first thought was `spawn_blocking`, and I read through the threads. interesting points about losing backpressure, and yah, just doing an `await` on the offloaded task seems fine. then, I don't see why not merge this in the spirit of continuous improvement, which y'all have already discussed, and then iterating/rearchitecting in another pr, which is already happening. A phrase I like is, "progress over perfection"
I agree with Shane’s point that, for the immediate problem here, dropping the fetch/write overlap and just awaiting the blocking DB work is probably fine. I think that concern about "losing the concurrency of fetching BN batches while waiting on DB writes" is also a good example of the broader issue, though: that overlap may improve throughput, but so far I have not seen evidence that the benefit is large enough to justify the complexity it introduces. The work on
So the simpler design is somewhat slower on this setup, but only modestly. More importantly, that complexity has contributed to the cache/publication issues discussed in #735. That is the broader concern I've been raising for a long time: we keep carrying complexity in the name of throughput/concurrency/caching without first establishing that it is actually needed. The result is usually weaker boundaries, less obvious ownership, and failure semantics that are much harder to reason about. Given that, I don't think the fetch/write overlap earns its complexity here.
Really cool and thorough how you ran #885 against hoodi and were able to make legit comparisons for historical sync. Great finds with the current approach resulting in orphaned rows in the db. It feels pretty risk free to merge this after adding the immediate await.
Issue Addressed
During longer historic sync (e.g. initial sync) the following error would often occur:
The underlying issue is that the tokio runtime was blocked, preventing us from handling the http request/response and causing the timeout. The actual culprit is the index syncer: it is async, but may block in its last step, storing the indices into the database. During historic sync, this blocking might take long enough to cause noticeable issues like the one above.
Proposed Changes
Use `spawn_blocking` to move this operation onto its own thread. This conveniently also allows us to continue fetching indices while the store operation waits. Additionally, we use an `Arc` to track waiting store operations so that the mechanism fetching missing indices does not fetch the same indices over and over again while the store operation is waiting.