feat: lazy chain config init via ChainConfigWatch with epoch-boundary… #490

ginnun wants to merge 1 commit into feat/tx-evaluation-rebased
Conversation
Deploying blockfrost-platform with Cloudflare Pages

| | |
|---|---|
| Latest commit: | 2ea2983 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://1e118ba4.blockfrost-platform.pages.dev |
| Branch Preview URL: | https://feat-chain-config-from-node.blockfrost-platform.pages.dev |
```rust
use crate::model::api::Value;
use crate::model::api::ValueV6;

// TODO: This evaluator fetches genesis_config_and_pp() from the node on every
```
Replying to #378 (comment) from @michalrus:

Let's continue the discussion here regarding that comment.
The problem you referenced is left open as a TODO in this sub-PR.
The native evaluator should use the new caching mechanism.
Hmmm, now I'm not sure if it wouldn't be easier to review, if it was just a single PR
> Hmmm, now I'm not sure if it wouldn't be easier to review, if it was just a single PR

@ginnun, @vladimirvolek, WDYT?
Pull request overview
This PR introduces lazy, background initialization and refresh of chain configuration (genesis + protocol parameters) so transaction evaluation stays correct while the node syncs and across epoch-boundary parameter updates.
Changes:
- Add `ChainConfigWatch` to wait for node readiness and refresh protocol parameters at epoch boundaries, publishing updates via a `watch` channel.
- Make `ExternalEvaluator` pull-based and lazily spawn/re-spawn `testgen-hs` based on cached config availability/changes; evaluation returns 503 until the config is ready.
- Update the platform server bootstrap and tests to accommodate non-blocking startup and the new watcher; add `BlockfrostError::service_unavailable()` (503).
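The publish/consume pattern described above can be sketched with the standard library. This is a minimal illustration, not the real implementation (which uses an async `watch` channel); the `ChainConfig` type and all names here are stand-ins:

```rust
use std::sync::{Arc, Condvar, Mutex};

// Illustrative stand-in for the real chain configuration type.
#[derive(Debug)]
struct ChainConfig {
    epoch: u64,
}

// A minimal "watch" cell: the background watcher publishes the latest
// config, and the evaluator reads it on demand (or waits until it exists).
#[derive(Clone)]
struct ConfigWatch {
    inner: Arc<(Mutex<Option<Arc<ChainConfig>>>, Condvar)>,
}

impl ConfigWatch {
    fn new() -> Self {
        Self { inner: Arc::new((Mutex::new(None), Condvar::new())) }
    }

    // Called by the background watcher once the node is ready,
    // and again at each epoch boundary.
    fn publish(&self, config: ChainConfig) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = Some(Arc::new(config));
        cvar.notify_all();
    }

    // Non-blocking read; `None` would map to a 503 in the HTTP handler.
    fn current(&self) -> Option<Arc<ChainConfig>> {
        self.inner.0.lock().unwrap().clone()
    }

    // Block until a config has been published at least once.
    fn wait_ready(&self) -> Arc<ChainConfig> {
        let (lock, cvar) = &*self.inner;
        let mut guard = lock.lock().unwrap();
        while guard.is_none() {
            guard = cvar.wait(guard).unwrap();
        }
        guard.clone().unwrap()
    }
}

fn main() {
    let watch = ConfigWatch::new();
    assert!(watch.current().is_none()); // before the node is ready: 503

    let publisher = watch.clone();
    let handle = std::thread::spawn(move || publisher.publish(ChainConfig { epoch: 500 }));
    let cfg = watch.wait_ready();
    handle.join().unwrap();
    assert_eq!(cfg.epoch, 500);
}
```

The key property is the same as in the PR: consumers never block startup, and readiness is observable both as a non-blocking read (for request handlers) and as a blocking wait (for tests).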
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| crates/tx_evaluator/src/native.rs | Adds note about future migration to cached chain config. |
| crates/tx_evaluator/src/external.rs | Refactors evaluator to use ChainConfigWatch and lazy (re)spawning of testgen-hs. |
| crates/tx_evaluator/Cargo.toml | Adds tokio + tracing workspace deps for new async/logging usage. |
| crates/platform/src/server.rs | Switches from eager init_caches to ChainConfigWatch and new evaluator construction; returns watcher to callers. |
| crates/platform/src/main.rs | Adapts to new build() return tuple shape. |
| crates/platform/tests/common/mod.rs | Exposes watcher from app builder and adds initialize_app() that waits for readiness when needed. |
| crates/platform/tests/{root,metrics,submit,icebreakers,data_node}_test.rs | Updates destructuring to match new build tuple. |
| crates/node/src/lib.rs | Exposes new chain_config_watch module. |
| crates/node/src/ledger.rs | Removes now-redundant one-off protocol/genesis query helpers. |
| crates/node/src/chain_config_watch.rs | Implements the watcher (sync gating, epoch-boundary scheduling, refresh loop) plus unit tests for scheduling. |
| crates/node/src/chain_config.rs | Minor refactor of error mapping for genesis/pp init. |
| crates/common/src/errors.rs | Adds service_unavailable() and maps 503 to StatusCode::SERVICE_UNAVAILABLE. |
| crates/common/src/chain_config.rs | Improves docs and introduces ChainConfigCache::CONWAY_ERA constant. |
| Cargo.lock | Records new dependencies pulled into tx_evaluator. |
```rust
if info.era != CONWAY_ERA {
    SyncStatus::NotReady(format!(
        "Node is in era {} (need Conway era {CONWAY_ERA})",
        info.era
    ))
} else if info.sync_progress < SYNC_THRESHOLD {
    SyncStatus::NotReady(format!(
```
`ChainConfigCache::CONWAY_ERA` is documented as the minimum supported era, but the sync gate treats any era other than exactly Conway as "not ready". This will block forever once the node moves to a future era (>6), even if the platform could still operate. Consider using an `info.era < CONWAY_ERA` check (or otherwise defining the intended policy explicitly).
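A minimal sketch of the suggested policy; `SyncStatus` and `CONWAY_ERA` mirror the snippet above, while the threshold value and the exact messages are illustrative:

```rust
const CONWAY_ERA: u64 = 6;
const SYNC_THRESHOLD: f64 = 0.999; // illustrative

#[derive(Debug, PartialEq)]
enum SyncStatus {
    Ready,
    NotReady(String),
}

// Treat Conway as the *minimum* supported era, so future eras (> 6)
// don't block the watcher forever.
fn sync_status(era: u64, sync_progress: f64) -> SyncStatus {
    if era < CONWAY_ERA {
        SyncStatus::NotReady(format!(
            "Node is in era {era} (need at least Conway era {CONWAY_ERA})"
        ))
    } else if sync_progress < SYNC_THRESHOLD {
        SyncStatus::NotReady(format!("Node is syncing ({:.1}%)", sync_progress * 100.0))
    } else {
        SyncStatus::Ready
    }
}

fn main() {
    assert_eq!(sync_status(7, 1.0), SyncStatus::Ready); // a future era still works
    assert!(matches!(sync_status(5, 1.0), SyncStatus::NotReady(_)));
    assert!(matches!(sync_status(6, 0.5), SyncStatus::NotReady(_)));
}
```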
```rust
// Slots into the current epoch.
let slot_in_epoch = (current_slot - slot_config.zero_slot) % slot_config.epoch_length;
let slots_until_next = slot_config.epoch_length - slot_in_epoch;
```
`duration_until_next_epoch` waits a full epoch when `slot_in_epoch == 0` (exactly on an epoch boundary), because `slots_until_next` becomes `epoch_length`. That can delay the epoch-boundary refresh by an entire epoch in the boundary case. Consider computing `slots_until_next` so that a boundary yields 0 (or 1) and then adding the buffer, e.g. `(epoch_length - slot_in_epoch) % epoch_length`.
```diff
-    let slots_until_next = slot_config.epoch_length - slot_in_epoch;
+    // Compute slots until next epoch boundary; if we're exactly on a boundary,
+    // this yields 0 instead of a full epoch.
+    let slots_until_next =
+        (slot_config.epoch_length - slot_in_epoch) % slot_config.epoch_length;
```
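The boundary case can be checked in isolation. This is a self-contained sketch of the arithmetic in the suggestion above (field names follow the snippet; the mainnet epoch length is used as example data):

```rust
// Slots until the next epoch boundary; the modulo makes an exact
// boundary (slot_in_epoch == 0) yield 0 rather than a full epoch.
fn slots_until_next_epoch(current_slot: u64, zero_slot: u64, epoch_length: u64) -> u64 {
    let slot_in_epoch = (current_slot - zero_slot) % epoch_length;
    (epoch_length - slot_in_epoch) % epoch_length
}

fn main() {
    let epoch_length = 432_000; // Cardano mainnet epoch length in slots
    // Mid-epoch: a normal countdown.
    assert_eq!(slots_until_next_epoch(431_999, 0, epoch_length), 1);
    // Exactly on the boundary: 0, not a full epoch.
    assert_eq!(slots_until_next_epoch(864_000, 0, epoch_length), 0);
    // Without the modulo, the boundary case would return epoch_length (432_000).
}
```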
```rust
if self.rx.changed().await.is_err() {
    break;
```
`wait_ready` returns silently if the watch sender is dropped (e.g., the background task panics/exits) even though the config is still `None`, which contradicts the method's intent and can make tests proceed with an unready app. Consider returning a `Result`/error on channel closure (or at least logging) so callers can fail fast instead of continuing without config.
```diff
-            if self.rx.changed().await.is_err() {
-                break;
+            match self.rx.changed().await {
+                Ok(_) => {
+                    // Value changed; the loop condition will re-check for readiness.
+                }
+                Err(err) => {
+                    // Sender was dropped before config became available; this indicates
+                    // that the background task has terminated and readiness will never be reached.
+                    tracing::error!(
+                        "ChainConfigWatch: watch channel closed before configuration became available: {}",
+                        err
+                    );
+                    break;
+                }
```
```rust
}
state.testgen = Some(testgen.clone());
state.last_config = Some(current_config);
tracing::info!("ExternalEvaluator: testgen-hs initialized successfully");
```
When reinitializing, the previous `Testgen` instance is dropped and replaced. In `bf_testgen::Testgen`, dropping all senders closes the channel, and the worker thread treats "request channel closed" as an error and keeps restarting the child in an infinite loop (spawning a new `testgen-hs` repeatedly). With epoch-boundary respawns this can leak processes/threads over time. Consider adding an explicit shutdown path for `Testgen` (and exiting the worker thread on channel close), or otherwise ensuring the old process is terminated cleanly before dropping the handle.
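The failure mode described here — a worker treating channel closure as a retryable error — can be sketched with `std::sync::mpsc`. Exiting the loop when `recv()` returns `RecvError` is one way to avoid the restart spin; all names here are illustrative, not the real `bf_testgen` API:

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative worker: serves requests until every sender is dropped,
// then exits cleanly and reports how many requests it served.
fn spawn_worker(rx: mpsc::Receiver<u32>) -> thread::JoinHandle<u32> {
    thread::spawn(move || {
        let mut served = 0;
        loop {
            match rx.recv() {
                Ok(_request) => served += 1,
                // All senders dropped: treat this as a clean shutdown signal,
                // not an error to retry, so we break instead of respawning.
                Err(mpsc::RecvError) => break,
            }
        }
        served
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = spawn_worker(rx);
    tx.send(1).unwrap();
    tx.send(2).unwrap();
    drop(tx); // dropping the last sender closes the channel
    assert_eq!(worker.join().unwrap(), 2);
}
```

In the real code the analogous fix would also terminate the child `testgen-hs` process before the worker exits, so that epoch-boundary respawns don't accumulate orphans.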
```diff
         }
+        // Config changed: take the old Testgen so we can shut it down explicitly.
+        let old_testgen = if let Some(ref last) = state.last_config {
+            if !Arc::ptr_eq(last, &current_config) {
+                state.testgen.take()
+            } else {
+                None
+            }
+        } else {
+            None
+        };
         state.testgen = Some(testgen.clone());
         state.last_config = Some(current_config);
         tracing::info!("ExternalEvaluator: testgen-hs initialized successfully");
+        // Drop the lock before awaiting shutdown to avoid holding the mutex across .await.
+        drop(state);
+        if let Some(old) = old_testgen {
+            if let Err(err) = old.shutdown().await {
+                tracing::warn!(
+                    "ExternalEvaluator: failed to cleanly shut down previous testgen-hs instance: {err}"
+                );
+            }
+        }
```
```diff
-/// This structure is used to share server-wide cachables
+/// Cached chain configuration queried from the Cardano node at startup and
+/// refreshed at epoch boundaries by [`ChainConfigWatch`].
```
This doc comment uses an intra-doc link to `ChainConfigWatch`, but that type lives in `bf_node` and isn't in scope in the `bf_common` crate. This can produce broken intra-doc-link warnings during `cargo doc`/docs.rs builds. Consider removing the link formatting and referencing it as plain text (or using an external URL) to avoid unresolved links.
```diff
-/// refreshed at epoch boundaries by [`ChainConfigWatch`].
+/// refreshed at epoch boundaries by `ChainConfigWatch`.
```
|
@michalrus Let's wait a bit to review this PR. I want to understand if it would be cheaper to implement #482 (replacing all these binary invocations), compared to me addressing all the AI feedback and you reviewing all the work.
Oh, I don't know! I'm assuming it would be quite a feat to get it working; I never tried that. I'm not even sure if this is the correct idea — after all, we were supposed to write it in Rust 😅 And now we'd link the Haskell binary with ours.
Resolves #489

Summary

- Replaces eager `init_caches` at startup with `ChainConfigWatch`, a background watcher that waits for the node to reach the Conway era with ≥99.9% sync, then refreshes protocol parameters at each epoch boundary
- `ExternalEvaluator` is now pull-based: it reads config on demand from the cache, spawns `testgen-hs` lazily on the first evaluate request, and re-spawns it when the config changes (detected via `Arc::ptr_eq`)
- Adds `BlockfrostError::service_unavailable()` (503)

Motivation

Previously, `init_caches` ran once at startup. If the node was still syncing (wrong era or incomplete state), the evaluator got the wrong genesis config and protocol parameters for its entire lifetime. Protocol parameter changes at epoch boundaries (via governance) were also missed, requiring a full restart.
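The `Arc::ptr_eq` change detection mentioned in the summary works because each refresh publishes a fresh allocation, so pointer identity distinguishes "same cached config" from "newly published config". A minimal, self-contained illustration (the `Vec` is a stand-in for the real config type):

```rust
use std::sync::Arc;

fn main() {
    let config_v1 = Arc::new(vec![1, 2, 3]); // stand-in for the cached chain config
    let same = Arc::clone(&config_v1);       // a clone shares the same allocation
    let config_v2 = Arc::new(vec![1, 2, 3]); // a refresh creates a new allocation

    // Pointer identity, not value equality: a clone counts as "unchanged" ...
    assert!(Arc::ptr_eq(&config_v1, &same));
    // ... while a newly published config triggers a re-spawn, even if the
    // contents happen to be identical.
    assert!(!Arc::ptr_eq(&config_v1, &config_v2));
}
```

This is cheap (a pointer comparison per evaluate request) and avoids requiring `PartialEq` on the config type, at the cost of re-spawning if an identical config is ever republished under a new allocation.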