Skip to content

feat: lazy chain config init via ChainConfigWatch with epoch-boundary…#490

Open
ginnun wants to merge 1 commit intofeat/tx-evaluation-rebasedfrom
feat/chain-config-from-node
Open

feat: lazy chain config init via ChainConfigWatch with epoch-boundary…#490
ginnun wants to merge 1 commit intofeat/tx-evaluation-rebasedfrom
feat/chain-config-from-node

Conversation

@ginnun
Copy link
Copy Markdown
Collaborator

@ginnun ginnun commented Mar 23, 2026

Resolves #489

Summary

  • Replace one-shot init_caches at startup with ChainConfigWatch — a background watcher that waits for the node to reach Conway era with ≥99.9% sync, then refreshes protocol parameters at each epoch boundary
  • ExternalEvaluator is now pull-based: reads config on demand from cache, spawns testgen-hs lazily on first evaluate request, and re-spawns when config changes (detected via Arc::ptr_eq)
  • Server starts immediately, evaluate endpoints return 503 until the node is ready, all other endpoints work from the start
  • Add BlockfrostError::service_unavailable (503)

Motivation

init_caches ran once at startup. If the node was still syncing (wrong era or incomplete state), the evaluator got wrong genesis config and protocol parameters for its entire lifetime. Protocol parameter changes at epoch boundaries (via governance) were also missed — requiring a full restart.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying blockfrost-platform with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2ea2983
Status: ✅  Deploy successful!
Preview URL: https://1e118ba4.blockfrost-platform.pages.dev
Branch Preview URL: https://feat-chain-config-from-node.blockfrost-platform.pages.dev

View logs

@ginnun ginnun added the enhancement New feature or request label Mar 23, 2026
@ginnun ginnun self-assigned this Mar 23, 2026
@michalrus michalrus mentioned this pull request Mar 23, 2026
use crate::model::api::Value;
use crate::model::api::ValueV6;

// TODO: This evaluator fetches genesis_config_and_pp() from the node on every
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying #378 (comment) from @michalrus:
Let's continue here regarding the comment.
The problem you referenced is left open as a TODO in this sub-PR.
Native evaluator should use the new caching mechanism.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, now I'm not sure if it wouldn't be easier to review, if it was just a single PR

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, now I'm not sure if it wouldn't be easier to review, if it was just a single PR

@ginnun, @vladimirvolek, WDYT?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces lazy, background initialization and refresh of chain configuration (genesis + protocol parameters) so transaction evaluation stays correct while the node syncs and across epoch-boundary parameter updates.

Changes:

  • Add ChainConfigWatch to wait for node readiness and refresh protocol parameters at epoch boundaries, publishing updates via a watch channel.
  • Make ExternalEvaluator pull-based and lazily spawn/re-spawn testgen-hs based on cached config availability/changes; evaluation returns 503 until config is ready.
  • Update platform server bootstrap and tests to accommodate non-blocking startup and the new watcher; add BlockfrostError::service_unavailable (503).

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
crates/tx_evaluator/src/native.rs Adds note about future migration to cached chain config.
crates/tx_evaluator/src/external.rs Refactors evaluator to use ChainConfigWatch and lazy (re)spawning of testgen-hs.
crates/tx_evaluator/Cargo.toml Adds tokio + tracing workspace deps for new async/logging usage.
crates/platform/src/server.rs Switches from eager init_caches to ChainConfigWatch and new evaluator construction; returns watcher to callers.
crates/platform/src/main.rs Adapts to new build() return tuple shape.
crates/platform/tests/common/mod.rs Exposes watcher from app builder and adds initialize_app() that waits for readiness when needed.
crates/platform/tests/{root,metrics,submit,icebreakers,data_node}_test.rs Updates destructuring to match new build tuple.
crates/node/src/lib.rs Exposes new chain_config_watch module.
crates/node/src/ledger.rs Removes now-redundant one-off protocol/genesis query helpers.
crates/node/src/chain_config_watch.rs Implements the watcher (sync gating, epoch-boundary scheduling, refresh loop) plus unit tests for scheduling.
crates/node/src/chain_config.rs Minor refactor of error mapping for genesis/pp init.
crates/common/src/errors.rs Adds service_unavailable() and maps 503 to StatusCode::SERVICE_UNAVAILABLE.
crates/common/src/chain_config.rs Improves docs and introduces ChainConfigCache::CONWAY_ERA constant.
Cargo.lock Records new dependencies pulled into tx_evaluator.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +161 to +167
if info.era != CONWAY_ERA {
SyncStatus::NotReady(format!(
"Node is in era {} (need Conway era {CONWAY_ERA})",
info.era
))
} else if info.sync_progress < SYNC_THRESHOLD {
SyncStatus::NotReady(format!(
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChainConfigCache::CONWAY_ERA is documented as the minimum supported era, but the sync gate treats any era other than exactly Conway as "not ready". This will block forever once the node moves to a future era (>6) even if the platform could still operate. Consider using a info.era < CONWAY_ERA check (or otherwise defining the intended policy explicitly).

Copilot uses AI. Check for mistakes.

// Slots into the current epoch.
let slot_in_epoch = (current_slot - slot_config.zero_slot) % slot_config.epoch_length;
let slots_until_next = slot_config.epoch_length - slot_in_epoch;
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration_until_next_epoch waits a full epoch when slot_in_epoch == 0 (exactly on an epoch boundary), because slots_until_next becomes epoch_length. That can delay the epoch-boundary refresh by an entire epoch in the boundary case. Consider computing slots_until_next so that a boundary yields 0 (or 1) and then adding the buffer, e.g. (epoch_length - slot_in_epoch) % epoch_length.

Suggested change
let slots_until_next = slot_config.epoch_length - slot_in_epoch;
// Compute slots until next epoch boundary; if we're exactly on a boundary,
// this yields 0 instead of a full epoch.
let slots_until_next =
(slot_config.epoch_length - slot_in_epoch) % slot_config.epoch_length;

Copilot uses AI. Check for mistakes.
Comment on lines +57 to +58
if self.rx.changed().await.is_err() {
break;
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait_ready returns silently if the watch sender is dropped (e.g., the background task panics/exits) even though the config is still None, which contradicts the method’s intent and can make tests proceed with an unready app. Consider returning a Result/error on channel closure (or at least logging) so callers can fail fast instead of continuing without config.

Suggested change
if self.rx.changed().await.is_err() {
break;
match self.rx.changed().await {
Ok(_) => {
// value changed; loop condition will re-check for readiness
}
Err(err) => {
// Sender was dropped before config became available; this indicates
// that the background task has terminated and readiness will never be reached.
tracing::error!(
"ChainConfigWatch: watch channel closed before configuration became available: {}",
err
);
break;
}

Copilot uses AI. Check for mistakes.
Comment on lines 130 to +133
}
state.testgen = Some(testgen.clone());
state.last_config = Some(current_config);
tracing::info!("ExternalEvaluator: testgen-hs initialized successfully");
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reinitializing, the previous Testgen instance is dropped and replaced. In bf_testgen::Testgen, dropping all senders closes the channel, and the worker thread treats "request channel closed" as an error and keeps restarting the child in an infinite loop (spawning a new testgen-hs repeatedly). With epoch-boundary respawns this can leak processes/threads over time. Consider adding an explicit shutdown path for Testgen (and exiting the worker thread on channel close), or otherwise ensuring the old process is terminated cleanly before dropping the handle.

Suggested change
}
state.testgen = Some(testgen.clone());
state.last_config = Some(current_config);
tracing::info!("ExternalEvaluator: testgen-hs initialized successfully");
}
// Config changed: take the old Testgen so we can shut it down explicitly.
let old_testgen = if let Some(ref last) = state.last_config {
if !Arc::ptr_eq(last, &current_config) {
state.testgen.take()
} else {
None
}
} else {
None
};
state.testgen = Some(testgen.clone());
state.last_config = Some(current_config);
tracing::info!("ExternalEvaluator: testgen-hs initialized successfully");
// Drop the lock before awaiting shutdown to avoid holding the mutex across .await.
drop(state);
if let Some(old) = old_testgen {
if let Err(err) = old.shutdown().await {
tracing::warn!(
"ExternalEvaluator: failed to cleanly shut down previous testgen-hs instance: {err}"
);
}
}

Copilot uses AI. Check for mistakes.

/// This structure is used to share server-wide cachables
/// Cached chain configuration queried from the Cardano node at startup and
/// refreshed at epoch boundaries by [`ChainConfigWatch`].
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc comment uses an intra-doc link to ChainConfigWatch, but that type lives in bf_node and isn’t in scope in the bf_common crate. This can produce broken intra-doc-link warnings during cargo doc/docs.rs. Consider removing the link formatting or referencing it as plain text (or using an external URL) to avoid unresolved links.

Suggested change
/// refreshed at epoch boundaries by [`ChainConfigWatch`].
/// refreshed at epoch boundaries by `ChainConfigWatch`.

Copilot uses AI. Check for mistakes.
@ginnun
Copy link
Copy Markdown
Collaborator Author

ginnun commented Mar 24, 2026

@michalrus Let's wait a bit to review this PR. I want to understand if it would be cheaper to implement #482 (replacing all this binary callings) , when compared to me doing all the AI feedbacks and you reviewing all the work.

@michalrus
Copy link
Copy Markdown
Member

I want to understand if it would be cheaper to implement #482

Oh, I don’t know! I'm assuming it would be quite a feat to get it working, I never tried that. I'm not even sure if this is the correct idea, after all we were supposed to write it in Rust 😅 And now we'd link the Haskell binary with ours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace eager chain config init with lazy config init and epoch-boundary refresh

3 participants