Skip to content

Conversation

@grahamking
Copy link
Contributor

@grahamking grahamking commented Nov 20, 2025

Replaced by store(), which returns an interface. That allows us to use other stores instead of etcd.

Planner's VirtualConnector was the last user of it. Now that creates it's own etcd client.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Updated connector class initialization with result-based error handling.
    • Revised constructor signatures for improved consistency.
    • Simplified distributed runtime interface by removing a direct client accessor.

✏️ Tip: You can customize this high-level summary in your review settings.

@grahamking grahamking requested a review from a team as a code owner November 20, 2025 00:29
@github-actions github-actions bot added the chore label Nov 20, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

The pull request refactors etcd client initialization from centralized runtime state access to localized async construction within constructors. Constructor signatures for virtual connector classes are updated to return PyResult<Self>, and the public etcd_client() accessor is removed from DistributedRuntime, forcing consumers toward the store() interface.

Changes

Cohort / File(s) Summary
Python binding constructor refactoring
lib/bindings/python/rust/planner.rs
Updated VirtualConnectorCoordinator and VirtualConnectorClient constructors to return PyResult<Self> instead of Self. Both now accept drt parameter, perform async etcd client construction via block_on(), and reduced parameter counts. Imports adjusted to use etcd module and CancellationToken from tokio_util.
Runtime interface simplification
lib/runtime/src/distributed.rs
Removed public etcd_client() accessor method from DistributedRuntime. Doc comment for store() updated to reflect external/out-of-process semantics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Async construction correctness: Verify block_on() usage in constructors is safe and properly handles runtime initialization
  • PyResult error propagation: Confirm error handling is consistent across both constructor changes and caller expectations
  • API removal impact: Validate that removal of etcd_client() doesn't introduce broken callers or unexpected behavior changes
  • Parameter reduction: Ensure the reduced parameter set in VirtualConnectorCoordinator::new doesn't lose necessary configuration

Poem

🐰 Async connections spring anew,
With PyResult wrapping through and through,
Old accessors hop away so quick,
Store interface does the trick,
Refactored code, clean and bright,
Rabbits code through the night!

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The pull request description is incomplete. It lacks structured sections (Overview, Details, Where to start, Related Issues) as specified in the template. Add Overview, Details, and Related Issues sections following the provided template format. Specify the GitHub issue number if this PR closes/fixes an issue.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title check ✅ Passed The title accurately describes the main change: removing the DistributedRuntime::etcd_client method and addressing the abstraction concern.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/bindings/python/rust/planner.rs (1)

56-66: Consider consolidating duplicate etcd client creation logic.

Both VirtualConnectorCoordinator::new and VirtualConnectorClient::new contain nearly identical code for creating the etcd client (lines 56-66 and 378-386). This could be extracted into a helper function to reduce duplication.

Example refactor:

fn create_etcd_client(drt: &super::DistributedRuntime) -> PyResult<Client> {
    let etcd_config = etcd::ClientOptions::default();
    drt.inner
        .runtime()
        .secondary()
        .block_on(
            async move { etcd::Client::new(etcd_config, drt.inner.runtime().clone()).await }
        )
        .map_err(to_pyerr)
}

Then use let etcd_client = create_etcd_client(&drt)?; in both constructors.

Also applies to: 378-386

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3f764e and 5d2b70f.

📒 Files selected for processing (2)
  • lib/bindings/python/rust/planner.rs (3 hunks)
  • lib/runtime/src/distributed.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-21T01:40:52.456Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3155
File: components/backends/vllm/src/dynamo/vllm/main.py:228-233
Timestamp: 2025-09-21T01:40:52.456Z
Learning: In the dynamo codebase, error handling for distributed runtime client initialization (like runtime.namespace().component().endpoint().client()) is handled at the Rust level in the distributed runtime bindings, so Python-level try/catch blocks are not needed and would be redundant.

Applied to files:

  • lib/bindings/python/rust/planner.rs
🧬 Code graph analysis (1)
lib/bindings/python/rust/planner.rs (3)
lib/runtime/src/distributed.rs (4)
  • default (506-508)
  • new (99-287)
  • None (581-581)
  • runtime (294-296)
lib/runtime/src/transports/etcd.rs (4)
  • default (567-596)
  • etcd_client (116-118)
  • new (69-112)
  • new (619-662)
lib/bindings/python/rust/lib.rs (3)
  • new (457-508)
  • new (935-939)
  • to_pyerr (206-211)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: clippy (lib/runtime/examples)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
lib/runtime/src/distributed.rs (1)

426-430: LGTM! Doc comment appropriately generalized.

The updated documentation correctly describes store() as an external interface without tying it to etcd specifically, which aligns with the architectural goal of this PR.

lib/bindings/python/rust/planner.rs (4)

17-18: LGTM! Import adjustments are appropriate.

The changes to import etcd as a module and add CancellationToken support the new constructor logic.


47-81: Constructor refactoring is functionally correct.

The changes successfully move etcd client initialization into the constructor:

  • Returning PyResult<Self> enables proper error propagation
  • Using block_on is necessary since Python constructors cannot be async
  • The async client construction follows the correct pattern

However, this etcd client creation logic is duplicated in VirtualConnectorClient::new (lines 378-386). See separate comment about consolidation.


375-393: Constructor refactoring is functionally correct.

The changes successfully move etcd client initialization into the constructor with proper error handling. The pattern of extracting runtime and cancellation_token before the async block is slightly cleaner than the Coordinator version.

However, this etcd client creation logic duplicates the code in VirtualConnectorCoordinator::new (lines 56-66). See separate comment about consolidation.


56-66: Based on the script results and web search findings, I can now provide the final rewritten review comment. The search results show that:

  1. etcd client uses per_host_pool_size for connection pooling by host, which means connection reuse happens at the HTTP/gRPC layer.
  2. The codebase search found very limited instantiation: only one in production code (VirtualConnectorCoordinator in virtual_connector.py:41) and one in tests.
  3. The concern about resource usage would be valid IF multiple instances were created frequently, but the evidence suggests limited instantiation.

Revisit resource usage concern about per-instance etcd clients.

The search results show only one VirtualConnectorCoordinator instantiation in production code (during initialization) and one VirtualConnectorClient in tests. etcd client's connection pooling by host (per_host_pool_size) means repeated connections to the same etcd endpoint reuse pooled connections, reducing per-instance overhead.

If these connector instances are indeed created only once or rarely in production, the resource implications from each having its own etcd client are minimal. However, if instantiation patterns change or multiple instances are created elsewhere, verify that etcd client connections are being properly closed to prevent resource leaks.

Replace by `store()`.

Planner's VirtualConnector was the last user of it. Now that creates
it's own etcd client.

Signed-off-by: Graham King <grahamk@nvidia.com>
@grahamking grahamking force-pushed the gk-etcd-client-next-1 branch from 5d2b70f to 56885ac Compare November 20, 2025 13:41
@grahamking grahamking changed the title chore: Remove DistributedRuntime::etcd_client() method chore: Remove DistributedRuntime::etcd_client Nov 20, 2025
Copy link
Contributor

@mohammedabdulwahhab mohammedabdulwahhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Copy link
Contributor

@nnshah1 nnshah1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: wasn't able to immediately see - but are there users of the top level store?

Copy link
Contributor

@nnshah1 nnshah1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if store can be removed from top level -

@grahamking grahamking merged commit 8a14f9e into main Nov 20, 2025
33 of 36 checks passed
@grahamking grahamking deleted the gk-etcd-client-next-1 branch November 20, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants