chore: Remove DistributedRuntime::etcd_client #4489
Walkthrough
The pull request refactors etcd client initialization from centralized runtime state access to localized async construction within constructors. Constructor signatures for virtual connector classes are updated to return PyResult<Self>.
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/bindings/python/rust/planner.rs (1)
56-66: Consider consolidating duplicate etcd client creation logic.
Both VirtualConnectorCoordinator::new and VirtualConnectorClient::new contain nearly identical code for creating the etcd client (lines 56-66 and 378-386). This could be extracted into a helper function to reduce duplication.
Example refactor:

    fn create_etcd_client(drt: &super::DistributedRuntime) -> PyResult<Client> {
        let etcd_config = etcd::ClientOptions::default();
        drt.inner
            .runtime()
            .secondary()
            .block_on(async move {
                etcd::Client::new(etcd_config, drt.inner.runtime().clone()).await
            })
            .map_err(to_pyerr)
    }

Then use let etcd_client = create_etcd_client(&drt)?; in both constructors.
Also applies to: 378-386
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/bindings/python/rust/planner.rs (3 hunks)
lib/runtime/src/distributed.rs (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-21T01:40:52.456Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 3155
File: components/backends/vllm/src/dynamo/vllm/main.py:228-233
Timestamp: 2025-09-21T01:40:52.456Z
Learning: In the dynamo codebase, error handling for distributed runtime client initialization (like runtime.namespace().component().endpoint().client()) is handled at the Rust level in the distributed runtime bindings, so Python-level try/catch blocks are not needed and would be redundant.
Applied to files:
lib/bindings/python/rust/planner.rs
🧬 Code graph analysis (1)
lib/bindings/python/rust/planner.rs (3)
lib/runtime/src/distributed.rs (4)
default (506-508), new (99-287), None (581-581), runtime (294-296)
lib/runtime/src/transports/etcd.rs (4)
default (567-596), etcd_client (116-118), new (69-112), new (619-662)
lib/bindings/python/rust/lib.rs (3)
new (457-508), new (935-939), to_pyerr (206-211)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: clippy (lib/runtime/examples)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: clippy (.)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
lib/runtime/src/distributed.rs (1)
426-430: LGTM! Doc comment appropriately generalized.
The updated documentation correctly describes store() as an external interface without tying it to etcd specifically, which aligns with the architectural goal of this PR.
lib/bindings/python/rust/planner.rs (4)
17-18: LGTM! Import adjustments are appropriate.
The changes to import etcd as a module and add CancellationToken support the new constructor logic.
47-81: Constructor refactoring is functionally correct.
The changes successfully move etcd client initialization into the constructor:
- Returning PyResult<Self> enables proper error propagation
- Using block_on is necessary since Python constructors cannot be async
- The async client construction follows the correct pattern
However, this etcd client creation logic is duplicated in VirtualConnectorClient::new (lines 378-386). See separate comment about consolidation.
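The constructor pattern described in this comment (a synchronous constructor that blocks on async client setup and returns a `Result`) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in planner.rs: `block_on` here is a hand-written stand-in for the runtime's executor, and `FakeClient` and `Coordinator` are hypothetical names.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical stand-in for a runtime's `block_on`: polls the future on
/// the current thread and parks between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical client; stands in for a real etcd client.
#[derive(Debug)]
struct FakeClient {
    endpoint: String,
}

impl FakeClient {
    /// Async constructor that can fail, like a real network client.
    async fn connect(endpoint: &str) -> Result<Self, String> {
        if endpoint.is_empty() {
            return Err("empty endpoint".to_string());
        }
        Ok(FakeClient { endpoint: endpoint.to_string() })
    }
}

/// Hypothetical connector that owns its own client.
#[derive(Debug)]
struct Coordinator {
    client: FakeClient,
}

impl Coordinator {
    /// Synchronous constructor: blocks on the async setup and propagates
    /// any error to the caller instead of panicking.
    fn new(endpoint: &str) -> Result<Self, String> {
        let client = block_on(FakeClient::connect(endpoint))?;
        Ok(Coordinator { client })
    }
}
```

In this sketch, `Coordinator::new("")` returns an `Err` rather than panicking, which mirrors the error propagation the review attributes to returning PyResult<Self>.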
375-393: Constructor refactoring is functionally correct.
The changes successfully move etcd client initialization into the constructor with proper error handling. The pattern of extracting runtime and cancellation_token before the async block is slightly cleaner than the Coordinator version.
However, this etcd client creation logic duplicates the code in VirtualConnectorCoordinator::new (lines 56-66). See separate comment about consolidation.
56-66: Based on the script results and web search findings:
- The etcd client uses per_host_pool_size for connection pooling by host, which means connection reuse happens at the HTTP/gRPC layer.
- The codebase search found very limited instantiation: only one in production code (VirtualConnectorCoordinator in virtual_connector.py:41) and one in tests.
- The concern about resource usage would be valid if multiple instances were created frequently, but the evidence suggests limited instantiation.
Revisit resource usage concern about per-instance etcd clients.
The search results show only one VirtualConnectorCoordinator instantiation in production code (during initialization) and one VirtualConnectorClient in tests. The etcd client's connection pooling by host (per_host_pool_size) means repeated connections to the same etcd endpoint reuse pooled connections, reducing per-instance overhead.
If these connector instances are indeed created only once or rarely in production, the resource implications of each having its own etcd client are minimal. However, if instantiation patterns change or multiple instances are created elsewhere, verify that etcd client connections are being properly closed to prevent resource leaks.
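One way to check the "properly closed" concern is to tie the connection's lifetime to the connector that owns it and count live connections in a test. The sketch below assumes a hypothetical `Connection` type whose `Drop` implementation closes it; it does not use the real etcd client, and whether that client releases resources on drop would need to be confirmed separately.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical connection handle; counts itself in `live` while open.
struct Connection {
    live: Arc<AtomicUsize>,
}

impl Connection {
    fn open(live: Arc<AtomicUsize>) -> Self {
        live.fetch_add(1, Ordering::SeqCst);
        Connection { live }
    }
}

impl Drop for Connection {
    /// Closing is tied to drop, so a connector cannot leak its connection.
    fn drop(&mut self) {
        self.live.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical connector that owns one connection per instance.
struct Connector {
    _conn: Connection,
}

impl Connector {
    fn new(live: Arc<AtomicUsize>) -> Self {
        Connector { _conn: Connection::open(live) }
    }
}

/// Number of connections currently open.
fn live_count(live: &Arc<AtomicUsize>) -> usize {
    live.load(Ordering::SeqCst)
}
```

With this shape, creating two connectors raises the live count to two, and the count returns to zero once both go out of scope.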
Replaced by `store()`. Planner's VirtualConnector was the last user of it. It now creates its own etcd client.
Signed-off-by: Graham King <grahamk@nvidia.com>
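To illustrate why an accessor that returns an interface lets callers work with stores other than etcd, here is a minimal sketch. The trait `KeyValueStore`, the `MemoryStore` backend, and this `Runtime` struct are all hypothetical names chosen for the example; they are not the actual types or signatures in lib/runtime.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical key-value interface; callers depend only on this trait.
trait KeyValueStore: Send + Sync {
    fn put(&self, key: &str, value: &str);
    fn get(&self, key: &str) -> Option<String>;
}

/// Hypothetical in-memory backend; an etcd-backed type could implement
/// the same trait without changing any caller.
#[derive(Default)]
struct MemoryStore {
    data: Mutex<HashMap<String, String>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&self, key: &str, value: &str) {
        self.data
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.data.lock().unwrap().get(key).cloned()
    }
}

/// Hypothetical runtime that exposes the store only through the trait.
struct Runtime {
    store: Arc<dyn KeyValueStore>,
}

impl Runtime {
    fn new(store: Arc<dyn KeyValueStore>) -> Self {
        Runtime { store }
    }

    /// Returns the interface, not a concrete client type.
    fn store(&self) -> Arc<dyn KeyValueStore> {
        Arc::clone(&self.store)
    }
}
```

Because `Runtime::store()` hands out `Arc<dyn KeyValueStore>`, swapping the backend is a change at construction time only.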
5d2b70f to
56885ac
mohammedabdulwahhab
left a comment
Awesome!
nnshah1
left a comment
question: wasn't able to immediately see - but are there users of the top level store?
nnshah1
left a comment
wondering if store can be removed from top level -
Replaced by store(), which returns an interface. That allows us to use other stores instead of etcd.
Planner's VirtualConnector was the last user of it. It now creates its own etcd client.
Summary by CodeRabbit
Release Notes