📝 Walkthrough
Kubernetes CNI networking support was added to the system. A new …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/systems/kubernetes/kubernetes_system.py`:
- Around line 136-151: The get_network_attachment_definitions function assumes
every item has metadata.name and metadata.namespace which can raise KeyError on
malformed items; update the list construction to defensively access keys (e.g.,
use item.get("metadata", {}) and metadata.get("name")/get("namespace")) or wrap
per-item extraction in a try/except that skips and logs malformed entries,
returning only well-formed "namespace/name" strings to preserve the existing
graceful-degradation behavior.
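A minimal sketch of the defensive extraction suggested above. The body of get_network_attachment_definitions is not shown in this review, so the helper name `extract_nad_names` and the shape of `items` (a list of dicts as returned by a Kubernetes list call) are assumptions for illustration only.

```python
import logging

logger = logging.getLogger(__name__)


def extract_nad_names(items):
    """Return "namespace/name" strings, skipping malformed items.

    Hypothetical helper: `items` is assumed to be the "items" list of a
    NetworkAttachmentDefinition list response.
    """
    names = []
    for item in items:
        # Guard against non-dict items and a missing or null "metadata" key.
        metadata = item.get("metadata") if isinstance(item, dict) else None
        if not isinstance(metadata, dict):
            logger.warning("Skipping item without metadata: %r", item)
            continue
        name = metadata.get("name")
        namespace = metadata.get("namespace")
        if not name or not namespace:
            logger.warning("Skipping item with incomplete metadata: %r", metadata)
            continue
        names.append(f"{namespace}/{name}")
    return names
```

Skipping and logging, rather than raising, keeps one bad entry from hiding the well-formed ones.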
In `@src/cloudai/workloads/nccl_test/kubernetes_json_gen_strategy.py`:
- Around line 217-222: The loop handling cni_networks can raise IndexError
because net.split("/",1)[1] assumes a "/" exists; update the loop in
kubernetes_json_gen_strategy.py where cni_networks is processed (the block
referencing cni_networks, nic_name, and resources["requests"]/["limits"]) to
defensively parse net: either use str.partition("/") and take the second part if
present, or check for "/" and skip or use the whole string as the nic_name
fallback; ensure you do not crash on malformed entries and still populate
resources keys only when a valid nic_name is obtained (reference
resolve_cni_networks() and upstream get_network_attachment_definitions() for
expected format).
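One way to parse entries without indexing into a split result, sketched under assumptions: the helper names `nic_name_from_network` and `add_nic_resources` are hypothetical, the "nvidia.com/" prefix is taken from the test comment in this review, and the quantity `1` is a placeholder, not the value the strategy actually uses. This sketch picks the "use the whole string" fallback for entries without a "/"; skipping them is the other option the comment allows.

```python
from typing import Optional


def nic_name_from_network(net: str) -> Optional[str]:
    """Extract the NIC name from "namespace/name"; None if nothing usable."""
    _namespace, sep, name = net.partition("/")
    if sep:
        # "namespace/" has a separator but no name: treat as malformed.
        return name or None
    # No "/" at all: fall back to the whole string (assumed policy).
    return net or None


def add_nic_resources(resources: dict, cni_networks: list, prefix: str = "nvidia.com") -> dict:
    """Populate requests/limits only for entries that yield a valid NIC name."""
    for net in cni_networks:
        nic_name = nic_name_from_network(net)
        if nic_name is None:
            continue  # malformed entry: skip instead of crashing
        key = f"{prefix}/{nic_name}"
        resources.setdefault("requests", {})[key] = 1  # placeholder quantity
        resources.setdefault("limits", {})[key] = 1  # placeholder quantity
    return resources
```

Unlike `net.split("/", 1)[1]`, `str.partition` always returns three parts, so an entry without a separator cannot raise IndexError.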
In `@tests/systems/kubernetes/test_system.py`:
- Line 20: Replace typing.Dict and typing.List with the built-in generic types
to modernize annotations: update the import line that currently imports "Dict,
List" from typing to remove them (keep ClassVar if still used) and change all
annotations that use Dict[...] and List[...] to dict[...] and list[...],
including the function signature that currently uses Dict/List so it matches the
file's existing use of list[str] and other built-in generics.
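For illustration, the modernized annotations might look like the following. The class and function here are hypothetical and are not taken from tests/systems/kubernetes/test_system.py; built-in generics such as `dict[...]` and `list[...]` require Python 3.9 or newer.

```python
from typing import ClassVar  # Dict and List are no longer imported


class FakeNetworkClient:
    """Hypothetical test double; only the annotation style matters here."""

    # ClassVar still comes from typing; the containers are built-in generics.
    responses: ClassVar[dict[str, list[str]]] = {"default": ["nic0", "nic1"]}


def group_by_namespace(networks: list[str]) -> dict[str, list[str]]:
    """Group "namespace/name" strings by namespace."""
    grouped: dict[str, list[str]] = {}
    for net in networks:
        namespace, _, name = net.partition("/")
        grouped.setdefault(namespace, []).append(name)
    return grouped
```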
In `@tests/workloads/nccl_test/test_json_gen_strategy_kubernetes.py`:
- Around line 180-185: The test test_launcher_never_gets_nic_resources uses a
startswith check that assumes NIC resource keys begin with "nic"—make the
assertion explicit by deriving expected resource keys from CNI_NETS (e.g., map
each entry in CNI_NETS like "namespace/nic-name" to the resource key
"nvidia.com/{nic-name}") and assert none of those exact resource keys appear in
launcher_resources["requests"]; locate this logic around gen_json, CNI_NETS,
payload and launcher_resources in the test and replace the startswith-based
any(...) check with an exact-match check against the constructed keys.
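The exact-match assertion could be sketched as follows. The `CNI_NETS` values and the helper names are hypothetical, and the "nvidia.com/{nic-name}" key format is taken from the comment above rather than verified against the generator.

```python
# Hypothetical fixture values in the "namespace/nic-name" format.
CNI_NETS = ["default/nic-a", "default/nic-b"]


def expected_nic_resource_keys(cni_nets):
    """Map each "namespace/nic-name" entry to its "nvidia.com/nic-name" key."""
    keys = set()
    for net in cni_nets:
        nic_name = net.partition("/")[2]
        keys.add(f"nvidia.com/{nic_name}")
    return keys


def assert_no_nic_resources(launcher_resources, cni_nets):
    """Fail if any exact NIC resource key appears in the launcher requests."""
    forbidden = expected_nic_resource_keys(cni_nets)
    present = forbidden & set(launcher_resources.get("requests", {}))
    assert not present, f"launcher unexpectedly requests NIC resources: {sorted(present)}"
```

Deriving the forbidden keys from `CNI_NETS` keeps the test valid even if a NIC name does not begin with "nic".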
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 49efdc8e-0540-423b-bc2b-345785e56f2b
📒 Files selected for processing (5)
src/cloudai/systems/kubernetes/kubernetes_system.py
src/cloudai/workloads/nccl_test/kubernetes_json_gen_strategy.py
tests/conftest.py
tests/systems/kubernetes/test_system.py
tests/workloads/nccl_test/test_json_gen_strategy_kubernetes.py
Summary
Support CNI spec for NCCL over k8s.
Test Plan
Additional Notes
–