
Add vsock control plane: types, Firecracker support, host client, and facade integration#50

Open
aniketmaurya wants to merge 6 commits into main from
codex/github-mention-feature]-add-vsock-based-control-plane-for

Conversation

@aniketmaurya
Collaborator

@aniketmaurya aniketmaurya commented Mar 3, 2026

Motivation

  • Introduce a vsock-based control plane to allow host->guest command execution and health checks without relying on guest networking.
  • Provide first-class support for Firecracker vsock devices and a simple host client for request/response interactions.
  • Expose a vsock execution path in the high-level facade so callers can opt into or prefer vsock transport.

Description

  • Add a new host-side client VsockClient in src/smolvm/vsock.py that implements a simple framed JSON request/response protocol and health/exec operations.
  • Introduce VsockConfig to the public types in src/smolvm/types.py and wire it into VMConfig with sensible defaults (guest port 5000, enabled by default).
  • Extend the Firecracker API with FirecrackerClient.add_vsock(...) in src/smolvm/api.py and wire vsock provisioning into the VM startup path in src/smolvm/vm.py, including stable guest CID derivation via derive_vsock_guest_cid and resolution via resolve_vsock_config.
  • Add high-level facade integration in src/smolvm/facade.py with run_vsock(...) and vsock_health() helpers, and export VsockClient/VsockConfig from the package __init__.

Testing

  • Added unit tests in tests/test_vsock.py that cover the FirecrackerClient.add_vsock call, derive_vsock_guest_cid stability/non-reserved behavior, and resolve_vsock_config defaults.
  • Ran pytest tests/test_vsock.py and the new tests succeeded.

Codex Task

Summary by CodeRabbit

  • New Features

    • Vsock control-plane support for VM communication
    • Run commands in running VMs and query in-guest agent health
    • Vsock settings exposed in VM configuration and retrievable via management API
    • Host-side vsock client available in the public SDK
  • Documentation

    • Added comprehensive vsock control-plane implementation plan
  • Tests

    • Added tests for vsock setup, CID derivation, and config resolution

@coderabbitai

coderabbitai bot commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 22ec1968-9cde-4c6d-8dda-7ef9f5c4f9e1

📥 Commits

Reviewing files that changed from the base of the PR and between a932dc6 and a3cd6cb.

📒 Files selected for processing (1)
  • src/smolvm/__init__.py

📝 Walkthrough

Walkthrough

Adds a vsock-based control plane: new VsockConfig and VMConfig field, a host-side VsockClient, Firecracker vsock API call, VM lifecycle hooks and helpers, facade methods for executing/health-checking via vsock, tests, and a planning document describing phased implementation.

Changes

  • Documentation (docs-vsock-control-plane-implementation-plan.md): New implementation plan describing architecture, phased rollout (MVP → hardening), data-model extensions, integration points, testing matrix, and acceptance criteria.
  • Types / Config (src/smolvm/types.py): Adds the VsockConfig model and a vsock: VsockConfig field to VMConfig.
  • Public API exports (src/smolvm/__init__.py): Exports VsockConfig and VsockClient on the package public surface.
  • Host vsock client (src/smolvm/vsock.py): New VsockClient class implementing an AF_VSOCK request/response protocol, centralized error handling, and health() and run() returning CommandResult.
  • Firecracker API (src/smolvm/api.py): Adds FirecrackerClient.add_vsock(vsock_id, guest_cid, uds_path) to configure a vsock device via HTTP PUT /vsock/{id}.
  • VM helpers & lifecycle (src/smolvm/vm.py): Adds derive_vsock_guest_cid() and resolve_vsock_config(); integrates vsock attach into the VM start paths for QEMU and Firecracker; adds SmolVMManager.get_vsock_config().
  • Facade / runtime (src/smolvm/facade.py): Adds _vsock state, run_vsock() and vsock_health() with lazy VsockClient initialization and VM state checks; resets _vsock on stop.
  • Tests (tests/test_vsock.py): New tests for FirecrackerClient.add_vsock, derive_vsock_guest_cid determinism/non-reserved CIDs, and resolve_vsock_config defaults.

Sequence Diagram

sequenceDiagram
    participant Host as Host (SmolVM)
    participant Hypervisor as Hypervisor (QEMU / Firecracker)
    participant Guest as Guest Agent

    Host->>Hypervisor: Attach vsock device (FirecrackerClient.add_vsock / QEMU attach)
    Hypervisor->>Guest: Guest boots and starts vsock listener

    Note over Host: Later — user calls SmolVM.run_vsock()
    Host->>Host: resolve_vsock_config(vm_id)
    Host->>Host: lazy init VsockClient(guest_cid, guest_port)
    Host->>Guest: Connect over AF_VSOCK and send JSON request (command, token, env, cwd, timeout)
    Guest->>Guest: Execute command, capture exit_code/stdout/stderr
    Guest->>Host: Send JSON response (ok/result or ok/exit_code, stdout, stderr)
    Host->>Host: Parse response -> CommandResult and return
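The request/response framing in the diagram can be sketched as newline-delimited JSON. The field names below follow the diagram, but the exact wire format (and the use of a local socketpair standing in for AF_VSOCK) is an assumption for illustration only:

```python
import json
import socket
import threading

def send_frame(sock: socket.socket, payload: dict) -> None:
    # One JSON object per line, as the framed protocol suggests.
    sock.sendall(json.dumps(payload).encode() + b"\n")

def recv_frame(sock: socket.socket, max_bytes: int = 1 << 20) -> dict:
    # Accumulate until the first newline, bounding total size as the
    # review comment on _recv_line recommends.
    chunks, total = [], 0
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("frame too large")
        if b"\n" in chunk:
            break
    line = b"".join(chunks).split(b"\n", 1)[0]
    return json.loads(line)

# Round trip over a socketpair standing in for an AF_VSOCK connection.
host, guest = socket.socketpair()

def fake_guest_agent() -> None:
    req = recv_frame(guest)
    send_frame(guest, {"ok": True, "exit_code": 0,
                       "stdout": req["command"], "stderr": ""})

t = threading.Thread(target=fake_guest_agent)
t.start()
send_frame(host, {"op": "exec", "command": "echo hi", "timeout": 30})
resp = recv_frame(host)
t.join()
host.close()
guest.close()
print(resp["exit_code"], resp["stdout"])
```

The real VsockClient would connect with `socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)` to `(guest_cid, guest_port)`; everything else about the framing stays the same.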

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I tunneled bytes through a tiny sock,
Sent commands that hopped and never clock,
CIDs steady, tokens neat,
Host and guest in a cordial meet,
I nibble bugs and leave a treat. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 57.89%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check ✅ Passed: the title accurately and specifically describes the main change (adding a vsock control plane with types, Firecracker support, host client, and facade integration) and directly reflects the scope of the changeset.
  • Description check ✅ Passed: the description includes Motivation, Description with implementation details, and Testing sections. However, it lacks the template's checklist and Related Issues sections.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
src/smolvm/vm.py (1)

60-65: Add collision handling on derived guest CIDs.

derive_vsock_guest_cid() is deterministic, but it doesn’t check host-level CID uniqueness. A low-probability hash collision would be operationally severe. Add reservation/check-and-retry against persisted VM state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/vm.py` around lines 60 - 65, derive_vsock_guest_cid(vm_id: str)
currently returns a deterministic CID but never verifies that the CID is unused,
which risks collisions; update derive_vsock_guest_cid to loop: compute the
candidate CID as now, then check against the persisted VM registry/state (e.g.
VMStore/VMRegistry methods that list or lookup assigned CIDs) and if it's
already taken, re-derive by altering the input (e.g. append a retry counter or
use next 4 bytes of the hash) and retry a bounded number of times; once an
unused CID is found, persist/reserve it atomically (use the existing
persistence/lock APIs for VM creation) and return it, and if retries are
exhausted raise a clear exception so callers can handle the failure.
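A check-and-retry derivation along the lines the prompt describes might look like the sketch below. The `taken` set stands in for a lookup against persisted VM state, and the constants and retry scheme are hypothetical; the real implementation would also reserve the CID atomically under the existing persistence locks:

```python
import hashlib

VSOCK_CID_MIN = 3  # CIDs 0-2 are reserved (hypervisor, local, host)

def derive_vsock_guest_cid(vm_id: str, taken: set[int],
                           max_retries: int = 16) -> int:
    """Derive a stable guest CID from vm_id, retrying on collision.

    Attempt 0 reproduces the deterministic derivation; later attempts
    salt the input with a retry counter so a collision yields a fresh
    candidate instead of failing outright.
    """
    for attempt in range(max_retries):
        digest = hashlib.sha256(f"{vm_id}:{attempt}".encode()).digest()
        cid = int.from_bytes(digest[:4], "big")
        if cid < VSOCK_CID_MIN:
            continue  # skip reserved CIDs
        if cid not in taken:
            return cid
    raise RuntimeError(f"could not derive a free vsock CID for {vm_id}")
```

The same vm_id with an empty registry always yields the same CID, preserving the determinism the tests rely on, while an occupied CID bumps the salt and retries.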
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs-vsock-control-plane-implementation-plan.md`:
- Line 149: Replace the awkward parenthetical in the sentence "CI includes at
least one vsock-enabled integration lane (can be optional initially, required
before GA)." with the tightened phrasing "optional initially, but required
before GA" so the line reads: CI includes at least one vsock-enabled integration
lane (optional initially, but required before GA). Locate the sentence by
searching for the original parenthetical text or the phrase "vsock-enabled
integration lane" and update that exact line.

In `@src/smolvm/api.py`:
- Around line 247-254: The PUT request for configuring vsock uses the wrong URL;
update the call to self._request in the code that passes
vsock_id/guest_cid/uds_path so it targets "/vsock" (not f"/vsock/{vsock_id}"),
i.e., remove the path parameter and keep vsock_id only in the JSON payload sent
by the _request call (locate the block where self._request is invoked with
method "PUT" and the json containing vsock_id, guest_cid, uds_path).

In `@src/smolvm/facade.py`:
- Around line 324-325: Remove the duplicate assignment to self._vsock (the
second occurrence of "self._vsock = None") so only a single initialization
remains; locate the repeated lines where self._vsock is set (both occurrences
shown) and delete the redundant one to avoid the needless duplicate reset.
- Around line 426-433: The code should fail fast if the active backend does not
support vsock: inside the vsock acquisition in facade (where self._vsock is set
using self._sdk.get_vsock_config(self._vm_id) and VsockClient is constructed),
add an explicit capability check for the backend (e.g. query the SDK for
“backend supports vsock” for this VM or inspect a provided capability flag on
vsock_cfg) and raise SmolVMError when the backend cannot provide a vsock
transport instead of proceeding; apply the same guard to the second identical
block referenced (lines 443-450) so both places validate backend capability
before constructing VsockClient or calling self._vsock.run.

In `@src/smolvm/vm.py`:
- Around line 801-804: get_vsock_config currently always calls
resolve_vsock_config and returns an enabled-looking VsockConfig even for
backends that don't support vsock; change it to first inspect the VM's backend
(from vm_info.backend or vm_info["backend"]) and only call resolve_vsock_config
if that backend is in the set of vsock-capable backends (e.g., "firecracker",
"qemu", whatever backends your codebase treats as supporting vsock); otherwise
return a disabled VsockConfig (e.g., VsockConfig(enabled=False) or equivalent
sentinel) so callers won’t try to provision an unusable transport. Update
get_vsock_config to reference vm_info and resolve_vsock_config accordingly.

In `@src/smolvm/vsock.py`:
- Around line 75-93: The guest command timeout passed to run(...) can be longer
than the socket I/O timeout (self.timeout), causing premature socket-level
timeouts; update run(self, ... timeout=...) so the socket timeout used by
self._request is at least the command timeout (e.g. compute socket_timeout =
max(self.timeout, timeout) and either pass it into _request or temporarily set
self.timeout to that value before calling self._request, then restore if
mutated); reference run and self._request to locate the change and ensure the
I/O layer uses the larger timeout.
- Around line 41-45: The socket connect/send/recv block in the method that uses
socket.socket(AF_VSOCK) (the code that calls sock.connect((self.guest_cid,
self.guest_port)), sock.sendall(raw), and self._recv_line(sock)) must catch
OSError and socket.timeout and re-raise them as SmolVMError to keep SDK
exception consistency; wrap the entire with-socket block in a try/except that
captures OSError and socket.timeout (including exceptions from sock.connect,
sock.sendall, and self._recv_line) and raise SmolVMError(...) with a clear
message and the original exception attached or chained.
- Around line 59-70: _recv_line currently accumulates chunks until newline/EOF
with no size bound; add a maximum byte limit (e.g. MAX_LINE_BYTES constant) and
track total_bytes while reading in _recv_line, and if total_bytes exceeds the
limit raise an exception (ValueError or custom ProtocolError) to abort the read;
also ensure you stop reading and return/raise when the limit is hit before
attempting to decode, and document the limit where _recv_line is defined so
callers know lines over the cap will error.

---

Nitpick comments:
In `@src/smolvm/vm.py`:
- Around line 60-65: derive_vsock_guest_cid(vm_id: str) currently returns a
deterministic CID but never verifies that the CID is unused, which risks
collisions; update derive_vsock_guest_cid to loop: compute the candidate CID as
now, then check against the persisted VM registry/state (e.g. VMStore/VMRegistry
methods that list or lookup assigned CIDs) and if it's already taken, re-derive
by altering the input (e.g. append a retry counter or use next 4 bytes of the
hash) and retry a bounded number of times; once an unused CID is found,
persist/reserve it atomically (use the existing persistence/lock APIs for VM
creation) and return it, and if retries are exhausted raise a clear exception so
callers can handle the failure.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ecf14e and b386719.

📒 Files selected for processing (8)
  • docs-vsock-control-plane-implementation-plan.md
  • src/smolvm/__init__.py
  • src/smolvm/api.py
  • src/smolvm/facade.py
  • src/smolvm/types.py
  • src/smolvm/vm.py
  • src/smolvm/vsock.py
  • tests/test_vsock.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
src/smolvm/api.py (1)

247-250: ⚠️ Potential issue | 🔴 Critical

Use Firecracker’s PUT /vsock endpoint (not /vsock/{vsock_id}).

Line 249 targets a path form Firecracker does not accept for vsock device configuration, which will fail at runtime. This also means the expectation in tests/test_vsock.py (Line 19) should be updated with this fix.

Suggested fix
         self._request(
             "PUT",
-            f"/vsock/{vsock_id}",
+            "/vsock",
             json={
                 "vsock_id": vsock_id,
                 "guest_cid": guest_cid,
                 "uds_path": str(uds_path),
             },
         )
In the official Firecracker API docs/source, what is the correct endpoint for configuring virtio-vsock: `PUT /vsock` or `PUT /vsock/{vsock_id}`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/api.py` around lines 247 - 250, The PUT request for configuring
virtio-vsock is using the wrong path; change the API call in the method that
invokes self._request (the block that currently does self._request("PUT",
f"/vsock/{vsock_id}", json=...)) to use "PUT" with the "/vsock" endpoint (remove
the path parameter) and adjust the payload accordingly; also update the test
assertion in tests/test_vsock.py (the expectation on Line 19) to expect a PUT to
"/vsock" instead of "/vsock/{vsock_id}" so tests match the Firecracker API.
src/smolvm/facade.py (1)

324-325: ⚠️ Potential issue | 🟡 Minor

Remove duplicated _vsock reset in stop().

Line 325 duplicates Line 324 and can be removed.

Suggested fix
         self._ssh = None
         self._ssh_ready = False
         self._vsock = None
-        self._vsock = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/facade.py` around lines 324 - 325, In the stop() method of the
Facade (src/smolvm/facade.py) there's a duplicated reset of the attribute
self._vsock; remove the redundant second occurrence so self._vsock = None only
appears once in stop(), leaving other teardown logic untouched.
src/smolvm/vsock.py (2)

35-44: ⚠️ Potential issue | 🟠 Major

Align socket I/O timeout with command timeout in run().

Line 91 passes guest execution timeout, but socket timeout remains self.timeout (Line 43). Long-running commands can fail prematurely at transport layer.

Suggested fix
-    def _request(self, payload: dict[str, Any]) -> dict[str, Any]:
+    def _request(self, payload: dict[str, Any], *, io_timeout: float | None = None) -> dict[str, Any]:
@@
-                sock.settimeout(self.timeout)
+                sock.settimeout(self.timeout if io_timeout is None else io_timeout)
@@
-        response = self._request(
+        io_timeout = max(self.timeout, float(timeout) + 5.0)
+        response = self._request(
             {
                 "op": "exec",
                 "command": command,
                 "timeout": timeout,
                 "env": env or {},
                 "cwd": cwd,
                 "token": token,
-            }
+            },
+            io_timeout=io_timeout,
         )

Also applies to: 87-96

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/vsock.py` around lines 35 - 44, The socket timeout in _request
currently uses self.timeout and can cause transport-layer timeouts for
long-running commands invoked via run(); change _request to accept a timeout
parameter (e.g., def _request(self, payload: dict[str, Any], timeout: float))
and use that value in sock.settimeout(...) instead of self.timeout, then update
all callers (notably run()) to pass the guest execution timeout it already
computes/passes to the guest (the same timeout used at the command execution
site) so socket I/O aligns with command timeout.

62-73: ⚠️ Potential issue | 🟠 Major

Bound _recv_line() payload size to prevent memory blowups.

On Line 63+, the read loop accumulates unbounded bytes until newline/EOF. A misbehaving peer can force excessive memory growth.

Suggested fix
+MAX_LINE_BYTES = 1024 * 1024  # 1 MiB
+
     `@staticmethod`
     def _recv_line(sock: socket.socket) -> str:
         chunks: list[bytes] = []
+        total = 0
         while True:
             chunk = sock.recv(4096)
             if not chunk:
                 break
             chunks.append(chunk)
+            total += len(chunk)
+            if total > MAX_LINE_BYTES:
+                raise SmolVMError("Vsock response exceeded maximum size")
             if b"\n" in chunk:
                 break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/vsock.py` around lines 62 - 73, _recv_line currently accumulates
bytes until newline/EOF with no limit; add a hard cap (e.g. MAX_LINE_BYTES) and
track total_bytes read inside _recv_line, and if total_bytes exceeds that cap
raise a specific exception (or return an error) to avoid memory blowups; when
enforcing the cap, preserve semantics of splitting at the first b"\n" (use the
same join/split logic) and apply the check after appending each chunk so a
malicious peer can't force unlimited growth.
src/smolvm/vm.py (1)

801-804: ⚠️ Potential issue | 🟠 Major

Make get_vsock_config() backend-aware.

Lines 801-804 always return resolved/enabled-looking vsock settings. For backends that do not provision vsock, this causes callers to attempt unusable transport instead of failing early.

Suggested fix
 def get_vsock_config(self, vm_id: str) -> VsockConfig:
     """Return resolved vsock settings for a VM."""
     vm_info = self.get(vm_id)
+    backend = self._backend_for_vm(vm_info)
+    if backend != BACKEND_FIRECRACKER:
+        return vm_info.config.vsock.model_copy(update={"enabled": False})
     return resolve_vsock_config(vm_info, self.data_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/vm.py` around lines 801 - 804, get_vsock_config currently always
calls resolve_vsock_config which yields an enabled-looking VsockConfig even for
backends that don't provision vsock; update get_vsock_config(vm_id: str) to
inspect the VM's backend (vm_info.backend or vm_info.get("backend")) and only
call resolve_vsock_config when that backend is known to support vsock; for
non-vsock backends return an explicit disabled value (e.g., a VsockConfig with
enabled=False or None per the project's convention) so callers won't attempt
unusable vsock transports; reference get_vsock_config, resolve_vsock_config and
the Vm info's backend field in your change.
🧹 Nitpick comments (2)
src/smolvm/vm.py (1)

648-654: Create parent directories for custom uds_path before add_vsock().

If a user sets vsock_cfg.uds_path under a non-existent directory, attach fails even though config is otherwise valid.

Suggested fix
                 assert vsock_cfg.guest_cid is not None
                 assert vsock_cfg.uds_path is not None
+                vsock_cfg.uds_path.parent.mkdir(parents=True, exist_ok=True)
                 if vsock_cfg.uds_path.exists():
                     self._unlink_socket(vsock_cfg.uds_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/vm.py` around lines 648 - 654, The code currently assumes the
parent directory for vsock_cfg.uds_path exists before calling client.add_vsock
(in the block around self._unlink_socket and client.add_vsock), causing attach
to fail for custom paths; fix by ensuring the parent directories for
vsock_cfg.uds_path are created (mkdir parents=True, exist_ok=True) before
attempting to unlink or call client.add_vsock, and handle potential filesystem
errors (raise/log) so failure is clear; update the logic around
self._unlink_socket and client.add_vsock to create the directory from
vsock_cfg.uds_path.parent first.
src/smolvm/facade.py (1)

426-432: Extract shared vsock-client initialization into one helper.

run_vsock() and vsock_health() duplicate the same lazy-init/config checks, which increases drift risk.

Suggested refactor direction
+    def _ensure_vsock(self) -> VsockClient:
+        if self._vsock is None:
+            vsock_cfg = self._sdk.get_vsock_config(self._vm_id)
+            if not vsock_cfg.enabled:
+                raise SmolVMError("Vsock control-plane is disabled for this VM")
+            assert vsock_cfg.guest_cid is not None
+            self._vsock = VsockClient(vsock_cfg.guest_cid, vsock_cfg.guest_port)
+        return self._vsock
@@
-        if self._vsock is None:
-            ...
-        return self._vsock.run(...)
+        return self._ensure_vsock().run(...)
@@
-        if self._vsock is None:
-            ...
-        return self._vsock.health()
+        return self._ensure_vsock().health()

Also applies to: 443-449

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/smolvm/facade.py` around lines 426 - 432, Extract the duplicated
lazy-init and config validation into a single helper (e.g., a private method
like _ensure_vsock) and call it from run_vsock() and vsock_health(); the helper
should check self._vsock, call self._sdk.get_vsock_config(self._vm_id), raise
SmolVMError if vsock_cfg.enabled is False, assert guest_cid is not None, and set
self._vsock = VsockClient(vsock_cfg.guest_cid, vsock_cfg.guest_port) so both
methods reuse the same initialization path and remove the duplicated code
blocks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/smolvm/types.py`:
- Around line 43-44: The guest_cid field currently allows reserved/invalid vsock
CIDs and should be validated at parse time: change the declaration of guest_cid
to use an Annotated int with a Field constraint limiting it to the valid guest
CID range (e.g., ge=3 and an appropriate upper bound such as le=4294967294 or
similar for 32‑bit CID space) and keep it nullable (None default) so invalid
values like 0,1,2 are rejected by pydantic during parsing; update the guest_cid
declaration (symbol: guest_cid) to Annotated[int, Field(ge=3, le=<upper>)] |
None = None to enforce this constraint.

---

Duplicate comments:
In `@src/smolvm/api.py`:
- Around line 247-250: The PUT request for configuring virtio-vsock is using the
wrong path; change the API call in the method that invokes self._request (the
block that currently does self._request("PUT", f"/vsock/{vsock_id}", json=...))
to use "PUT" with the "/vsock" endpoint (remove the path parameter) and adjust
the payload accordingly; also update the test assertion in tests/test_vsock.py
(the expectation on Line 19) to expect a PUT to "/vsock" instead of
"/vsock/{vsock_id}" so tests match the Firecracker API.

In `@src/smolvm/facade.py`:
- Around line 324-325: In the stop() method of the Facade (src/smolvm/facade.py)
there's a duplicated reset of the attribute self._vsock; remove the redundant
second occurrence so self._vsock = None only appears once in stop(), leaving
other teardown logic untouched.

In `@src/smolvm/vm.py`:
- Around line 801-804: get_vsock_config currently always calls
resolve_vsock_config which yields an enabled-looking VsockConfig even for
backends that don't provision vsock; update get_vsock_config(vm_id: str) to
inspect the VM's backend (vm_info.backend or vm_info.get("backend")) and only
call resolve_vsock_config when that backend is known to support vsock; for
non-vsock backends return an explicit disabled value (e.g., a VsockConfig with
enabled=False or None per the project's convention) so callers won't attempt
unusable vsock transports; reference get_vsock_config, resolve_vsock_config and
the Vm info's backend field in your change.

In `@src/smolvm/vsock.py`:
- Around line 35-44: The socket timeout in _request currently uses self.timeout
and can cause transport-layer timeouts for long-running commands invoked via
run(); change _request to accept a timeout parameter (e.g., def _request(self,
payload: dict[str, Any], timeout: float)) and use that value in
sock.settimeout(...) instead of self.timeout, then update all callers (notably
run()) to pass the guest execution timeout it already computes/passes to the
guest (the same timeout used at the command execution site) so socket I/O aligns
with command timeout.
- Around line 62-73: _recv_line currently accumulates bytes until newline/EOF
with no limit; add a hard cap (e.g. MAX_LINE_BYTES) and track total_bytes read
inside _recv_line, and if total_bytes exceeds that cap raise a specific
exception (or return an error) to avoid memory blowups; when enforcing the cap,
preserve semantics of splitting at the first b"\n" (use the same join/split
logic) and apply the check after appending each chunk so a malicious peer can't
force unlimited growth.

---

Nitpick comments:
In `@src/smolvm/facade.py`:
- Around line 426-432: Extract the duplicated lazy-init and config validation
into a single helper (e.g., a private method like _ensure_vsock) and call it
from run_vsock() and vsock_health(); the helper should check self._vsock, call
self._sdk.get_vsock_config(self._vm_id), raise SmolVMError if vsock_cfg.enabled
is False, assert guest_cid is not None, and set self._vsock =
VsockClient(vsock_cfg.guest_cid, vsock_cfg.guest_port) so both methods reuse the
same initialization path and remove the duplicated code blocks.

In `@src/smolvm/vm.py`:
- Around line 648-654: The code currently assumes the parent directory for
vsock_cfg.uds_path exists before calling client.add_vsock (in the block around
self._unlink_socket and client.add_vsock), causing attach to fail for custom
paths; fix by ensuring the parent directories for vsock_cfg.uds_path are created
(mkdir parents=True, exist_ok=True) before attempting to unlink or call
client.add_vsock, and handle potential filesystem errors (raise/log) so failure
is clear; update the logic around self._unlink_socket and client.add_vsock to
create the directory from vsock_cfg.uds_path.parent first.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ecf14e and 721acfd.

📒 Files selected for processing (8)
  • docs-vsock-control-plane-implementation-plan.md
  • src/smolvm/__init__.py
  • src/smolvm/api.py
  • src/smolvm/facade.py
  • src/smolvm/types.py
  • src/smolvm/vm.py
  • src/smolvm/vsock.py
  • tests/test_vsock.py

@aniketmaurya
Collaborator Author

@copilot please resolve the conflict in this PR

Contributor

Copilot AI commented Mar 8, 2026

@aniketmaurya I've opened a new pull request, #55, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 4 commits March 8, 2026 10:59
* Initial plan

* Merge main branch changes: extra_drives, rate_limit_mbps, remove_egress_rules, and other upstream updates

Co-authored-by: aniketmaurya <21018714+aniketmaurya@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: aniketmaurya <21018714+aniketmaurya@users.noreply.github.com>