Skip to content

Conversation

@justus-camp-microsoft
Copy link
Contributor

Since MANA-keepalive isn't going to save/restore all of its queues, we need to have access to both a persistent and non-persistent DMA client. The NVMe side is written to always use the persisted allocator if present and the MANA side is written to always use the ephemeral allocator (#2123 will change this to use the persistent allocator for GDMA queues)

Copilot AI review requested due to automatic review settings October 28, 2025 23:19
@justus-camp-microsoft justus-camp-microsoft requested review from a team as code owners October 28, 2025 23:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the DMA client architecture in the user driver infrastructure by splitting a single dma_client into two distinct types: ephemeral_dma_client for temporary memory allocations and persistent_dma_client for memory that persists across device servicing/save-restore operations.

Key changes:

  • Modified the DeviceBacking trait to expose two separate DMA client methods instead of one
  • Updated device implementations (VfioDevice, EmulatedDevice) to maintain both ephemeral and persistent DMA clients
  • Modified NVMe driver initialization in OpenHCL to create separate DMA clients with appropriate persistence settings

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
vm/devices/user_driver/src/lib.rs Updated DeviceBacking trait to replace single dma_client() method with ephemeral_dma_client() and persistent_dma_client() methods
vm/devices/user_driver/src/vfio.rs Modified VfioDevice to store and manage both ephemeral and persistent DMA clients
vm/devices/user_driver_emulated_mock/src/lib.rs Updated EmulatedDevice mock implementation to support both DMA client types
vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs Modified restore logic to explicitly require persistent DMA client for memory restoration
vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs Updated queue allocation to prefer persistent DMA client, falling back to ephemeral
vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs Updated test wrapper to implement new trait methods
vm/devices/storage/disk_nvme/nvme_driver/fuzz/fuzz_emulated_device.rs Updated fuzz test device wrapper to implement new trait methods
vm/devices/storage/storage_tests/tests/scsidvd_nvme.rs Updated test to pass None for persistent DMA client
vm/devices/net/net_mana/src/lib.rs Updated MANA device tests to pass None for persistent DMA client
vm/devices/net/mana_driver/src/tests.rs Updated MANA driver tests to use ephemeral DMA client for allocations
vm/devices/net/mana_driver/src/mana.rs Updated MANA vport to use ephemeral DMA client for runtime allocations
vm/devices/net/mana_driver/src/gdma_driver.rs Updated GDMA driver to use ephemeral DMA client for buffer allocation
openhcl/underhill_core/src/nvme_manager/device.rs Modified NVMe device spawning to create separate ephemeral and persistent DMA clients with proper configuration
openhcl/underhill_core/src/emuplat/netvsp.rs Updated MANA device creation to pass None for persistent DMA client

fn ephemeral_dma_client(&self) -> Arc<dyn DmaClient> {
self.device.ephemeral_dma_client()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure whatever in this fuzzer is calling this device covers both these methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattkur, do you have background on this fuzzer? Does this go through servicing/keepalive paths and enable/disable keepalive? I think as this PR is written we'll only exercise the ephemeral_dma_client() path unless we add a persistent client to its instantiation. What's the "right" thing to do here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guramritMS, could you take this question?

Copy link
Contributor

@gurasinghMS gurasinghMS Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the fuzzer go through servicing/keepalive paths and enable/disable keepalive? - No it does not. So it shouldn't be bothered by persistent v/s emphemeral client (functionality wise).

What is the "right" thing to do here: There are a couple of ways I can see this going:

  • Instantiate the object with just ephemeral client like we are doing now. This won't exercise the persistent client but we aren't really testing save/restore with the fuzzer right now anyways (but as steven mentioned we wanna stress both clients).
  • Do what @justus-camp-microsoft mentioned and add a persistent client to the instantiation of FuzzEmulatedDevice. - I would vote for this option with the addition that persistent_dma_client() should invoke arbitrary_data() to get a bool allowing calls to persistent_dma_client to arbitrarily fail. This will allow us to use to a mix of persistent and ephemeral dma in the fuzzing.

P.S. This might require some updates to DeviceTestMemory too

};
let dma_client = device.dma_client();

let dma_client = device
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this needs to align with a nvme driver change to know that if we fallback, we cannot persist anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think we want to maintain determinism. The NVMe driver should only use the persistent allocator (when private pool is in the environment).

};
let dma_client = device.dma_client();

let dma_client = device
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we wanna save whether this holds persistent v/s ephemeral memory and then fail in QueuePair::save() if it is using ephemeral memory?

Comment on lines 220 to 222
let dma_client = device
.persistent_dma_client()
.unwrap_or(device.ephemeral_dma_client());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pragmatically this behaves the same as what is checked in already. But, for NVMe, I think we want to control which DMA allocator is used at the nvme_manager layer. That is the component controlling policy. Ideally ephemeral_dma_client is None when we have a private pool, for example.

@github-actions github-actions bot added the unsafe Related to unsafe code label Oct 31, 2025
@github-actions
Copy link

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants