dma_client: split dma_client into ephemeral and persistent clients #2296
Conversation
Pull Request Overview
This PR refactors the DMA client architecture in the user driver infrastructure by splitting a single dma_client into two distinct types: ephemeral_dma_client for temporary memory allocations and persistent_dma_client for memory that persists across device servicing/save-restore operations.
Key changes:
- Modified the `DeviceBacking` trait to expose two separate DMA client methods instead of one
- Updated device implementations (`VfioDevice`, `EmulatedDevice`) to maintain both ephemeral and persistent DMA clients
- Modified NVMe driver initialization in OpenHCL to create separate DMA clients with appropriate persistence settings
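The reshaped trait surface might look roughly like the sketch below. The method names come from this PR; the `DmaClient` body and `MockDevice` are hypothetical stand-ins for illustration, assuming the ephemeral client is always available while the persistent client is optional (absent when no private pool exists):

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real user_driver DmaClient trait.
trait DmaClient {
    fn name(&self) -> &'static str;
}

struct PoolClient(&'static str);
impl DmaClient for PoolClient {
    fn name(&self) -> &'static str {
        self.0
    }
}

// Sketch of the split DeviceBacking trait: one required ephemeral client,
// one optional persistent client (None when no private pool is configured).
trait DeviceBacking {
    fn ephemeral_dma_client(&self) -> Arc<dyn DmaClient>;
    fn persistent_dma_client(&self) -> Option<Arc<dyn DmaClient>>;
}

// Hypothetical device implementation holding both clients.
struct MockDevice {
    ephemeral: Arc<dyn DmaClient>,
    persistent: Option<Arc<dyn DmaClient>>,
}

impl DeviceBacking for MockDevice {
    fn ephemeral_dma_client(&self) -> Arc<dyn DmaClient> {
        self.ephemeral.clone()
    }
    fn persistent_dma_client(&self) -> Option<Arc<dyn DmaClient>> {
        self.persistent.clone()
    }
}
```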
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vm/devices/user_driver/src/lib.rs | Updated DeviceBacking trait to replace single dma_client() method with ephemeral_dma_client() and persistent_dma_client() methods |
| vm/devices/user_driver/src/vfio.rs | Modified VfioDevice to store and manage both ephemeral and persistent DMA clients |
| vm/devices/user_driver_emulated_mock/src/lib.rs | Updated EmulatedDevice mock implementation to support both DMA client types |
| vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs | Modified restore logic to explicitly require persistent DMA client for memory restoration |
| vm/devices/storage/disk_nvme/nvme_driver/src/queue_pair.rs | Updated queue allocation to prefer persistent DMA client, falling back to ephemeral |
| vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs | Updated test wrapper to implement new trait methods |
| vm/devices/storage/disk_nvme/nvme_driver/fuzz/fuzz_emulated_device.rs | Updated fuzz test device wrapper to implement new trait methods |
| vm/devices/storage/storage_tests/tests/scsidvd_nvme.rs | Updated test to pass None for persistent DMA client |
| vm/devices/net/net_mana/src/lib.rs | Updated MANA device tests to pass None for persistent DMA client |
| vm/devices/net/mana_driver/src/tests.rs | Updated MANA driver tests to use ephemeral DMA client for allocations |
| vm/devices/net/mana_driver/src/mana.rs | Updated MANA vport to use ephemeral DMA client for runtime allocations |
| vm/devices/net/mana_driver/src/gdma_driver.rs | Updated GDMA driver to use ephemeral DMA client for buffer allocation |
| openhcl/underhill_core/src/nvme_manager/device.rs | Modified NVMe device spawning to create separate ephemeral and persistent DMA clients with proper configuration |
| openhcl/underhill_core/src/emuplat/netvsp.rs | Updated MANA device creation to pass None for persistent DMA client |
```rust
fn ephemeral_dma_client(&self) -> Arc<dyn DmaClient> {
    self.device.ephemeral_dma_client()
}
```
Can we make sure whatever in this fuzzer is calling this device covers both these methods?
@mattkur, do you have background on this fuzzer? Does this go through servicing/keepalive paths and enable/disable keepalive? I think as this PR is written we'll only exercise the ephemeral_dma_client() path unless we add a persistent client to its instantiation. What's the "right" thing to do here?
@guramritMS, could you take this question?
Does the fuzzer go through servicing/keepalive paths and enable/disable keepalive? No, it does not, so functionality-wise it shouldn't be bothered by the persistent vs. ephemeral client split.
What is the "right" thing to do here? There are a couple of ways I can see this going:
- Instantiate the object with just the ephemeral client, like we are doing now. This won't exercise the persistent client, but we aren't really testing save/restore with the fuzzer right now anyway (though, as Steven mentioned, we want to stress both clients).
- Do what @justus-camp-microsoft mentioned and add a persistent client to the instantiation of `FuzzEmulatedDevice`. I would vote for this option, with the addition that `persistent_dma_client()` should invoke `arbitrary_data()` to get a bool, allowing calls to `persistent_dma_client()` to arbitrarily fail. This would let us exercise a mix of persistent and ephemeral DMA in the fuzzing.

P.S. This might require some updates to `DeviceTestMemory` too.
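A minimal sketch of that suggestion, assuming a fuzz-driven bool decides whether the persistent client is reachable (the `fail_persistent` field stands in for a value pulled from the fuzz input via `arbitrary_data()`; the real `FuzzEmulatedDevice` has different fields):

```rust
use std::sync::Arc;

// Hypothetical minimal DmaClient for illustration.
trait DmaClient {}
struct StubClient;
impl DmaClient for StubClient {}

// Sketch of the fuzz device wrapper. In the real fuzzer, `fail_persistent`
// would be drawn from the fuzz input so runs explore both outcomes.
struct FuzzEmulatedDevice {
    persistent: Arc<dyn DmaClient>,
    fail_persistent: bool,
}

impl FuzzEmulatedDevice {
    // Arbitrarily deny the persistent client so fuzz runs exercise a mix
    // of persistent and ephemeral DMA paths.
    fn persistent_dma_client(&self) -> Option<Arc<dyn DmaClient>> {
        if self.fail_persistent {
            None
        } else {
            Some(self.persistent.clone())
        }
    }
}
```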
```diff
 };
-let dma_client = device.dma_client();
+let dma_client = device
```
I think this needs to align with an NVMe driver change, so that if we fall back, we know we cannot persist anything?
Right. I think we want to maintain determinism. The NVMe driver should only use the persistent allocator (when private pool is in the environment).
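One way to keep that deterministic is to fail fast instead of silently falling back when persistence is required. This is a hypothetical policy helper, not code from the PR; the function name, flag, and error string are illustrative:

```rust
use std::sync::Arc;

// Hypothetical minimal DmaClient for illustration.
trait DmaClient {}
struct NoopClient;
impl DmaClient for NoopClient {}

// Sketch: when keepalive/servicing requires persistent memory, a missing
// persistent client is an error rather than a silent ephemeral fallback,
// so behavior never depends on which pool happened to be present.
fn select_dma_client(
    persistent: Option<Arc<dyn DmaClient>>,
    ephemeral: Arc<dyn DmaClient>,
    require_persistent: bool,
) -> Result<Arc<dyn DmaClient>, &'static str> {
    match (require_persistent, persistent) {
        (true, Some(client)) => Ok(client),
        (true, None) => Err("persistent DMA client required but not available"),
        (false, _) => Ok(ephemeral),
    }
}
```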
```diff
 };
-let dma_client = device.dma_client();
+let dma_client = device
```
Do we want to record whether this holds persistent vs. ephemeral memory, and then fail in `QueuePair::save()` if it is using ephemeral memory?
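A minimal sketch of that idea, tracking allocator provenance so save refuses ephemeral-backed state (the `DmaSource` enum and error string are hypothetical; the real `QueuePair::save()` signature differs):

```rust
// Record which allocator backed the queue memory so save() can refuse to
// serialize state that would not survive servicing.
#[derive(Clone, Copy, PartialEq)]
enum DmaSource {
    Persistent,
    Ephemeral,
}

struct QueuePair {
    dma_source: DmaSource,
}

impl QueuePair {
    // Saving only makes sense for memory that outlives the servicing
    // operation, i.e. memory from the persistent DMA client.
    fn save(&self) -> Result<(), &'static str> {
        match self.dma_source {
            DmaSource::Persistent => Ok(()),
            DmaSource::Ephemeral => {
                Err("cannot save a queue pair backed by ephemeral DMA memory")
            }
        }
    }
}
```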
```rust
let dma_client = device
    .persistent_dma_client()
    .unwrap_or(device.ephemeral_dma_client());
```
Pragmatically this behaves the same as what is checked in already. But, for NVMe, I think we want to control which DMA allocator is used at the `nvme_manager` layer; that is the component controlling policy. Ideally `ephemeral_dma_client` is `None` when we have a private pool, for example.
Since MANA keepalive isn't going to save/restore all of its queues, we need access to both a persistent and a non-persistent DMA client. The NVMe side is written to always use the persistent allocator if present, and the MANA side is written to always use the ephemeral allocator (#2123 will change this to use the persistent allocator for GDMA queues).