Description
Could you please provide information about the APIs and libraries defined within OPI?
Specifically, I'd like to know whether there are standardized APIs, methods, or libraries that let infrastructure administrators prevent users from using a specific DPU, for example during maintenance.
For example, with NVIDIA GPUs, a vendor-specific command like nvidia-smi drain can be executed to prevent users from accessing the GPU.
In the case of DPUs, would simply shutting down the OS on the DPU be sufficient? While it might be possible to log in to the OS and execute a Linux shutdown command, are there standard APIs or libraries that allow us to remotely disable DPU usage?
Furthermore, with GPUs, any process that has the GPU's device file open must be stopped before running commands like nvidia-smi drain. With DPUs, it may be harder to determine who is using the device and how. Is a forced shutdown the only option in such situations?
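For reference, the GPU-side check described above (finding processes that still hold the device file open) might look roughly like the sketch below. It is only an illustration under stated assumptions: a Linux host with a /proc filesystem, GPU device nodes whose paths begin with /dev/nvidia, and a hypothetical helper named find_holders that is not part of OPI or of any NVIDIA tool. Processes owned by other users are only visible when it runs with sufficient privileges.

```python
import os


def find_holders(device_prefix="/dev/nvidia", proc_root="/proc"):
    """Map each PID to the matching device paths it currently has open.

    Hypothetical helper for illustration only. It walks <proc_root>/<pid>/fd
    and resolves every file-descriptor symlink, keeping the ones whose
    target starts with device_prefix.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(device_prefix):
                holders.setdefault(int(entry), set()).add(target)
    return {pid: sorted(paths) for pid, paths in holders.items()}
```

Calling find_holders() with its defaults returns a dictionary keyed by PID; an empty result suggests that no visible process has a matching device file open. Whether a DPU exposes anything comparable on the host side is exactly what is unclear here.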