Conversation

@0405ysj commented Nov 5, 2025

Context: b/455678690

With the configuration below, Cloud Orchestrator with DockerIM can utilize an NVIDIA GPU.

[InstanceManager.Docker]
GpuManufacturer = "nvidia"
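
For illustration, here is a minimal sketch of what this setting could translate to when DockerIM creates a container through the Docker Go SDK. The package name, function name, and wiring are assumptions rather than code from this PR; the flag equivalence (--env NVIDIA_DRIVER_CAPABILITIES=all --gpus all --runtime nvidia) is taken from the discussion below.

// Hypothetical illustration only; not code from this PR.
package instances

import "github.com/docker/docker/api/types/container"

// applyNvidiaGpu mirrors `--env NVIDIA_DRIVER_CAPABILITIES=all --gpus all --runtime nvidia`.
func applyNvidiaGpu(cfg *container.Config, hostCfg *container.HostConfig) {
	cfg.Env = append(cfg.Env, "NVIDIA_DRIVER_CAPABILITIES=all")
	hostCfg.Runtime = "nvidia"
	hostCfg.DeviceRequests = append(hostCfg.DeviceRequests, container.DeviceRequest{
		Count:        -1, // -1 requests all GPUs, like `--gpus all`
		Capabilities: [][]string{{"gpu"}},
	})
}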

@0405ysj marked this pull request as ready for review November 5, 2025 05:52
@0405ysj requested review from ikicha and k311093 and removed request for Databean, adelva1984 and rmuthiah November 5, 2025 05:53
-	im = instances.NewDockerInstanceManager(config.InstanceManager, cli)
+	im, err = instances.NewDockerInstanceManager(config.InstanceManager, cli)
+	if err != nil {
+		log.Fatal("Failed to create Docker Instance Manager: ", err)
+	}
Member

This function should return (instances.Manager, error) if it can fail like this.
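
A hedged sketch of what that could look like; the helper name, parameter types, and surrounding imports are assumptions for illustration, not the actual code in this file:

// Illustrative only: propagate the error instead of calling log.Fatal here.
func newDockerInstanceManager(conf Config, cli *client.Client) (instances.Manager, error) {
	im, err := instances.NewDockerInstanceManager(conf.InstanceManager, cli)
	if err != nil {
		return nil, fmt.Errorf("failed to create Docker Instance Manager: %w", err)
	}
	return im, nil
}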

Collaborator Author

I used log.Fatal for consistency, as other places in this file do. Would you like me to refactor it here?

@ser-io (Member) left a comment

For consistency with the GCE counterpart, and for flexibility, make the use of accelerators (GPUs) part of the requests first. You can add configuration support later if needed.

Use #319 as a reference.


type DockerIMConfig struct {
DockerImageName string
GpuManufacturer string
Member

Avoid terms like "manufacturer" that are not part of the Docker documentation (https://docs.docker.com/desktop/features/gpu). Try to use names similar to those in the documentation that Docker users are already familiar with.

Collaborator Author

My suggestion is equivalent to --env NVIDIA_DRIVER_CAPABILITIES=all --gpus all --runtime nvidia in terms of executing docker run, and I don't think there is a proper name in the Docker documentation to represent my purpose.

--env and --runtime look fine to expose in the CO configuration, but --gpus in docker run directly specifies GPU allocation. I think --gpus shouldn't be exposed in the CO configuration, since DockerIM runs multiple docker instances and we will probably want CO to handle GPU allocation later. I don't want to design how DockerIM allocates GPUs right now, as the details of NVIDIA GPUs are pretty complicated to account for.

So, I need to define a new name to convey whether DockerIM will utilize a GPU or not. Retrieving that information by parsing --env or --runtime would be appropriate. Alternatively, a boolean configuration such as UseNvidiaGpu looks valid to me (see the sketch below). If the CO configuration for GPU is expressive enough to set --env or --runtime, I don't think we need to define new configurations in advance, which could cause compatibility issues in the future.
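
For example, the boolean alternative mentioned above might look like this in the config struct (hypothetical; not what this PR currently implements):

type DockerIMConfig struct {
	DockerImageName string
	UseNvidiaGpu    bool // if true, run instances with --env NVIDIA_DRIVER_CAPABILITIES=all --gpus all --runtime nvidia
}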

@ser-io (Member) Nov 6, 2025

Let's solve #494 (comment) first.

@0405ysj requested a review from ser-io November 6, 2025 02:27

type DockerIMConfig struct {
DockerImageName string
GpuManufacturer string
@ser-io (Member) Nov 6, 2025

Let's solve #494 (comment) first.


type DockerIMConfig struct {
DockerImageName string
GpuManufacturer string
@ser-io (Member) Nov 6, 2025

The ability to create docker instances using a GPU should be part of the CO public API, not hidden as a CO configuration. Please explain why you want to hide this ability from end users.

For reference, the ability to add accelerators is part of the public API for GCE hosts; see #319. Also, the gcloud and docker CLIs follow the same principle. Going the opposite way here should be properly justified.
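
Purely for illustration (these type and field names are hypothetical, not the actual CO API), a request-level GPU knob analogous to the GCE accelerator support in #319 could look like:

// Hypothetical request shape; not the actual Cloud Orchestrator API.
type CreateHostRequestSketch struct {
	// ... existing host creation fields ...
	GpuConfig *GpuConfigSketch `json:"gpu_config,omitempty"`
}

type GpuConfigSketch struct {
	Count int    `json:"count"` // number of GPUs to attach
	Type  string `json:"type"`  // e.g. "nvidia"
}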

Collaborator Author

I cannot say it's under the same principle, because of how docker run --runtime works. The valid values of the --runtime flag depend on how dockerd is configured. At the very least, this isn't appropriate to expose in the cvdr CLI; it should stay in the CO configuration.

$ sudo nvidia-ctk runtime configure --runtime=docker # Modifies /etc/docker/daemon.json by adding a new runtime.
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
$ sudo systemctl restart docker
# Then users can execute `docker run --runtime nvidia [args]`

On the other hand, I think it's a bit complicated to reach agreement from here... I'll propose a design around GPU utilization when I have time, perhaps with a GPU allocation mechanism too.

Member

Sounds good.

@0405ysj marked this pull request as draft November 7, 2025 05:07