Conversation

@ad-astra-video (Collaborator) commented Oct 28, 2025

Four small updates in this:

  1. update to vLLM 0.11.0
  2. update to CUDA 12.9.1 so the Docker image matches the vLLM Docker build
  3. ensure the tensor and pipeline parallel check does not fail to load the pipeline
  4. remove max_num_batched_tokens = max_model_len so that chunked prefill stays enabled by default:

livepeer-ai-llm  | INFO 10-29 11:01:00 [model.py:1510] Using max model len 200000
livepeer-ai-llm  | INFO 10-29 11:01:00 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048
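The parallelism fix in item 3 can be sketched as follows. This is an illustrative stand-in, not the PR's actual code: the function name and fallback behavior are assumptions; the idea is simply that a tensor/pipeline parallel configuration check should degrade gracefully instead of aborting pipeline load.

```python
def resolve_parallelism(gpu_count: int,
                        tensor_parallel_size: int,
                        pipeline_parallel_size: int) -> tuple[int, int]:
    """Return (tp, pp) sizes that fit on the available GPUs.

    Hypothetical helper: instead of raising (which would prevent the
    pipeline from loading), fall back to a single-GPU configuration when
    the requested layout needs more GPUs than are present.
    """
    requested = tensor_parallel_size * pipeline_parallel_size
    if 0 < requested <= gpu_count:
        return tensor_parallel_size, pipeline_parallel_size
    # Fallback: degrade gracefully rather than failing to load.
    return 1, 1

# TP=4 x PP=2 needs 8 GPUs; on a 4-GPU host it falls back to (1, 1),
# while TP=2 x PP=2 fits and is kept.
print(resolve_parallelism(4, 4, 2))  # -> (1, 1)
print(resolve_parallelism(4, 2, 2))  # -> (2, 2)
```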

Copilot AI review requested due to automatic review settings October 28, 2025 10:51

Copilot AI left a comment


Pull Request Overview

This PR updates the LLM dependencies and Docker environment to support vLLM 0.11.0. The changes primarily involve upgrading vLLM from version 0.10.0 to 0.11.0 and updating the Docker base image to use CUDA 12.9.1 to match vLLM's requirements.

Key Changes:

  • Updated vLLM dependency from 0.10.0 to 0.11.0
  • Upgraded Docker base image from CUDA 12.1.1 to CUDA 12.9.1 with Ubuntu 22.04
  • Modified Python package version constraints from pinned (==) to minimum versions (>=)
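The Docker-side changes above can be made concrete with a sketch. This fragment is illustrative only: the exact base-image tag and package lines in runner/docker/Dockerfile.llm may differ; the CUDA versions and the vllm constraint come from this PR's description.

```dockerfile
# Illustrative fragment -- tag names are assumptions, not the exact
# contents of runner/docker/Dockerfile.llm.

# Before (pinned to the older CUDA):
#   FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
FROM nvidia/cuda:12.9.1-cudnn-runtime-ubuntu22.04

# Version constraints relaxed from exact pins (==) to minimums (>=):
#   Before: RUN pip install "vllm==0.10.0"
RUN pip install "vllm>=0.11.0"
```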

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • runner/requirements.llm.in — updated vLLM version constraint from 0.10.0 to 0.11.0
  • runner/requirements.llm.txt — regenerated dependency lockfile (entire file removed, to be regenerated)
  • runner/docker/Dockerfile.llm — updated CUDA base image to 12.9.1 and changed package version constraints to use minimum versions


@ad-astra-video (Collaborator, PR author) commented:

@mikezupper when the new image builds for this PR can you test it by chance?

@ad-astra-video changed the title from "LLM: update to vllm version 0.11.0, upgrade to cuda 12.8.1" to "LLM: update to vllm version 0.11.0, upgrade to cuda 12.9.1" on Oct 28, 2025
2 participants