Pull requests: vllm-project/vllm
- [Perf] Slight improvement of ITL with multiple GPUs (#31826); opened Jan 6, 2026 by access2rohit; 2 of 5 tasks.
- [Model] Enable LoRA support for tower and connector in DotsOCR (#31825); labels: documentation; opened Jan 6, 2026 by ShaanveerS.
- [CI] Add CUDA 13 nightly containers (#31822); labels: ci/build, nvidia; opened Jan 6, 2026 by csahithi; 5 tasks.
- [ROCm][CI] Fix ModernBERT token classification test numerical accuracy on ROCm (#31820); labels: rocm; opened Jan 6, 2026 by AndreasKaratzas.
- [Bugfix] Handle mistral tokenizer in get_hf_processor (#31817); labels: multi-modality, ready, v1; opened Jan 6, 2026 by DarkLight1337; 5 tasks.
- [ROCm][AITER] Bugfix accuracy regression in ROCM_AITER_TRITON_MLA backend (#31816); labels: rocm, v1; opened Jan 6, 2026 by vllmellm; 5 tasks.
- [Bugfix] Fix TorchAO quantization bugs and add --torchao-config CLI support (#31815); opened Jan 6, 2026 by jwpark33; 5 tasks.
- [Bugfix] Inject JSON schema descriptions into prompt for structured outputs (#31814); labels: frontend; opened Jan 6, 2026 by ricky-chaoju.
- Enable LoRA support for tower and connector in Mistral and Voxtral (#31812); labels: deepseek, documentation, frontend, qwen; opened Jan 6, 2026 by Anexdeus.
- Report error log after vllm bench serve (#31808); labels: performance, ready; opened Jan 6, 2026 by elvircrn; 5 tasks.
- [Core][NIXL] Support HMA+NixlConnector (#31802); labels: ci/build, deepseek, documentation, frontend, kv-connector, llama, multi-modality, needs-rebase, nvidia, performance, qwen, structured-output, tool-calling, tpu, v1; opened Jan 6, 2026 by NickLucche; draft; 6 tasks.
- [Doc] Update release docs (#31799); labels: documentation; opened Jan 6, 2026 by DarkLight1337; 5 tasks.
- [Bugfix] Use isinstance() instead of type() in LoRA can_replace_layer (#31791); opened Jan 6, 2026 by majiayu000.
- [Chore] Try remove Trigger CI with all tests for wide-ranging PRs (#31786); labels: tpu, v1, init_cached_hf_modules, ready-run-all-tests; opened Jan 6, 2026 by DarkLight1337; 5 tasks.
- [Fix] Use torch.empty for output in attention+quant fusion (#31785); opened Jan 6, 2026 by elvischenv; 5 tasks.
- [Kernel] Support bias type in grouped_topk kernel (#31781); opened Jan 6, 2026 by xyang16; 5 tasks.
- Fix tool call bug when stream interval > 1 (#31778); opened Jan 6, 2026 by MrIceCreamMan; 3 of 5 tasks.
- [docker] A follow-up patch to fix #30913: install cuda13 version of lmcache and nixl (#31775); labels: ci/build, kv-connector, nvidia; opened Jan 6, 2026 by wangshangsam; 3 of 5 tasks.
- [Model] Enable LoRA support for LLaVA family (#31772); labels: documentation; opened Jan 6, 2026 by ppppqp; 5 tasks done.
- [perf] Fused operator SplitMrope used in the Qwen2.5-Omni-7B model (#31763); labels: qwen; opened Jan 6, 2026 by fuzhihong699; draft; 5 tasks.
- [Bugfix] Add fallback mechanism when XPU kernel does not support FP32 precision FLASH_ATTN in UT (#31762); labels: ci/build, v1; opened Jan 6, 2026 by 1643661061leo; 5 tasks.
- [Frontend] Add MCP tool streaming support to Responses API (#31761); labels: frontend, gpt-oss; opened Jan 6, 2026 by daniel-salib.