
Conversation

@hmellor (Member) commented Oct 29, 2025

  • Don't call basic test files individually
  • Split the long entrypoints test group into 2 shards
  • Move the unit tests into a unit directory so we don't have to manually ignore everything else
  • Move openai/tool_parsers into the unit tests because that's what they are

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify (bot) added the ci/build and llama (Related to Llama models) labels Oct 29, 2025
@gemini-code-assist (bot) left a comment


Code Review

This pull request focuses on reorganizing the test structure for better maintainability and efficiency. It involves moving unit tests into a dedicated directory, splitting a large test group into shards, and updating test pipeline configurations. The review identifies a critical issue related to potential test failures due to missing multi-GPU configurations.

Comment on lines +11 to -17


@pytest.mark.parametrize("tp_size", [1, 2])
@pytest.mark.parametrize("backend", ["mp", "ray"])
@create_new_process_for_each_test()
def test_collective_rpc(tp_size, backend, monkeypatch):
if torch.cuda.device_count() < tp_size:
pytest.skip(f"Not enough GPUs for tensor parallelism {tp_size}")

Severity: critical

The test_collective_rpc test is marked as a multi-GPU test, but the pytestmark is defined after the imports. This could lead to tests being run without the multi-gpu setup if pytest runs before the mark is applied. It's better to define pytestmark at the top of the file to ensure it's always applied. This is a critical issue as it can lead to tests passing incorrectly when they should be failing due to insufficient GPU resources.

pytestmark = pytest.mark.multi_gpu_test(num_gpus=2)


from ...utils import create_new_process_for_each_test


@pytest.mark.parametrize("tp_size", [1, 2])

@hmellor (Member, Author) replied

We don't do this anywhere else where we use pytestmark
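
A minimal standalone sketch (not code from this PR) of why the placement is not load-bearing: pytest reads pytestmark as an attribute of the already-imported module at collection time, so where the assignment sits relative to the imports does not change which tests receive the mark.

import pytest


def test_example():
    assert True


# Defined after the test function: pytest still reads the module attribute
# `pytestmark` when it collects this file, so the mark applies to every test
# in the module regardless of where the assignment appears.
pytestmark = pytest.mark.slow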

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 4 to 18
import pytest
import torch

from vllm import LLM

from ...utils import create_new_process_for_each_test

pytestmark = pytest.mark.multi_gpu_test(num_gpus=2)


@pytest.mark.parametrize("tp_size", [1, 2])
@pytest.mark.parametrize("backend", ["mp", "ray"])
@create_new_process_for_each_test()
def test_collective_rpc(tp_size, backend, monkeypatch):
    if torch.cuda.device_count() < tp_size:
        pytest.skip(f"Not enough GPUs for tensor parallelism {tp_size}")
    if tp_size == 1 and backend == "ray":
        pytest.skip("Skip duplicate test case")


P1: Removing the device-count guard lets the multi-GPU test run on a single GPU

The test previously skipped when torch.cuda.device_count() < tp_size, but the new module-level pytestmark = pytest.mark.multi_gpu_test(num_gpus=2) does not perform that check—the multi_gpu_test skip logic lives in the decorator in tests/utils.py, not in a pytest marker. As a result, on environments with only one GPU this test will now run and attempt to create an LLM with tensor_parallel_size=2, causing a failure instead of a skip. To restore the behaviour, either keep the explicit device-count guard or apply the multi_gpu_test decorator (or its marks) so the skip happens before the LLM is constructed.
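
A hedged sketch of the second option, making the marker itself perform the skip via a conftest.py collection hook. The marker name multi_gpu_test and its num_gpus kwarg come from the snippet above; the hook body is an assumption, not vLLM's actual implementation (which, per the review, lives in the decorator in tests/utils.py).

import pytest
import torch


def pytest_collection_modifyitems(config, items):
    # Hypothetical conftest.py hook: skip any test carrying the
    # multi_gpu_test marker when the machine has too few GPUs, before any
    # fixture runs or an LLM is constructed.
    for item in items:
        marker = item.get_closest_marker("multi_gpu_test")
        if marker is None:
            continue
        num_gpus = marker.kwargs.get("num_gpus", 2)
        if torch.cuda.device_count() < num_gpus:
            item.add_marker(pytest.mark.skip(reason=f"Requires {num_gpus} GPUs"))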


@bbrowning (Contributor) commented

  • Move openai/tool_parsers into the unit tests because that's what they are

This is mostly correct, except for test_hermes_tool_parser.py, which actually spins up vLLM in some of its tests. Those tests may need to be split out from the rest here; generally, tool parser tests that spin up a server live under tests/tool_use instead. The hermes test file has some of both, so it could either be moved entirely for now or split into two files in the appropriate places.
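
To illustrate the split being proposed, a hedged sketch of the two kinds of test; the hand-rolled parsing, model id, and prompt below are assumptions for illustration, not code from test_hermes_tool_parser.py.

import json

import pytest


def test_parse_hermes_style_tool_call():
    # Unit-style: pure string parsing with no engine involved, so it fits the
    # new unit directory. The <tool_call> wrapper mirrors Hermes-style output;
    # the parsing here is deliberately hand-rolled, not the repo's parser.
    output = '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'
    payload = json.loads(output.removeprefix("<tool_call>").removesuffix("</tool_call>"))
    assert payload["name"] == "get_weather"


@pytest.mark.skip(reason="illustrative sketch; needs a GPU and model weights")
def test_tool_call_with_real_engine():
    # Integration-style: constructing an LLM (or starting an OpenAI-compatible
    # server) is the kind of test that belongs under tests/tool_use instead.
    from vllm import LLM

    llm = LLM(model="NousResearch/Hermes-2-Pro-Llama-3-8B")  # assumed model id
    outputs = llm.generate(["What's the weather in Oslo?"])
    assert outputs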

@hmellor (Member, Author) commented Oct 29, 2025

Thanks for the additional context. My reasoning for moving them there was that they were run in the Entrypoints Unit Tests job.

I'll move the ones that start the server into tests/tool_use

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Labels: ci/build, llama (Related to Llama models), tool-calling