@MrIceCreamMan MrIceCreamMan commented Jan 6, 2026

Purpose

Fixes issue #31501.

What is wrong

Summary

When the stream-interval is large enough that a single delta_text contains {full_tool_call}<tool_end_token>, the parser fails with Tool: None, Args: ''
When a single delta_text contains {part_of_tool_call}<tool_end_token>, the parser returns Tool: list_directory, Args: '{}'

Detailed tracing

In the test Python script, the tool call has a total of 20 tokens (including the start and end tokens).
Let's see what happens when the interval is set to 20. The counters in the log stand for these four values:

            prev_tool_start_count = previous_text.count(self.tool_call_start_token)
            prev_tool_end_count = previous_text.count(self.tool_call_end_token)
            cur_tool_start_count = current_text.count(self.tool_call_start_token)
            cur_tool_end_count = current_text.count(self.tool_call_end_token)

Here is the log:

This parser is called
INFO 01-04 21:42:33 [hermes_tool_parser.py:200] delta_text: <tool_call>
INFO 01-04 21:42:33 [hermes_tool_parser.py:201] delta_token_ids: [151657]
INFO 01-04 21:42:33 [hermes_tool_parser.py:216] counters: 0, 0, 1, 0
INFO 01-04 21:42:33 [hermes_tool_parser.py:264] Starting on a new tool 0
INFO 01-04 21:42:33 [hermes_tool_parser.py:327] Parsed tool call None

This parser is called
INFO 01-04 21:42:34 [hermes_tool_parser.py:200] delta_text: 
INFO 01-04 21:42:34 [hermes_tool_parser.py:200] {"name": "list_directory", "arguments": {"dir": "/src"}}
INFO 01-04 21:42:34 [hermes_tool_parser.py:200] </tool_call>
INFO 01-04 21:42:34 [hermes_tool_parser.py:201] delta_token_ids: [198, 4913, 606, 788, 330, 1607, 14846, 497, 330, 16370, 788, 5212, 3741, 788, 3521, 3548, 95642, 151658, 151645]
INFO 01-04 21:42:34 [hermes_tool_parser.py:216] counters: 1, 0, 1, 1
INFO 01-04 21:42:34 [hermes_tool_parser.py:228] tool_call_end_token in delta_text
INFO 01-04 21:42:34 [hermes_tool_parser.py:282] attempting to close tool call, but no tool call

There are 2 iterations in total. In the 1st one, the delta_text is just <tool_call>. In the 2nd one, it is actually \n{"name": "list_directory", "arguments": {"dir": "/src"}}\n</tool_call>. The \n characters make the content of delta_text appear on 3 lines in the log, but it is all one delta_text.
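To make the counters in the log easier to follow, here is a minimal standalone sketch that replays the two iterations and recomputes the four counts. It does not use the real parser: the token strings are taken from the log above, while `count_tokens` and `counter_log` are names made up for this illustration.

```python
# Token strings as they appear in the log; the helper below is hypothetical.
TOOL_CALL_START = "<tool_call>"
TOOL_CALL_END = "</tool_call>"


def count_tokens(previous_text: str, current_text: str) -> tuple:
    """Return (prev_start, prev_end, cur_start, cur_end), like the logged counters."""
    return (
        previous_text.count(TOOL_CALL_START),
        previous_text.count(TOOL_CALL_END),
        current_text.count(TOOL_CALL_START),
        current_text.count(TOOL_CALL_END),
    )


# The two delta_text values observed with --stream-interval 20.
deltas = [
    "<tool_call>",
    '\n{"name": "list_directory", "arguments": {"dir": "/src"}}\n</tool_call>',
]

previous_text = ""
counter_log = []
for delta_text in deltas:
    current_text = previous_text + delta_text
    counter_log.append(count_tokens(previous_text, current_text))
    previous_text = current_text

for counters in counter_log:
    print(counters)
# (0, 0, 1, 0)
# (1, 0, 1, 1)
```

The second tuple is the interesting one: the end token shows up in the same step that delivers the whole tool call body, so there is no intermediate step in which the body could have been recorded.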

Because the 1st iteration never updates self.prev_tool_call_arr, the 2nd iteration logs "attempting to close tool call, but no tool call". This is an easy fix: we just need to append an empty {} to it whenever we create a new tool call. That gets us past the first null check in # case -- the current tool call is being closed. But remember that our interval value (20) is large enough to hold all 19 remaining tokens, which means we receive the tool call content at the same time as the </tool_call> token. Therefore self.prev_tool_call_arr has not yet been updated with the tool call content, and diff would be empty. Then we fall through to

current_tool_call = (
    partial_json_parser.loads(tool_call_portion or "{}", flags)
    if tool_call_portion
    else None
)

Luckily, when </tool_call> is in the delta_text, tool_call_portion always gives good results (you can verify it yourself). Because we are only on the 2nd iteration, and the 1st one merely started the call, we have not sent the function name yet. So we enter if not self.current_tool_name_sent:. Here is another oversight: the code did not expect the function name and </tool_call> to arrive in the same delta_text, so this branch never handles the tool call arguments. Easy fix, right? Just add them and update the state properly! But what about partial arguments? Luckily, DeltaFunctionCall handles partial arguments pretty well, so we don't have to worry about them here.
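The two points above (the empty {} placeholder, and the name plus arguments arriving together with </tool_call>) can be illustrated with a toy model. This is only a sketch under assumptions, not the hermes parser: `ToyParserState`, `start_tool`, `can_close`, and `extract_tool_call_portion` are hypothetical names, and plain `json.loads` stands in for the partial JSON parsing because the portion is already complete in this case.

```python
import json
from dataclasses import dataclass, field

TOOL_CALL_START = "<tool_call>"
TOOL_CALL_END = "</tool_call>"


@dataclass
class ToyParserState:
    """Hypothetical stand-in for the parser's streaming state."""

    current_tool_id: int = -1
    prev_tool_call_arr: list = field(default_factory=list)

    def start_tool(self, add_placeholder: bool) -> None:
        # Mirrors "Starting on a new tool N"; the placeholder is the proposed fix.
        self.current_tool_id += 1
        if add_placeholder:
            self.prev_tool_call_arr.append({})

    def can_close(self) -> bool:
        # Simplified version of the check that guards closing a tool call.
        return len(self.prev_tool_call_arr) > self.current_tool_id


def extract_tool_call_portion(current_text: str) -> str:
    """Return the text between the last start token and the end token."""
    after_start = current_text.split(TOOL_CALL_START)[-1]
    return after_start.split(TOOL_CALL_END)[0].strip()


without_placeholder = ToyParserState()
without_placeholder.start_tool(add_placeholder=False)
with_placeholder = ToyParserState()
with_placeholder.start_tool(add_placeholder=True)

current_text = (
    '<tool_call>\n{"name": "list_directory", "arguments": {"dir": "/src"}}'
    "\n</tool_call>"
)
tool_call = json.loads(extract_tool_call_portion(current_text))

print(without_placeholder.can_close(), with_placeholder.can_close())
# False True
print(tool_call["name"], json.dumps(tool_call["arguments"]))
# list_directory {"dir": "/src"}
```

Since the parsed portion already carries both the name and the arguments, a branch that only sends the name drops the arguments in this scenario, which matches the empty Args seen in the bug report.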

So far, we have covered the case where delta_text contains {full_tool_call}<tool_end_token>. Now we need to consider the case where it contains {partial_tool_call}<tool_end_token>.

Here is the new log

This parser is called
INFO 01-06 00:34:21 [hermes_tool_parser.py:200] delta_text: <tool_call>
INFO 01-06 00:34:21 [hermes_tool_parser.py:201] delta_token_ids: [151657]
INFO 01-06 00:34:21 [hermes_tool_parser.py:214] counters: 0 0 1 0
INFO 01-06 00:34:21 [hermes_tool_parser.py:265] Starting on a new tool 0
INFO 01-06 00:34:21 [hermes_tool_parser.py:325] Parsed tool call None
This parser is called
INFO 01-06 00:34:21 [hermes_tool_parser.py:200] delta_text: 
INFO 01-06 00:34:21 [hermes_tool_parser.py:200] {"name": "list_directory", "arguments
INFO 01-06 00:34:21 [hermes_tool_parser.py:201] delta_token_ids: [198, 4913, 606, 788, 330, 1607, 14846, 497, 330, 16370]
INFO 01-06 00:34:21 [hermes_tool_parser.py:214] counters: 1 0 1 0
INFO 01-06 00:34:21 [hermes_tool_parser.py:325] Parsed tool call {'name': 'list_directory'}
This parser is called
INFO 01-06 00:34:21 [hermes_tool_parser.py:200] delta_text: ": {"dir": "/src"}}
INFO 01-06 00:34:21 [hermes_tool_parser.py:200] </tool_call>
INFO 01-06 00:34:21 [hermes_tool_parser.py:201] delta_token_ids: [788, 5212, 3741, 788, 3521, 3548, 95642, 151658, 151645]
INFO 01-06 00:34:21 [hermes_tool_parser.py:214] counters: 1 0 1 1
INFO 01-06 00:34:21 [hermes_tool_parser.py:229] tool_call_end_token in delta_text
INFO 01-06 00:34:21 [hermes_tool_parser.py:282] attempting to close tool call, but no tool call

By now, the message "attempting to close tool call, but no tool call" should be familiar. Briefly: the 1st iteration returns None, the 2nd iteration returns the function name without arguments in if not self.current_tool_name_sent:, and the 3rd iteration finds that prev_tool_call_arr was never properly initialized. We already know all of this. What I want to point out is what happens next if we do properly initialize prev_tool_call_arr with an empty {}.

The 3rd iteration becomes this

INFO 01-06 00:46:31 [hermes_tool_parser.py:200] delta_text: ": {"dir": "/src"}}
INFO 01-06 00:46:31 [hermes_tool_parser.py:200] </tool_call>
INFO 01-06 00:46:31 [hermes_tool_parser.py:201] delta_token_ids: [788, 5212, 3741, 788, 3521, 3548, 95642, 151658, 151645]
INFO 01-06 00:46:31 [hermes_tool_parser.py:214] counters: 1 0 1 1
INFO 01-06 00:46:31 [hermes_tool_parser.py:229] tool_call_end_token in delta_text
INFO 01-06 00:46:31 [hermes_tool_parser.py:326] Parsed tool call {'name': 'list_directory', 'arguments': {'dir': '/src'}}
INFO 01-06 00:46:31 [hermes_tool_parser.py:372] Trying to parse current tool call with ID 0
INFO 01-06 00:46:31 [hermes_tool_parser.py:388] diffing old arguments: None
INFO 01-06 00:46:31 [hermes_tool_parser.py:389] against new ones: {'dir': '/src'}
INFO 01-06 00:46:31 [hermes_tool_parser.py:429] finding ": {"dir": "/src"}} in {"dir": "/src"}}

Notice the line finding ": {"dir": "/src"}} in {"dir": "/src"}}. The two delta_text values are {"name": "list_directory", "arguments and ": {"dir": "/src"}}, so the 3rd iteration's delta_text starts with the structural JSON characters ":, which makes the check fail and return None. After some analysis, I think the best fix is to always use cur_arguments_json directly rather than building arguments_delta. I believe arguments_delta was originally introduced to avoid json.dumps producing autocompleted JSON, which is why the raw JSON was taken from delta_text instead. But honestly, we will eventually overwrite incomplete JSON anyway in a later DeltaMessage, so autocompleted JSON shouldn't be a problem here. Let me know if you have any other suggestions.
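The failed lookup can be reproduced outside the parser with plain string operations. This is a rough sketch, not the actual code at line 429: `delta_fragment`, `arguments_via_search`, and `arguments_direct` are illustrative names, and the slicing in the search branch is an assumption about how a matched fragment would be used.

```python
import json

# Arguments as parsed in the 3rd iteration, serialized the way the diff sees them.
cur_arguments = {"dir": "/src"}
cur_arguments_json = json.dumps(cur_arguments)

# The JSON part of the 3rd delta_text, before the end token. It starts with
# '": ', which belongs to the outer object ("arguments": ...), not the arguments.
delta_fragment = '": {"dir": "/src"}}'

# Searching for the raw fragment inside the serialized arguments fails.
position = cur_arguments_json.find(delta_fragment)
arguments_via_search = (
    None
    if position == -1
    else cur_arguments_json[: position + len(delta_fragment)]
)

# Using the serialized arguments directly does not depend on where the
# stream happened to split the text.
arguments_direct = cur_arguments_json

print(position, arguments_via_search)
# -1 None
print(arguments_direct)
# {"dir": "/src"}
```

Any split point that leaves outer-object characters at the front of the fragment would fail the same way, which is why the direct approach looks more robust to larger stream intervals.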

Test Plan

E2E test: ./tests/v1/e2e/test_streaming_tool_calls.py

Manual test:
Run this command with different stream-interval values [1, 8, 9, 10, 18, 19, 20]. I ran it on my local machine.
This model requires less than 8GB of VRAM, so you should be able to run it on your local machine too!

vllm serve Qwen/Qwen2.5-1.5B-Instruct \
  --stream-interval 18 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 2048

Then run this Python script from ktsaou (special thanks):

#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Reproduction script for vLLM issue #31501"""

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

tools = [
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": "List contents of a directory.",
            "parameters": {
                "type": "object",
                "properties": {
                    "dir": {"type": "string", "description": "Directory path"}
                },
                "required": ["dir"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[
        {
            "role": "system",
            "content": "Use the list_directory tool when asked about files.",
        },
        {"role": "user", "content": "List files in the src folder"},
    ],
    tools=tools,
    stream=True,
)

# Accumulate streamed tool call
accumulated_args = ""
tool_name = None

for chunk in response:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            if tc.function.name:
                tool_name = tc.function.name
            if tc.function.arguments:
                accumulated_args += tc.function.arguments

print(f"Tool: {tool_name}, Args: {accumulated_args!r}")
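For checking the accumulation logic without a running server, the loop above can be exercised against hand-written deltas. This is a small offline sketch: `FakeFunctionDelta` and `accumulate` are hypothetical helpers that only mimic the `name` and `arguments` fields read by the script, and the argument fragments are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFunctionDelta:
    """Hypothetical stand-in for tc.function in a streamed chunk."""

    name: Optional[str] = None
    arguments: Optional[str] = None


def accumulate(function_deltas):
    """Apply the same accumulation rules as the reproduction script."""
    tool_name = None
    accumulated_args = ""
    for fn in function_deltas:
        if fn.name:
            tool_name = fn.name
        if fn.arguments:
            accumulated_args += fn.arguments
    return tool_name, accumulated_args


# No tool call deltas at all: the shape of the buggy output.
empty_result = accumulate([])

# Name first, then arguments split across two deltas.
split_result = accumulate(
    [
        FakeFunctionDelta(name="list_directory"),
        FakeFunctionDelta(arguments='{"dir": '),
        FakeFunctionDelta(arguments='"src"}'),
    ]
)

print(f"Tool: {empty_result[0]}, Args: {empty_result[1]!r}")
# Tool: None, Args: ''
print(f"Tool: {split_result[0]}, Args: {split_result[1]!r}")
# Tool: list_directory, Args: '{"dir": "src"}'
```

The first case shows that Tool: None, Args: '' simply means the server never emitted any tool call delta, so the client-side script itself is not at fault.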

Test Result

Before the fix:

| Setting | Output | Status |
| --- | --- | --- |
| --stream-interval 20 | Tool: None, Args: '' | BUG |
| --stream-interval 1 (default) | Tool: list_directory, Args: '{"dir": "src"}' | OK |

After the fix:

| Setting | Output | Status |
| --- | --- | --- |
| --stream-interval 1 (default) | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 8 | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 9 | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 10 | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 18 | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 19 | Tool: list_directory, Args: '{"dir": "src"}' | OK |
| --stream-interval 20 | Tool: list_directory, Args: '{"dir": "src"}' | OK |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug in the hermes_tool_parser that occurs when stream-interval > 1, causing tool call parsing to fail. The changes correctly handle scenarios where a single streaming delta contains multiple parts of a tool call, such as the function name and arguments, or partial arguments and the end token. The fixes include proper state initialization for new tool calls, preventing premature parsing termination, and correcting the logic for handling argument chunks. The changes are well-reasoned and supported by detailed testing, and I find them to be a solid improvement to the parser's robustness.

@chaunceyjiang chaunceyjiang left a comment

Could you add an e2e test?

@chaunceyjiang chaunceyjiang self-assigned this Jan 6, 2026
Signed-off-by: Zhuohao Yang <zy242@cornell.edu>
@mergify mergify bot added the v1 label Jan 7, 2026
@MrIceCreamMan MrIceCreamMan changed the title fixing stream interval > 1 will cause tool call bug [BugFix] fixing stream interval > 1 will cause tool call bug Jan 7, 2026