Commit 08a3395

wangln19 and hmellor authored
Update for more accurate words (#110)
Signed-off-by: wangln19 <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
1 parent cba0fea commit 08a3395

File tree

1 file changed: +7 -6 lines changed


_posts/2025-10-28-Kimi-K2-Accuracy.md

Lines changed: 7 additions & 6 deletions
@@ -47,7 +47,8 @@ To isolate the problem, I devised a crucial experiment. Instead of using vLLM's
 A deeper look revealed that the Kimi tokenizer's `apply_chat_template` function signature includes `**kwargs` to accept extra, model-specific parameters. One such parameter, `add_generation_prompt=True`, is essential for correctly formatting the prompt to signal the start of the assistant's turn, guiding it towards generating a tool call.
 
 A correct prompt should end with special tokens that prime the model to act as the assistant:
-```json
+
+```
 Correct Prompt Suffix: ...<|im_assistant|>assistant<|im_middle|>
 ```
 However, because vLLM was not passing `add_generation_prompt=True`, the prompt was truncated right after the user's message.
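The hunk above hinges on whether `add_generation_prompt=True` actually reaches the tokenizer. As a minimal sketch of that behavior (illustrative only; the model id and message are placeholders, and this is not code from the post or from vLLM):

```python
# Sketch: effect of forwarding add_generation_prompt to apply_chat_template.
# The model id below is assumed for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "moonshotai/Kimi-K2-Instruct", trust_remote_code=True
)

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Without the flag, the rendered prompt stops right after the user's turn.
truncated = tokenizer.apply_chat_template(messages, tokenize=False)

# With the flag, the template appends the assistant-priming suffix
# (for Kimi, something like ...<|im_assistant|>assistant<|im_middle|>).
primed = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

print(primed.removeprefix(truncated))  # shows just the appended suffix
```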
@@ -72,17 +73,17 @@ Kimi's Jinja-based chat template was designed to render a string `content`. When
 
 **Incorrect Prompt Snippet:**
 
-```json
+```
 ...<|im_end|><|im_assistant|>assistant<|im_middle|>[{'type': 'text', 'text': ''}]<|tool_calls_section_begin|>...
 ```
 
 **Correct Prompt Snippet:**
 
-```json
+```
 ...<|im_end|><|im_assistant|>assistant<|im_middle|><|tool_calls_section_begin|>...
 ```
 
-This seemingly small error was enough to confuse the model's generation logic.
+This critical formatting error created a malformed prompt that was enough to confuse the model's generation logic.
 
 **The Fix:**
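The patch itself isn't shown in this hunk. As a rough sketch of the idea (hypothetical helper, not the actual fix in vLLM or in this commit), OpenAI-style content parts can be collapsed into the plain string Kimi's Jinja template expects before the template runs, so a Python list is never rendered literally into the prompt:

```python
# Hypothetical helper, for illustration only: flatten OpenAI-style content
# parts into the plain string the Kimi chat template expects.
def normalize_content(content):
    if content is None or isinstance(content, str):
        return content
    # List-of-parts form, e.g. [{"type": "text", "text": "..."}]
    return "".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Search for vLLM."}]},
]
messages = [{**m, "content": normalize_content(m["content"])} for m in messages]
# messages[0]["content"] is now the string "Search for vLLM."
```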

@@ -94,9 +95,9 @@ Finally, I noticed that even when the model generated a syntactically correct to
 
 **The Investigation:**
 
-By inspecting the raw `text_completion` output from vLLM, the culprit became obvious. The model would occasionally generate tool-call IDs that didn't strictly conform to Kimi's official specification. For instance, consider this output:
+By inspecting the raw `text_completion` output from vLLM, the culprit became obvious. I found that in certain edge cases, particularly when misled by a malformed conversation history, the model would generate tool-call IDs that didn't strictly conform to Kimi's official specification. For instance, consider this output:
 
-```json
+```
 ...<|tool_calls_section_begin|><|tool_call_begin|>search:2<|tool_call_argument_begin|>...
 ```
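For illustration only: the well-formed id shape assumed below, `functions.<name>:<index>` (e.g. `functions.search:2`), is inferred from the ids Kimi K2 normally emits rather than quoted from the official spec. A downstream check could then repair a bare id like `search:2` roughly as follows:

```python
import re

# Assumed well-formed shape: "functions.<tool_name>:<index>" (an assumption
# for illustration; consult Kimi's official spec for the exact rule).
WELL_FORMED = re.compile(r"^functions\.(?P<name>[\w.\-]+):(?P<index>\d+)$")
BARE = re.compile(r"^(?P<name>[\w.\-]+):(?P<index>\d+)$")

def repair_tool_call_id(raw_id: str) -> str:
    """Return a spec-shaped id, prefixing bare '<name>:<index>' ids when needed."""
    if WELL_FORMED.match(raw_id):
        return raw_id
    match = BARE.match(raw_id)
    if match:
        return f"functions.{match.group('name')}:{match.group('index')}"
    raise ValueError(f"unrecognized tool-call id: {raw_id!r}")

print(repair_tool_call_id("search:2"))  # -> functions.search:2
```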
