
Conversation

@KrishnanPrash
Contributor

@KrishnanPrash KrishnanPrash commented Nov 19, 2025

Overview:

Copy of #4380. Reopened as a new PR for ease of review and merge.

Chat templates have conflicting expectations for message content format:

  • Standard templates expect strings: "Hello"
  • Multimodal templates (llava) expect arrays: [{"type": "text", "text": "Hello"}]

When the wrong format is provided, content goes missing or renders as malformed text in the prompt. This PR resolves the mismatch in three steps:

  1. Detection at model load: Test-render the template with both formats to detect requirements
  2. Normalization per request: Convert between formats based on template needs
    • Standard templates: Flatten text-only arrays → strings ("t1\nt2")
    • Multimodal templates: Wrap strings → arrays ([{"type": "text", "text": "..."}])
  3. Smart preservation: Mixed content (text + images) always kept as-is

Details:

  • Added detect_content_array_usage() in formatters.rs
  • Added requires_content_arrays field to HfTokenizerConfigJsonFormatter
  • Made may_be_fix_msg_content() bidirectional with preserve_arrays parameter
  • Updated render pipeline to apply normalization automatically

Related PRs:

Summary by CodeRabbit

  • New Features

    • Added support for LLaVA 1.5 7B multimodal model, enabling unified processing of image and text inputs in prompts.
    • Enhanced message content handling to properly support both standard text and multimodal input configurations.
  • Tests

    • Added test coverage for the new multimodal model functionality with image and text input scenarios.


Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@KrishnanPrash KrishnanPrash requested review from a team as code owners November 19, 2025 23:49
@github-actions github-actions bot added the fix label Nov 19, 2025
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Changes add support for a new multimodal LLaVA model by implementing content array format detection, introducing format-aware message transformation logic, and extending model configuration. The detection mechanism identifies whether templates require array-based content at initialization time, then uses this information to steer message formatting decisions.

Changes

  • Model launch configuration — examples/backends/vllm/launch/agg_multimodal.sh: Adds a conditional branch for the llava-hf/llava-1.5-7b-hf model with GPU memory and max model length constraints
  • Content array state management — lib/llm/src/preprocessor/prompt/template.rs, lib/llm/src/preprocessor/prompt/template/formatters.rs: Adds the boolean field requires_content_arrays to the formatter struct; introduces runtime detection via the detect_content_array_usage() helper, which renders templates with test payloads to determine content format requirements
  • Message content transformation — lib/llm/src/preprocessor/prompt/template/oai.rs: Adds a preserve_arrays parameter to may_be_fix_msg_content(), enabling context-aware conversion between string and array content formats; expands the logic to handle three cases: string-to-array, array-to-string, and mixed-content preservation
  • Test configuration — tests/serve/test_vllm.py: Adds a new test entry multimodal_agg_llava to vllm_configs with a multimodal image-URL request payload and validation of the expected response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Template rendering detection logic (formatters.rs): Verify that minijinja context creation and test payload matching correctly identify content format requirements across template variations
  • Parameter propagation (template.rs, formatters.rs, oai.rs): Confirm that the requires_content_arrays flag flows correctly from initialization through to usage in message formatting decisions
  • Message transformation cases (oai.rs): Review all three cases in may_be_fix_msg_content() for correctness, particularly edge cases with mixed content types and empty arrays

Poem

🐰 A hare hops through arrays of content,
Detecting formats with minijinja's sent,
LLaVA joins with vision so bright,
From strings to arrays—transformation done right! ✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately captures the main change: adding support for message content as a list, the core fix for the template format inconsistencies.
  • Description check — ✅ Passed: The description includes the required template sections: Overview clearly explains the problem, Details covers the implementation changes, and Related PRs references associated work, though it lacks an explicit 'Where should the reviewer start' section.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)

9-39: Content-format detection logic looks sound but may under-detect in exotic templates

The detect_content_array_usage probe is straightforward and safe (errors → empty string), and it correctly distinguishes templates that break on string content. However, it only declares requires_content_arrays = true when the array probe surfaces "template_test" and the string probe does not. Templates that never surface user content (or only use it for control flow) will silently fall back to false even if they require arrays, which may cause subtle mismatches.

If you expect such edge-case templates, consider:

  • Allowing an explicit override in ChatTemplate / MDC, or
  • Making the probe more flexible (e.g., checking for successful render vs. substring match, or allowing a configurable sentinel key).

152-161: Detection currently ignores tool-only templates

detect_content_array_usage(&env) always probes "default", but in the map case it’s possible to have only "tool_use" registered. In that situation, get_template("default") fails and requires_content_arrays is forced to false, even if the actual tool template expects arrays, so may_be_fix_msg_content will convert text-only arrays back to strings.

If you intend to support tool-only templates, consider:

  • Probing "tool_use" when "default" is absent, or
  • Probing all registered templates and OR-ing their results.
lib/llm/src/preprocessor/prompt/template.rs (1)

104-110: Struct extension is consistent with new behavior

Adding requires_content_arrays: bool here aligns with the new detection and render logic and doesn’t introduce construction hazards given the single new() path.

You might consider adding a brief doc comment on the field (e.g., “true if the underlying chat template only supports array-form content”) to avoid future confusion when other formatters are added.

lib/llm/src/preprocessor/prompt/template/oai.rs (1)

172-175: Message flow centralization is good; minor opportunity to avoid double serialization

Switching NvCreateChatCompletionRequest::messages() to just serialize self.inner.messages and then normalizing in OAIPromptFormatter::render() centralizes content handling and keeps the trait implementation simple.

In render, you currently go:

  1. req.messages() → Value
  2. serde_json::to_value(...) → serde_json::Value
  3. may_be_fix_msg_content(...) → Value
  4. serde_json::to_value(...) → serde_json::Value

You could slightly simplify and reduce conversions by having may_be_fix_msg_content operate directly on serde_json::Value and return that, or by calling it before converting into a Value in the first place. Not urgent, but it would shave some overhead on a hot path.

Also applies to: 287-310

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3f764e and 3181edb.

📒 Files selected for processing (5)
  • examples/backends/vllm/launch/agg_multimodal.sh (1 hunks)
  • lib/llm/src/preprocessor/prompt/template.rs (1 hunks)
  • lib/llm/src/preprocessor/prompt/template/formatters.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (15 hunks)
  • tests/serve/test_vllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/formatters.rs
  • lib/llm/src/preprocessor/prompt/template/oai.rs
  • lib/llm/src/preprocessor/prompt/template.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: KrishnanPrash suggested adding early validation for custom Jinja template paths in the Rust layer (lib/bindings/python/rust/lib.rs) to benefit both vLLM and SGLang workflows, using PathBuf::from() and path.exists() checks with appropriate PyFileNotFoundError handling.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/formatters.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/oai.rs
🧬 Code graph analysis (3)
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)
lib/llm/src/preprocessor/prompt/template/oai.rs (3)
  • messages (172-175)
  • messages (223-234)
  • supports_add_generation_prompt (283-285)
lib/llm/src/preprocessor/prompt.rs (3)
  • messages (51-51)
  • supports_add_generation_prompt (82-82)
  • supports_add_generation_prompt (95-97)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
lib/llm/src/preprocessor/prompt.rs (1)
  • messages (51-51)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (2)
  • chat_payload (129-156)
  • chat_payload_default (18-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
🔇 Additional comments (4)
examples/backends/vllm/launch/agg_multimodal.sh (1)

45-51: New LLaVA branch is consistent with existing Qwen config

The added elif for llava-hf/llava-1.5-7b-hf mirrors the Qwen settings and cleanly plugs into the existing EXTRA_ARGS logic. Looks good; you can later tune --max-model-len or GPU utilization independently if needed.

tests/serve/test_vllm.py (1)

220-251: LLaVA multimodal config mirrors Qwen coverage and exercises new path

The multimodal_agg_llava entry cleanly follows the existing Qwen aggregated setup and, importantly, adds:

  • An image_url payload to validate multimodal behavior, and
  • A chat_payload_default to exercise the string → array normalization.

This looks like solid end-to-end coverage for the new logic.

lib/llm/src/preprocessor/prompt/template/oai.rs (2)

76-135: Bidirectional msg.content normalization is correct and preserves multimodal safety

The updated may_be_fix_msg_content cleanly separates the two modes:

  • preserve_arrays = true: strings are wrapped into a single {"type":"text","text":...} element, leaving existing arrays untouched.
  • preserve_arrays = false: non-empty, text-only arrays are concatenated with \n, while mixed / non-text / empty arrays are preserved.

This matches the desired behavior: standard templates get simple strings, while multimodal templates can rely on array form without losing mixed content. Given the prior guarantee that production traffic here is text-only arrays, the extra guards around mixed and non-text types add nice future safety without regressions. Based on learnings.


450-478: Test coverage around content normalization and multimodal/tool interaction is comprehensive

The new and updated tests exercise:

  • Array → string behavior for single and multiple messages, including system/assistant roles and empty arrays.
  • Preservation of mixed and non-text-only arrays (image/video/audio) in both standard and multimodal-like scenarios.
  • Interaction with tool-call argument normalization in the presence of multimodal content.
  • String → array conversion and array preservation when preserve_arrays=true.

This suite gives strong confidence that the new normalization rules hold across both typical text-only and richer multimodal-shaped payloads, and that tool handling remains correct.

Also applies to: 482-541, 545-565, 569-591, 595-653, 720-770, 875-921, 923-978

@KrishnanPrash
Contributor Author

This PR adds the necessary support for multimodal models (like llava-hf/llava-1.5-7b-hf), but full support also requires addressing #4501.

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@rmccorm4
Contributor

LGTM, but needs rust checks fixed

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@KrishnanPrash KrishnanPrash enabled auto-merge (squash) November 20, 2025 23:20
@KrishnanPrash KrishnanPrash merged commit 6767559 into main Nov 20, 2025
33 of 35 checks passed
@KrishnanPrash KrishnanPrash deleted the kprashanth/msg_content_list branch November 20, 2025 23:38
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Nov 22, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


6 participants