
Conversation

@KrishnanPrash
Contributor

@KrishnanPrash KrishnanPrash commented Nov 19, 2025

Overview:

Copy of #4380. Reopened as a new PR for ease of review and merge.

Chat templates have conflicting expectations for message content format:

  • Standard templates expect strings: "Hello"
  • Multimodal templates (llava) expect arrays: [{"type": "text", "text": "Hello"}]

When the wrong format is provided, content goes missing or renders as malformed text in the prompt. This PR resolves the mismatch in three steps:

  1. Detection at model load: Test-render the template with both formats to detect requirements
  2. Normalization per request: Convert between formats based on template needs
    • Standard templates: Flatten text-only arrays → strings ("t1\nt2")
    • Multimodal templates: Wrap strings → arrays ([{"type": "text", "text": "..."}])
  3. Smart preservation: Mixed content (text + images) always kept as-is

Details:

  • Added detect_content_array_usage() in formatters.rs
  • Added requires_content_arrays field to HfTokenizerConfigJsonFormatter
  • Made may_be_fix_msg_content() bidirectional with preserve_arrays parameter
  • Updated render pipeline to apply normalization automatically

Related PRs:

Summary by CodeRabbit

  • New Features

    • Added support for LLaVA 1.5 7B multimodal model, enabling unified processing of image and text inputs in prompts.
    • Enhanced message content handling to properly support both standard text and multimodal input configurations.
  • Tests

    • Added test coverage for the new multimodal model functionality with image and text input scenarios.


Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@KrishnanPrash KrishnanPrash requested review from a team as code owners November 19, 2025 23:49
@github-actions github-actions bot added the fix label Nov 19, 2025
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

Walkthrough

Changes add support for a new multimodal LLaVA model by implementing content array format detection, introducing format-aware message transformation logic, and extending model configuration. The detection mechanism identifies whether templates require array-based content at initialization time, then uses this information to steer message formatting decisions.

Changes

  • Model launch configuration — examples/backends/vllm/launch/agg_multimodal.sh: Adds a conditional branch for the llava-hf/llava-1.5-7b-hf model with GPU memory and max model length constraints
  • Content array state management — lib/llm/src/preprocessor/prompt/template.rs, lib/llm/src/preprocessor/prompt/template/formatters.rs: Adds the boolean field requires_content_arrays to the formatter struct; introduces runtime detection via the detect_content_array_usage() helper, which renders templates with test payloads to determine content format requirements
  • Message content transformation — lib/llm/src/preprocessor/prompt/template/oai.rs: Adds a preserve_arrays parameter to may_be_fix_msg_content(), enabling context-aware conversion between string and array content formats; expands the logic to handle three cases: string-to-array, array-to-string, and mixed-content preservation
  • Test configuration — tests/serve/test_vllm.py: Adds a new test entry multimodal_agg_llava to vllm_configs with a multimodal image-URL request payload and validation of the expected response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Template rendering detection logic (formatters.rs): Verify that minijinja context creation and test payload matching correctly identify content format requirements across template variations
  • Parameter propagation (template.rs, formatters.rs, oai.rs): Confirm that the requires_content_arrays flag flows correctly from initialization through to usage in message formatting decisions
  • Message transformation cases (oai.rs): Review all three cases in may_be_fix_msg_content() for correctness, particularly edge cases with mixed content types and empty arrays

Poem

🐰 A hare hops through arrays of content,
Detecting formats with minijinja's sent,
LLaVA joins with vision so bright,
From strings to arrays—transformation done right! ✨

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately captures the main change: adding support for message content as a list, the core fix for the template format inconsistencies.
  • Description check — ✅ Passed: The description includes the required template sections: Overview clearly explains the problem, Details covers the implementation changes, and Related PRs references associated work, though it lacks an explicit 'Where should the reviewer start' section.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)

9-39: Content-format detection logic looks sound but may under-detect in exotic templates

The detect_content_array_usage probe is straightforward and safe (errors → empty string), and it correctly distinguishes templates that break on string content. However, it only declares requires_content_arrays = true when the array probe surfaces "template_test" and the string probe does not. Templates that never surface user content (or only use it for control flow) will silently fall back to false even if they require arrays, which may cause subtle mismatches.

If you expect such edge-case templates, consider:

  • Allowing an explicit override in ChatTemplate / MDC, or
  • Making the probe more flexible (e.g., checking for successful render vs. substring match, or allowing a configurable sentinel key).

152-161: Detection currently ignores tool-only templates

detect_content_array_usage(&env) always probes "default", but in the map case it’s possible to have only "tool_use" registered. In that situation, get_template("default") fails and requires_content_arrays is forced to false, even if the actual tool template expects arrays, so may_be_fix_msg_content will convert text-only arrays back to strings.

If you intend to support tool-only templates, consider:

  • Probing "tool_use" when "default" is absent, or
  • Probing all registered templates and OR-ing their results.
lib/llm/src/preprocessor/prompt/template.rs (1)

104-110: Struct extension is consistent with new behavior

Adding requires_content_arrays: bool here aligns with the new detection and render logic and doesn’t introduce construction hazards given the single new() path.

You might consider adding a brief doc comment on the field (e.g., “true if the underlying chat template only supports array-form content”) to avoid future confusion when other formatters are added.

lib/llm/src/preprocessor/prompt/template/oai.rs (1)

172-175: Message flow centralization is good; minor opportunity to avoid double serialization

Switching NvCreateChatCompletionRequest::messages() to just serialize self.inner.messages and then normalizing in OAIPromptFormatter::render() centralizes content handling and keeps the trait implementation simple.

In render, you currently go:

  1. req.messages() → Value
  2. serde_json::to_value(...) → serde_json::Value
  3. may_be_fix_msg_content(...) → Value
  4. serde_json::to_value(...) → serde_json::Value

You could slightly simplify and reduce conversions by having may_be_fix_msg_content operate directly on serde_json::Value and return that, or by calling it before converting into a Value in the first place. Not urgent, but it would shave some overhead on a hot path.

Also applies to: 287-310

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3f764e and 3181edb.

📒 Files selected for processing (5)
  • examples/backends/vllm/launch/agg_multimodal.sh (1 hunks)
  • lib/llm/src/preprocessor/prompt/template.rs (1 hunks)
  • lib/llm/src/preprocessor/prompt/template/formatters.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (15 hunks)
  • tests/serve/test_vllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/formatters.rs
  • lib/llm/src/preprocessor/prompt/template/oai.rs
  • lib/llm/src/preprocessor/prompt/template.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: KrishnanPrash suggested adding early validation for custom Jinja template paths in the Rust layer (lib/bindings/python/rust/lib.rs) to benefit both vLLM and SGLang workflows, using PathBuf::from() and path.exists() checks with appropriate PyFileNotFoundError handling.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/formatters.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/oai.rs
🧬 Code graph analysis (3)
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)
lib/llm/src/preprocessor/prompt/template/oai.rs (3)
  • messages (172-175)
  • messages (223-234)
  • supports_add_generation_prompt (283-285)
lib/llm/src/preprocessor/prompt.rs (3)
  • messages (51-51)
  • supports_add_generation_prompt (82-82)
  • supports_add_generation_prompt (95-97)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
lib/llm/src/preprocessor/prompt.rs (1)
  • messages (51-51)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (2)
  • chat_payload (129-156)
  • chat_payload_default (18-43)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: sglang (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: clippy (.)
🔇 Additional comments (4)
examples/backends/vllm/launch/agg_multimodal.sh (1)

45-51: New LLaVA branch is consistent with existing Qwen config

The added elif for llava-hf/llava-1.5-7b-hf mirrors the Qwen settings and cleanly plugs into the existing EXTRA_ARGS logic. Looks good; you can later tune --max-model-len or GPU utilization independently if needed.

tests/serve/test_vllm.py (1)

220-251: LLaVA multimodal config mirrors Qwen coverage and exercises new path

The multimodal_agg_llava entry cleanly follows the existing Qwen aggregated setup and, importantly, adds:

  • An image_url payload to validate multimodal behavior, and
  • A chat_payload_default to exercise the string → array normalization.

This looks like solid end-to-end coverage for the new logic.

lib/llm/src/preprocessor/prompt/template/oai.rs (2)

76-135: Bidirectional msg.content normalization is correct and preserves multimodal safety

The updated may_be_fix_msg_content cleanly separates the two modes:

  • preserve_arrays = true: strings are wrapped into a single {"type":"text","text":...} element, leaving existing arrays untouched.
  • preserve_arrays = false: non-empty, text-only arrays are concatenated with \n, while mixed / non-text / empty arrays are preserved.

This matches the desired behavior: standard templates get simple strings, while multimodal templates can rely on array form without losing mixed content. Given the prior guarantee that production traffic here is text-only arrays, the extra guards around mixed and non-text types add nice future safety without regressions. Based on learnings.


450-478: Test coverage around content normalization and multimodal/tool interaction is comprehensive

The new and updated tests exercise:

  • Array → string behavior for single and multiple messages, including system/assistant roles and empty arrays.
  • Preservation of mixed and non-text-only arrays (image/video/audio) in both standard and multimodal-like scenarios.
  • Interaction with tool-call argument normalization in the presence of multimodal content.
  • String → array conversion and array preservation when preserve_arrays=true.

This suite gives strong confidence that the new normalization rules hold across both typical text-only and richer multimodal-shaped payloads, and that tool handling remains correct.

Also applies to: 482-541, 545-565, 569-591, 595-653, 720-770, 875-921, 923-978

@KrishnanPrash
Contributor Author

This PR adds the necessary support for multimodal models (like llava-hf/llava-1.5-7b-hf), but full support also requires addressing #4501.

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@rmccorm4
Contributor

LGTM, but needs rust checks fixed

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
@KrishnanPrash KrishnanPrash enabled auto-merge (squash) November 20, 2025 23:20
@KrishnanPrash KrishnanPrash merged commit 6767559 into main Nov 20, 2025
33 of 35 checks passed
@KrishnanPrash KrishnanPrash deleted the kprashanth/msg_content_list branch November 20, 2025 23:38
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Nov 22, 2025
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


6 participants