fix(multimodal): fall back to config model type for aliased vision models #755
YouNeedCryDear wants to merge 3 commits into lightseekorg:main from
Conversation
📝 Walkthrough

Replace substring-based model family detection with model_type-hint matching: ModelMetadata gains helpers; vision specs use them instead of inspecting model_id; ImageProcessorRegistry.find accepts an optional model_type fallback; model_gateway passes config.model_type into the lookup.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Gateway as ModelGateway
    participant Registry as ImageProcessorRegistry
    participant Metadata as ModelMetadata
    Client->>Gateway: request with model_id
    Gateway->>Metadata: load model config -> config.model_type
    Gateway->>Registry: find(model_id, Some(model_type))
    Registry->>Registry: find_in_candidate(model_id) (fast path)
    alt found by model_id
        Registry-->>Gateway: processor
    else not found
        Registry->>Registry: iterate processors by pattern using model_type
        Registry->>Metadata: matches_model_type_hints(hints)
        alt matches via model_type
            Registry-->>Gateway: processor
        else no match
            Registry-->>Gateway: None
        end
    end
    Gateway-->>Client: proceed or error
```
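The lookup flow in the diagram can be sketched as a small self-contained Rust program. This is a minimal illustration, not the PR's actual code: the `Registry` struct, `demo_registry` helper, and string-valued processors are assumptions standing in for the real `ImageProcessorRegistry`, which stores trait objects.

```rust
use std::collections::HashMap;

// Minimal sketch of the registry lookup: pattern -> processor name.
// The real registry maps patterns to boxed ImagePreProcessor trait objects.
struct Registry {
    processors: HashMap<String, String>,
}

impl Registry {
    // Case-insensitive substring match against registered patterns.
    fn find_in_candidate(&self, candidate: &str) -> Option<&str> {
        let candidate = candidate.to_ascii_lowercase();
        self.processors
            .iter()
            .filter(|(pattern, _)| candidate.contains(&pattern.to_ascii_lowercase()))
            // Deterministically prefer the longest (most specific) pattern.
            .max_by(|(a, _), (b, _)| a.len().cmp(&b.len()).then_with(|| a.cmp(b)))
            .map(|(_, name)| name.as_str())
    }

    // Fast path on model_id; fall back to config.model_type for aliases.
    fn find(&self, model_id: &str, model_type: Option<&str>) -> Option<&str> {
        self.find_in_candidate(model_id)
            .or_else(|| model_type.and_then(|t| self.find_in_candidate(t)))
    }
}

fn demo_registry() -> Registry {
    let mut processors = HashMap::new();
    processors.insert("llava".to_string(), "LlavaProcessor".to_string());
    processors.insert("qwen2-vl".to_string(), "QwenVlProcessor".to_string());
    Registry { processors }
}

fn main() {
    let registry = demo_registry();
    // An aliased model_id alone misses; config.model_type rescues the lookup.
    assert_eq!(registry.find("custom-model", Some("llava")), Some("LlavaProcessor"));
    assert_eq!(registry.find("custom-model", None), None);
    println!("fallback lookup ok");
}
```

The key point is that the fast path on `model_id` is preserved, so existing non-aliased deployments take the same branch as before.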
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (5 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
📝 Coding Plan
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where multimodal family detection and image processor lookup failed when vision models were exposed under aliased names, as the system previously relied solely on the request-facing model ID. The solution introduces a robust fallback mechanism that utilizes the loaded model config's `model_type`.

Highlights
Changelog
Activity
Hi @YouNeedCryDear, the DCO sign-off check has failed. All commits must include a `Signed-off-by` line.

To fix existing commits:

```shell
# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease
```

To sign off future commits automatically:
Code Review
This pull request correctly implements a fallback mechanism to use the model_type from the configuration when the model_id does not match a known multimodal model family. This is achieved by introducing a new helper matches_model_type_hints and updating the matches methods in the model processor specs. The image processor lookup is also updated to use this fallback. The changes are well-tested with new regression tests. However, there appears to be a significant accidental deletion of code in model_gateway/src/routers/grpc/multimodal.rs, which is a critical issue that needs to be addressed.
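For illustration, a `matches_model_type_hints`-style helper could look roughly like the sketch below. The struct shape, the `model_type` field name, and the `aliased_metadata` helper are assumptions for the example; the actual `ModelMetadata` in this PR may differ.

```rust
// Hypothetical sketch of a model-type-hint matcher; not the PR's real code.
struct ModelMetadata {
    // Assumed field holding config.model_type from the loaded config.json.
    model_type: Option<String>,
}

impl ModelMetadata {
    fn config_model_type(&self) -> Option<&str> {
        self.model_type.as_deref()
    }

    // True when the config-declared model_type matches any family hint,
    // compared case-insensitively by substring.
    fn matches_model_type_hints(&self, hints: &[&str]) -> bool {
        self.config_model_type().is_some_and(|t| {
            let t = t.to_ascii_lowercase();
            hints.iter().any(|h| t.contains(&h.to_ascii_lowercase()))
        })
    }
}

// Metadata as it might look for a Qwen2-VL model served under an alias.
fn aliased_metadata() -> ModelMetadata {
    ModelMetadata { model_type: Some("qwen2_vl".to_string()) }
}

fn main() {
    let meta = aliased_metadata();
    assert!(meta.matches_model_type_hints(&["qwen2_vl"]));
    assert!(!meta.matches_model_type_hints(&["llava"]));
    println!("model_type hint matching ok");
}
```

Each vision spec can then pass its own family hints, so the matching logic lives in one place instead of being duplicated per spec.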
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddf2bb065d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/multimodal.rs (1)
Lines 376-384: ⚠️ Potential issue | 🔴 Critical

This PR has critical compilation errors that must be fixed before merging.
The code cannot compile. Two blocking issues:
1. The `expand_tokens` function in `multimodal.rs` (lines 370–383) is incomplete: it ends with an unclosed brace and is missing its function body closure.
2. The `assemble_multimodal_data` function is called in `request_building.rs` (line 81, imported at line 13) but does not exist in `multimodal.rs` or anywhere else in the codebase. This will produce an undefined reference error.

Confirm that the intended final state includes:
- A complete, properly closed `expand_tokens` function implementation
- Either restoration of `assemble_multimodal_data` or removal of its call site in `request_building.rs`

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/multimodal.rs` around lines 376 - 384, The expand_tokens function is unterminated and missing its final return/closing brace: finish its implementation in multimodal.rs (complete the logic after resolving placeholder_token_id and ensure it returns an ExpandedTokens struct with token_ids, placeholders, and patch_offsets) and add the missing closing brace to properly end the function; additionally resolve the undefined assemble_multimodal_data symbol by either restoring/implementing assemble_multimodal_data in multimodal.rs with the same public signature expected by request_building.rs (so request_building.rs can call it), or remove/replace the call in request_building.rs (where assemble_multimodal_data is imported and invoked) to use the new expand_tokens API — make sure to reference the expand_tokens function and assemble_multimodal_data symbol when editing so compilation errors are eliminated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 50cdc872-a864-45a4-b66c-b3e3ebff2d42
📒 Files selected for processing (8)
- crates/multimodal/src/registry/llama4.rs
- crates/multimodal/src/registry/llava.rs
- crates/multimodal/src/registry/phi3_v.rs
- crates/multimodal/src/registry/qwen3_vl.rs
- crates/multimodal/src/registry/qwen_vl.rs
- crates/multimodal/src/registry/traits.rs
- crates/multimodal/src/vision/image_processor.rs
- model_gateway/src/routers/grpc/multimodal.rs
Force-pushed: ddf2bb0 to 2816ea7 (Compare)
Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>
Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>
Force-pushed: 2816ea7 to 268c1fe (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 268c1fed01
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fd498390-1608-4021-a037-50d21ca8a048
📒 Files selected for processing (8)
- crates/multimodal/src/registry/llama4.rs
- crates/multimodal/src/registry/llava.rs
- crates/multimodal/src/registry/phi3_v.rs
- crates/multimodal/src/registry/qwen3_vl.rs
- crates/multimodal/src/registry/qwen_vl.rs
- crates/multimodal/src/registry/traits.rs
- crates/multimodal/src/vision/image_processor.rs
- model_gateway/src/routers/grpc/multimodal.rs
```rust
fn find_in_candidate(&self, candidate: &str) -> Option<&dyn ImagePreProcessor> {
    let candidate = candidate.to_ascii_lowercase();
    for (pattern, processor) in &self.processors {
        if candidate.contains(&pattern.to_ascii_lowercase()) {
            return Some(processor.as_ref());
        }
    }
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
FILE="$(fd -p 'image_processor.rs' | head -n1)"
echo "Inspecting: $FILE"
echo
echo "[1] Confirm HashMap + first-match iteration:"
rg -n -C2 'processors:\s*HashMap|for \(pattern, processor\) in &self\.processors|return Some\(processor\.as_ref\(\)\)' "$FILE"
echo
echo "[2] List registered patterns and detect substring overlaps:"
python - <<'PY'
import re
from pathlib import Path
p = next(Path('.').rglob('image_processor.rs'))
text = p.read_text()
patterns = re.findall(r'registry\.register\(\s*"([^"]+)"', text)
print("patterns:", patterns)
overlaps = []
for a in patterns:
    for b in patterns:
        if a != b and b in a:
            overlaps.append((a, b))
print("overlaps:")
for a, b in sorted(set(overlaps)):
    print(f"  '{a}' contains '{b}'")
PY
```

Repository: lightseekorg/smg

Length of output: 1156
Deterministic processor selection is not guaranteed.
Line 360 returns the first substring match from a HashMap, but iteration order is non-deterministic. With overlapping patterns (for example llava-next and llava), this can select the wrong processor unpredictably. The patterns registered include both llava-next and llava, where one is a substring of the other, making the issue concrete.
🔧 Proposed fix (choose most specific match deterministically)
```diff
 fn find_in_candidate(&self, candidate: &str) -> Option<&dyn ImagePreProcessor> {
     let candidate = candidate.to_ascii_lowercase();
-    for (pattern, processor) in &self.processors {
-        if candidate.contains(&pattern.to_ascii_lowercase()) {
-            return Some(processor.as_ref());
-        }
-    }
-    None
+    self.processors
+        .iter()
+        .filter(|(pattern, _)| candidate.contains(&pattern.to_ascii_lowercase()))
+        .max_by(|(a, _), (b, _)| {
+            a.len()
+                .cmp(&b.len())
+                .then_with(|| a.as_str().cmp(b.as_str()))
+        })
+        .map(|(_, processor)| processor.as_ref())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/multimodal/src/vision/image_processor.rs` around lines 358 - 364, The
current find_in_candidate method returns the first substring match from
self.processors (a HashMap), which is non-deterministic and can pick
shorter/overlapping patterns like "llava" before "llava-next"; change it to
collect all processors whose pattern (case-insensitive) is contained in the
candidate, then choose the most specific match deterministically by sorting
matches by descending pattern length and then by a stable tie-breaker (e.g.,
lexicographic order) and returning the first entry. Update find_in_candidate to
(1) lowercase candidate once, (2) iterate through self.processors to collect
matching (pattern, processor) pairs, (3) sort the collected matches by
(-pattern.len(), pattern) and (4) return the processor reference of the top
match or None if empty, ensuring case-insensitive comparisons and deterministic
selection.
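The longest-then-lexicographic selection rule described in the prompt can be demonstrated in isolation. This is a standalone sketch over plain pattern strings (the function name `best_match` and the pattern list are illustrative), not the registry's actual method:

```rust
// Among all case-insensitive substring matches, prefer the longest pattern,
// breaking ties lexicographically, so the result is independent of HashMap
// iteration order.
fn best_match<'a>(patterns: &[&'a str], candidate: &str) -> Option<&'a str> {
    let candidate = candidate.to_ascii_lowercase();
    patterns
        .iter()
        .filter(|p| candidate.contains(&p.to_ascii_lowercase()))
        .max_by(|a, b| a.len().cmp(&b.len()).then_with(|| a.cmp(b)))
        .copied()
}

fn main() {
    let patterns = ["llava", "llava-next", "qwen2-vl"];
    // "llava-next-7b" contains both "llava" and "llava-next";
    // the longer, more specific pattern wins deterministically.
    assert_eq!(best_match(&patterns, "llava-next-7b"), Some("llava-next"));
    assert_eq!(best_match(&patterns, "LLaVA-1.5-13b"), Some("llava"));
    assert_eq!(best_match(&patterns, "gpt-4o"), None);
    println!("deterministic match ok");
}
```

The lexicographic tie-breaker only matters for equal-length overlapping patterns, but it keeps the selection stable across runs and platforms.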
Signed-off-by: Arthur Cheng <arthur.cheng@oracle.com>
♻️ Duplicate comments (1)
crates/multimodal/src/vision/image_processor.rs (1)
Lines 358-366: ⚠️ Potential issue | 🟠 Major

Non-deterministic processor selection for overlapping patterns.
This helper returns the first substring match from a `HashMap`, but iteration order is non-deterministic. With patterns like `"llava-next"` and `"llava"` both registered, a candidate like `"llava-next-7b"` matches both patterns, and either processor could be returned depending on `HashMap` layout. The fix from the previous review (selecting the longest matching pattern) was not applied.
🔧 Proposed fix (select longest matching pattern deterministically)
```diff
 fn find_in_candidate(&self, candidate: &str) -> Option<&dyn ImagePreProcessor> {
     let candidate = candidate.to_ascii_lowercase();
-    for (pattern, processor) in &self.processors {
-        if candidate.contains(&pattern.to_ascii_lowercase()) {
-            return Some(processor.as_ref());
-        }
-    }
-    None
+    self.processors
+        .iter()
+        .filter(|(pattern, _)| candidate.contains(&pattern.to_ascii_lowercase()))
+        .max_by(|(a, _), (b, _)| {
+            a.len()
+                .cmp(&b.len())
+                .then_with(|| a.as_str().cmp(b.as_str()))
+        })
+        .map(|(_, processor)| processor.as_ref())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/multimodal/src/vision/image_processor.rs` around lines 358 - 366, find_in_candidate currently returns the first matching processor by iterating a HashMap (non-deterministic) which causes wrong selection for overlapping patterns; change it to scan all entries in self.processors, compare lowercase pattern against the already-lowercased candidate, track the longest-matching pattern (e.g., keep best_len and best_processor) and return the processor for the longest match at the end; reference the find_in_candidate function, the processors field, and the pattern/processor variables when making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6c85fbea-d2c6-4497-af8b-ae038a774d4e
📒 Files selected for processing (1)
crates/multimodal/src/vision/image_processor.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5a64908f5
```rust
self.find_in_candidate(model_id)
    .or_else(|| model_type.and_then(|model_type| self.find_in_candidate(model_type)))
```
Normalize model_type separators before fallback lookup
`ImageProcessorRegistry::find` now falls back to `model_type`, but it feeds the raw string into substring matching without normalizing separators. Because built-in keys include hyphenated patterns like `llava-next`/`llava-v1.6`, an underscore-form config value such as `llava_next` will miss the intended specific match and can fall through to the generic `llava` processor, causing aliased LLaVA-NeXT requests to be preprocessed with the wrong pipeline. Please normalize `model_type` (e.g., map `_` to `-`) or add equivalent aliases before matching.
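The suggested normalization is a one-line transform; a minimal sketch (the function name `normalize_model_type` is illustrative, not from the PR):

```rust
// Lowercase and map '_' to '-' before substring matching, so config values
// like "llava_next" hit hyphenated registry keys like "llava-next".
fn normalize_model_type(raw: &str) -> String {
    raw.to_ascii_lowercase().replace('_', "-")
}

fn main() {
    assert_eq!(normalize_model_type("llava_next"), "llava-next");
    assert_eq!(normalize_model_type("Qwen2_VL"), "qwen2-vl");
    // Already-hyphenated values pass through unchanged.
    assert_eq!(normalize_model_type("llava-v1.6"), "llava-v1.6");
    println!("normalization ok");
}
```

Applying this to the fallback argument (or registering underscore aliases alongside the hyphenated keys) would close the gap without touching the fast path on `model_id`.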
Fixes #754
Description
Problem
Multimodal family detection in the gRPC chat preparation path was keyed off the request-facing model name. When a worker exposed a supported vision model under an alias such as `custom-model`, tokenizer and config loading could succeed, but multimodal spec selection and image processor lookup still failed before dispatch.

Solution
Use the loaded model config as a fallback signal during multimodal family detection. The multimodal registry now checks `config.model_type` when `model_id` does not match, and the image processor registry does the same while preserving the existing fast path on `model_id`.

Changes
- Added `ModelMetadata::config_model_type()` and `matches_model_type_hints()` helpers for shared family matching logic
- Vision specs fall back from `model_id` to `config.model_type`
- Image processor registry exposes `find(model_id, model_type)` with model-type fallback
- `model_gateway` passes `model_type` into multimodal preprocessing before image processor selection

Test Plan
- Vision model exposed under the alias `custom-model`
- `tokenizer_path`/`model_path` pointing at the real model artifacts containing the correct `config.json` and `preprocessor_config.json`
- `make fmt`
- `make check`
- `make test`
- `cargo +nightly fmt` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes

Summary by CodeRabbit
Refactor
Tests