feat(gateway): add message_utils and MessagePreparationStage for Messages API#741
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request extends the gateway's capabilities by laying the groundwork for the Anthropic Messages API. It introduces a new pipeline stage and a set of utility functions to convert Messages API requests into the internal chat template format, enabling consistent processing while reusing existing infrastructure. This is a foundational step in a series of changes to fully support the Messages API.
📝 Walkthrough

Adds a Messages API preparation pipeline: a new MessagePreparationStage, message transformation utilities converting Anthropic CreateMessageRequest into the internal chat-template flow, minor visibility and lint tweaks across gRPC context and utils, and wiring of the new messages submodule.
Sequence Diagram

```mermaid
sequenceDiagram
    participant GRPC as GRPC Pipeline
    participant MPS as MessagePreparationStage
    participant CTX as RequestContext
    participant TKZ as Tokenizer
    participant TOOL as Tool Extractor
    participant CONS as Constraint Generator
    participant STATE as Response State
    GRPC->>MPS: execute(request, ctx)
    MPS->>CTX: resolve_tokenizer()
    CTX-->>MPS: tokenizer
    MPS->>TOOL: extract_chat_tools(request.tools)
    TOOL-->>MPS: chat_tools
    MPS->>MPS: process_messages(request, tokenizer, chat_tools)
    MPS->>TKZ: tokenize(formatted_text)
    TKZ-->>MPS: token_ids
    alt tools present
        MPS->>CONS: generate_tool_constraints(chat_tools)
        CONS-->>MPS: constraints
    end
    MPS->>MPS: create_stop_decoder(stop_sequences)
    MPS->>CTX: store_preparation_output(prep_output)
    MPS->>STATE: store_stop_decoder(decoder)
    MPS-->>GRPC: return success
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Hi @slin1237, the DCO sign-off check has failed. All commits must include a `Signed-off-by` line.

To fix existing commits:

```shell
# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease
```

To sign off future commits automatically:
Code Review
This pull request introduces the message_utils and MessagePreparationStage to support the Anthropic Messages API, aligning it with the existing chat pipeline. The changes include conversion functions for Messages API types to the internal chat template format, and a new preparation stage that leverages shared utilities. The visibility of process_tool_call_arguments in chat_utils.rs has been updated to pub(crate) to facilitate reuse. Overall, the changes are well-structured and follow the established architecture for handling different API types.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/utils/message_utils.rs`:
- Around line 223-242: The fold currently overwrites reasoning for
InputContentBlock::Thinking, losing prior blocks; change the accumulated
reasoning from Option<String> to Vec<String> (e.g., in the initial tuple passed
to blocks.iter().fold and the fold closure), push each t.thinking.clone() on
InputContentBlock::Thinking, and after the fold join the Vec<String> with the
desired separator (or keep the Vec if callers can handle it) so that the
produced (text_parts, tool_calls, reasoning) preserves all thinking blocks
instead of only the last one.
📒 Files selected for processing (8)

- model_gateway/src/routers/grpc/context.rs
- model_gateway/src/routers/grpc/regular/stages/messages/mod.rs
- model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
- model_gateway/src/routers/grpc/regular/stages/mod.rs
- model_gateway/src/routers/grpc/regular/stages/preparation.rs
- model_gateway/src/routers/grpc/utils/chat_utils.rs
- model_gateway/src/routers/grpc/utils/message_utils.rs
- model_gateway/src/routers/grpc/utils/mod.rs

💤 Files with no reviewable changes (1)

- model_gateway/src/routers/grpc/context.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dee954d7bc
```rust
let tool_call_constraint = if filtered_tools.is_empty() {
    None
```
Reject required tool_choice when no matching tools remain
This branch skips tool constraints whenever filtered_tools is empty, which lets requests continue even when tool_choice requires tool use (any or a specific tool) but filtering/adaptation left no usable custom tools. In that scenario the request silently degrades to unconstrained text generation rather than returning a client error, violating the caller’s explicit tool-choice contract.
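A minimal sketch of the guard this comment asks for, using stand-in types (the `ToolChoice` variants and error strings here are assumptions, not the gateway's real API):

```rust
// Illustrative stand-in for the Messages API tool_choice variants.
enum ToolChoice {
    Auto,
    Any,
    Tool(String),
}

// If filtering left no usable tools but tool_choice *requires* tool use,
// return a client error instead of silently generating unconstrained text.
fn check_tool_constraint(
    filtered_tools_empty: bool,
    tool_choice: &ToolChoice,
) -> Result<(), String> {
    if filtered_tools_empty {
        match tool_choice {
            ToolChoice::Any => {
                Err("tool_choice requires tool use but no supported tools remain".into())
            }
            ToolChoice::Tool(name) => {
                Err(format!("tool_choice names '{name}' but it is not a usable tool"))
            }
            ToolChoice::Auto => Ok(()), // auto may legitimately proceed without tools
        }
    } else {
        Ok(())
    }
}
```

The real stage would map the `Err` into a 400-style client error rather than a plain `String`.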
…ages API
Add the first message-specific pipeline stage (Stage 1: Preparation) and
the utility functions it needs to convert Anthropic Messages API types
into the internal chat template format.
What changed:
- New message_utils.rs with conversion functions:
- process_messages(): top-level orchestrator parallel to process_chat_messages()
- process_message_content_format(): converts InputMessage to Vec<Value> JSON
- convert_user_message(): handles user messages, splits ToolResult into
separate "tool" role messages
- convert_assistant_message(): extracts text, tool_calls, reasoning_content
- extract_chat_tools(): filters Custom tools and converts to chat::Tool
- convert_message_tool_choice(): maps Messages ToolChoice to chat ToolChoice
- extract_tool_result_text(): helper for ToolResult content extraction
- 7 unit tests covering all major conversion paths
- New MessagePreparationStage (parallel to ChatPreparationStage):
- Same structure as ChatPreparationStage (impl method pattern)
- Resolves tokenizer, converts/filters tools, processes messages,
tokenizes, builds tool constraints, creates stop decoder
- Multimodal processing postponed (marked with async for future .await)
- Made process_tool_call_arguments pub(crate) in chat_utils.rs for reuse
- Updated delegating PreparationStage to use Display-based error messages
- Removed stale #[expect(dead_code)] from messages_request_arc (now used)
Why:
This is PR 2 in the Messages API gRPC pipeline series. PR 1 (#739) added
type scaffolding. This PR adds the preparation stage that converts
Messages API requests into the shared internal format, enabling the
existing request building and response processing stages to work with
Messages API requests in follow-up PRs.
How:
Follows the same architecture as chat: reuses shared utilities
(resolve_tokenizer, filter_tools_by_tool_choice, generate_tool_constraints,
create_stop_decoder, process_tool_call_arguments) and only replaces the
message-specific conversion layer (process_content_format → process_message_content_format).
Refs: #738
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
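The user-message splitting described in the commit message can be sketched with plain tuples standing in for the real JSON values (all types, field names, and the tool-message format below are simplified illustrations, not the gateway's actual code):

```rust
// Simplified stand-ins for the Messages API content blocks (illustrative).
enum Block {
    Text(String),
    ToolResult { tool_use_id: String, text: String },
}

/// Convert one user turn into chat-template messages, splitting each
/// ToolResult block out into a separate "tool" role message while
/// preserving the original block order.
fn convert_user_message(blocks: &[Block]) -> Vec<(String, String)> {
    let mut result = Vec::new();
    let mut user_parts: Vec<String> = Vec::new();
    for block in blocks {
        match block {
            Block::Text(t) => user_parts.push(t.clone()),
            Block::ToolResult { tool_use_id, text } => {
                // Flush buffered user text first so ordering is preserved.
                if !user_parts.is_empty() {
                    result.push(("user".into(), user_parts.join(" ")));
                    user_parts.clear();
                }
                result.push(("tool".into(), format!("[{tool_use_id}] {text}")));
            }
        }
    }
    if !user_parts.is_empty() {
        result.push(("user".into(), user_parts.join(" ")));
    }
    result
}
```

A turn like `[text, tool_result, text]` thus yields `user -> tool -> user` messages in sequence.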
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/utils/message_utils.rs`:
- Around line 366-374: The current extract_chat_tools function silently drops
non-Custom messages::Tool variants which changes request semantics; change
extract_chat_tools to return a Result<Vec<ChatTool>, RpcError> (or your
project's equivalent error type) and validate each tool: if a messages::Tool is
Custom convert it via custom_tool_to_chat_tool, but if it is Bash, WebSearch,
TextEditor, McpToolset or any other non-Custom variant return an explicit client
error (400) with a clear message listing the unsupported tool(s). Update callers
(e.g., the MessagePreparationStage path) to propagate/handle this Result so
requests with only unsupported tools fail fast instead of being treated as
no-tools.
- Around line 160-191: The current fold over InputContent::Blocks collects all
user_parts and tool_msgs then appends user content first, which reorders mixed
sequences (e.g., [text, tool_result, text]); modify the logic in the
InputContent::Blocks handling (around the fold) to iterate with a for loop over
blocks, accumulating user_parts and whenever you encounter
InputContentBlock::ToolResult flush the accumulated user_parts by calling
format_content_parts (using the same content_format) and push a user json into
result, then push the tool message (use extract_tool_result_text and
tr.tool_use_id) immediately to preserve original ordering, and continue
accumulating subsequent user_parts; ensure any remaining user_parts are flushed
after the loop.
- Around line 266-280: The current format_content_parts (match arm
ChatTemplateContentFormat::String) collapses non-text-only parts into an empty
string; change it to detect when no text parts were extracted and in that case
return the original parts as a Value::Array (preserving image/document
placeholders) instead of Value::String(""), mirroring the behavior of
transform_content_field; locate format_content_parts and update the
ChatTemplateContentFormat::String branch to conditionally return the joined text
when present or the original parts array when text is absent.
```rust
pub(crate) fn extract_chat_tools(tools: &[messages::Tool]) -> Vec<ChatTool> {
    tools
        .iter()
        .filter_map(|t| match t {
            messages::Tool::Custom(custom) => Some(custom_tool_to_chat_tool(custom)),
            _ => None,
        })
        .collect()
}
```
Reject unsupported Messages tool types instead of silently dropping them.
Filtering out non-Custom variants here changes request semantics. A request containing only Bash/WebSearch/TextEditor/McpToolset tools reaches MessagePreparationStage as if it had no tools at all, which also bypasses tool_choice enforcement for any or named-tool requests. This should fail fast with a 400 rather than degrade into an unconstrained generation request.
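One way the fail-fast variant could look, sketched with simplified stand-ins for `messages::Tool` and `chat::Tool` (the variant set and the `String` error type are assumptions):

```rust
// Illustrative stand-ins for the gateway's tool types.
enum Tool {
    Custom(String),
    Bash,
    WebSearch,
}

#[derive(Debug)]
struct ChatTool {
    name: String,
}

// Return an error naming every unsupported tool instead of silently
// dropping non-Custom variants and degrading the request to "no tools".
fn extract_chat_tools(tools: &[Tool]) -> Result<Vec<ChatTool>, String> {
    let mut out = Vec::new();
    let mut unsupported = Vec::new();
    for t in tools {
        match t {
            Tool::Custom(name) => out.push(ChatTool { name: name.clone() }),
            Tool::Bash => unsupported.push("bash"),
            Tool::WebSearch => unsupported.push("web_search"),
        }
    }
    if unsupported.is_empty() {
        Ok(out)
    } else {
        // The caller (MessagePreparationStage) would map this to a 400.
        Err(format!("unsupported tool types: {}", unsupported.join(", ")))
    }
}
```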
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d826daa9ef
```rust
if !user_parts.is_empty() {
    let content = format_content_parts(user_parts, content_format);
    result.push(json!({"role": "user", "content": content}));
}
result.extend(tool_msgs);
```
Preserve tool-result block order when splitting user content
convert_user_message() accumulates all user text/media parts and all tool-result parts separately, then always appends the synthesized user message before appending tool messages. For mixed user content where a tool_result block appears before text (or is interleaved), this reorders the turn and can make the model consume follow-up user text before the tool output it depends on, producing incorrect tool-loop behavior.
```rust
Some(ToolResultContent::Blocks(blocks)) => blocks
    .iter()
    .filter_map(|b| match b {
        messages::ToolResultContentBlock::Text(t) => Some(t.text.as_str()),
        _ => None,
```
Preserve non-text tool results instead of dropping them
extract_tool_result_text() only keeps ToolResultContentBlock::Text and discards other valid block types (Image, Document, SearchResult), so non-text tool results are silently converted to empty/partial tool messages. When tools return non-text output, the prompt loses the actual result content and the model receives an incorrect conversation state.
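One possible direction, sketched with stand-in types: keep a textual placeholder for non-text blocks so the templated conversation still records that the tool returned output (the placeholder strings are an assumption; the review leaves the exact representation open):

```rust
// Simplified stand-in for messages::ToolResultContentBlock (illustrative).
enum ToolResultContentBlock {
    Text(String),
    Image,
    Document,
}

/// Keep a placeholder for non-text blocks instead of silently dropping them,
/// so the prompt still reflects that the tool produced non-text output.
fn extract_tool_result_text(blocks: &[ToolResultContentBlock]) -> String {
    blocks
        .iter()
        .map(|b| match b {
            ToolResultContentBlock::Text(t) => t.clone(),
            ToolResultContentBlock::Image => "[image]".to_string(),
            ToolResultContentBlock::Document => "[document]".to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```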
Actionable comments posted: 1
♻️ Duplicate comments (3)
model_gateway/src/routers/grpc/utils/message_utils.rs (3)
273-289: ⚠️ Potential issue | 🟠 Major — Keep non-text-only parts instead of returning `""` in `String` mode.

When a user turn contains only image/document placeholders, this returns an empty string and silently drops the content. `transform_content_field()` in `model_gateway/src/routers/grpc/utils/chat_utils.rs` preserves the original array when no text parts exist, so the Messages path currently diverges from the Chat path.

Suggested fix:

```diff
 fn format_content_parts(parts: Vec<Value>, content_format: ChatTemplateContentFormat) -> Value {
     match content_format {
         ChatTemplateContentFormat::String => {
-            // Extract text parts and join
-            let text: String = parts
+            let text_parts: Vec<String> = parts
                 .iter()
                 .filter_map(|p| {
                     p.as_object()
                         .and_then(|obj| obj.get("type")?.as_str().filter(|&t| t == "text"))
                         .and_then(|_| p.as_object()?.get("text")?.as_str())
                         .map(String::from)
                 })
-                .collect::<Vec<_>>()
-                .join(" ");
-            Value::String(text)
+                .collect();
+
+            if text_parts.is_empty() {
+                Value::Array(parts)
+            } else {
+                Value::String(text_parts.join(" "))
+            }
         }
         ChatTemplateContentFormat::OpenAI => Value::Array(parts),
     }
 }
```
368-381: ⚠️ Potential issue | 🟠 Major — Unsupported Messages tools should fail fast, not vanish.

Filtering out every non-`Custom` variant changes request semantics. A request containing only `Bash`/`WebSearch`/`TextEditor`/`McpToolset` tools reaches `MessagePreparationStage` as if it had no tools, which also bypasses named/required `tool_choice` handling.

Return a `Result` here and let `MessagePreparationStage` convert unsupported tool variants into a 400 with a clear error message instead of silently degrading the request.
160-191: ⚠️ Potential issue | 🟠 Major — Preserve mixed user/tool-result ordering.

This fold buffers every user part and appends every synthesized `tool` message afterward. A source turn like `[text, tool_result, text]` becomes `user -> tool` instead of `user -> tool -> user`, which changes the message sequence before templating.

Suggested direction:

```diff
 fn convert_user_message(
     content: &InputContent,
     content_format: ChatTemplateContentFormat,
     result: &mut Vec<Value>,
 ) {
     match content {
         InputContent::String(text) => {
             result.push(json!({"role": "user", "content": text}));
         }
         InputContent::Blocks(blocks) => {
-            let (user_parts, tool_msgs) = blocks.iter().fold(
-                (Vec::new(), Vec::new()),
-                |(mut user_parts, mut tool_msgs), block| {
-                    match block {
-                        InputContentBlock::Text(t) => {
-                            user_parts.push(json!({"type": "text", "text": t.text}));
-                        }
-                        InputContentBlock::Image(_) => {
-                            user_parts.push(json!({"type": "image"}));
-                        }
-                        InputContentBlock::Document(_) => {
-                            user_parts.push(json!({"type": "document"}));
-                        }
-                        InputContentBlock::ToolResult(tr) => {
-                            tool_msgs.push(json!({
-                                "role": "tool",
-                                "tool_call_id": tr.tool_use_id,
-                                "content": extract_tool_result_text(tr)
-                            }));
-                        }
-                        _ => {}
-                    }
-                    (user_parts, tool_msgs)
-                },
-            );
-
-            if !user_parts.is_empty() {
-                let content = format_content_parts(user_parts, content_format);
-                result.push(json!({"role": "user", "content": content}));
-            }
-            result.extend(tool_msgs);
+            let mut user_parts = Vec::new();
+            for block in blocks {
+                match block {
+                    InputContentBlock::Text(t) => {
+                        user_parts.push(json!({"type": "text", "text": t.text}));
+                    }
+                    InputContentBlock::Image(_) => {
+                        user_parts.push(json!({"type": "image"}));
+                    }
+                    InputContentBlock::Document(_) => {
+                        user_parts.push(json!({"type": "document"}));
+                    }
+                    InputContentBlock::ToolResult(tr) => {
+                        if !user_parts.is_empty() {
+                            let content =
+                                format_content_parts(std::mem::take(&mut user_parts), content_format);
+                            result.push(json!({"role": "user", "content": content}));
+                        }
+                        result.push(json!({
+                            "role": "tool",
+                            "tool_call_id": tr.tool_use_id,
+                            "content": extract_tool_result_text(tr)
+                        }));
+                    }
+                    _ => {}
+                }
+            }
+
+            if !user_parts.is_empty() {
+                let content = format_content_parts(user_parts, content_format);
+                result.push(json!({"role": "user", "content": content}));
+            }
         }
     }
 }
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/regular/stages/preparation.rs`:
- Around line 43-55: The match in PreparationStage::execute currently handles
RequestType::Chat and RequestType::Generate but misses RequestType::Messages,
causing Messages requests to hit the catch-all; add a branch for
RequestType::Messages(_) that delegates to the MessagePreparationStage (e.g.,
call self.message_stage.execute(ctx).await) before the fallback, ensuring the
new MessagePreparationStage is invoked like chat_stage and generate_stage are.
```diff
 match &ctx.input.request_type {
     RequestType::Chat(_) => self.chat_stage.execute(ctx).await,
     RequestType::Generate(_) => self.generate_stage.execute(ctx).await,
-    other => {
-        let type_name = match other {
-            RequestType::Embedding(_) => "Embedding",
-            RequestType::Classify(_) => "Classify",
-            RequestType::Responses(_) => "Responses",
-            _ => "Unknown",
-        };
+    request_type => {
         error!(
             function = "PreparationStage::execute",
-            "RequestType::{type_name} reached regular preparation stage"
+            request_type = %request_type,
+            "{request_type} request type reached regular preparation stage"
         );
         Err(grpc_error::internal_error(
             "wrong_pipeline",
-            format!("RequestType::{type_name} should use its dedicated pipeline"),
+            format!("{request_type} should use its dedicated pipeline"),
         ))
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
printf 'RequestType variants:\n'
rg -n -C2 '\benum\s+RequestType\b|RequestType::Messages|Messages\(' model_gateway/src/routers/grpc
printf '\nPreparation-stage dispatch:\n'
rg -n -C2 'MessagePreparationStage|RequestType::Chat|RequestType::Generate|RequestType::Messages' \
  model_gateway/src/routers/grpc/regular/stages
```

Repository: lightseekorg/smg

Length of output: 15470
Delegate RequestType::Messages before falling back to wrong_pipeline.
The preparation stage match only routes Chat and Generate. The new MessagePreparationStage is therefore unreachable—a Messages request will fall through to the catch-all and return the internal wrong_pipeline error instead of being prepared. Other stages in the regular pipeline (request_building and response_processing) already explicitly handle Messages, so this dispatch is inconsistent.
Verification
RequestType::Messages is defined in context.rs:56, and MessagePreparationStage is properly exported. However, the preparation stage dispatcher (lines 43–46) only has branches for Chat and Generate, while request_building.rs and response_processing.rs both have explicit (RequestType::Responses(_) | RequestType::Messages(_)) branches.
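The missing branch can be illustrated with a reduced dispatch function (stand-in types; the real stages take a `RequestContext` and are async):

```rust
// Illustrative stand-in for the gateway's RequestType enum.
enum RequestType {
    Chat,
    Generate,
    Messages,
    Embedding,
}

// Reduced sketch of PreparationStage::execute dispatch: the key point is
// that Messages must be routed to its own stage before the catch-all.
fn dispatch(request_type: &RequestType) -> Result<&'static str, String> {
    match request_type {
        RequestType::Chat => Ok("chat_stage"),
        RequestType::Generate => Ok("generate_stage"),
        // The missing branch: delegate Messages to MessagePreparationStage
        // instead of letting it fall through to the wrong_pipeline error.
        RequestType::Messages => Ok("message_stage"),
        RequestType::Embedding => {
            Err("Embedding should use its dedicated pipeline".into())
        }
    }
}
```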
…arams for Messages API
Add Stage 4 (request building) for the Messages API gRPC pipeline,
converting PreparationOutput + CreateMessageRequest sampling parameters
into backend-specific proto GenerateRequest.
What changed:
- model_gateway/src/routers/grpc/regular/stages/messages/request_building.rs:
New MessageRequestBuildingStage (copied from chat, adapted for Messages).
Uses msg_{uuid} request ID prefix, calls build_messages_request(),
skips multimodal (postponed), no filtered_request pattern.
- model_gateway/src/routers/grpc/client.rs:
Add build_messages_request() dispatcher on GrpcClient enum, dispatching
to each backend's build_generate_request_from_messages().
- crates/grpc_client/src/sglang_scheduler.rs:
Add build_generate_request_from_messages() and
build_grpc_sampling_params_from_messages(). Maps CreateMessageRequest
fields (max_tokens, temperature, top_p, top_k, stop_sequences) to
sglang proto SamplingParams with sensible defaults for missing fields.
- crates/grpc_client/src/vllm_engine.rs:
Same pattern for vLLM backend. Handles vLLM-specific differences
(top_k=0 for disabled, Option<f32> temperature).
- crates/grpc_client/src/trtllm_service.rs:
Same pattern for TRT-LLM backend using SamplingConfig, OutputConfig,
and GuidedDecodingParams proto types.
- model_gateway/src/routers/grpc/regular/stages/messages/mod.rs:
Wire request_building module and re-export MessageRequestBuildingStage.
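The field mapping described above can be sketched with simplified stand-ins for the request and proto types (the struct shapes and default values below are assumptions, not the gateway's actual definitions):

```rust
// Illustrative stand-in for the Anthropic CreateMessageRequest sampling fields.
struct CreateMessageRequest {
    max_tokens: u32,
    temperature: Option<f32>,
    top_p: Option<f32>,
    top_k: Option<i32>,
    stop_sequences: Vec<String>,
}

// Illustrative stand-in for a backend proto SamplingParams message.
struct SamplingParams {
    max_new_tokens: u32,
    temperature: f32,
    top_p: f32,
    top_k: i32,
    stop: Vec<String>,
}

// Map Messages API fields onto backend sampling params, filling defaults
// for missing fields (the defaults here are assumed, and backends differ:
// e.g., vLLM uses top_k = 0 for "disabled" where this sketch uses -1).
fn build_sampling_params(req: &CreateMessageRequest) -> SamplingParams {
    SamplingParams {
        max_new_tokens: req.max_tokens,
        temperature: req.temperature.unwrap_or(1.0),
        top_p: req.top_p.unwrap_or(1.0),
        top_k: req.top_k.unwrap_or(-1),
        stop: req.stop_sequences.clone(),
    }
}
```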
Why: This is PR 3 in the Messages API gRPC series. Stage 4 bridges
the gap between preparation (Stage 1, PR #741) and response processing
(Stage 7, future PR), enabling the pipeline to build backend-specific
proto requests from Messages API parameters.
Refs: #739, #741
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
…on-streaming)

Add Stage 7 (response processing) for the Messages API gRPC pipeline. This converts backend ProtoGenerateComplete responses into Anthropic Message format with proper ContentBlock construction and StopReason mapping. Non-streaming only; streaming deferred to a follow-up PR.

What changed:
- processor.rs: add process_non_streaming_messages_response() to ResponseProcessor — full pipeline: token decoding, reasoning parsing, tool call parsing, content block construction (Thinking → Text → ToolUse), StopReason mapping (EndTurn/MaxTokens/StopSequence/ToolUse), and messages::Usage building
- messages/response_processing.rs: new MessageResponseProcessingStage that extracts execution result, dispatch metadata, tokenizer, and stop decoder from RequestContext, delegates to ResponseProcessor, and stores FinalResponse::Messages
- message_utils.rs: add get_history_tool_calls_count_messages() for counting tool use blocks in Messages API request history (needed for KimiK2-style tool call ID generation)
- messages/mod.rs: wire response_processing module with unused_imports expect (wired in pipeline factory PR)

Why: This is the fourth PR in the Messages API gRPC support series. With preparation (PR #741), request building (PR #744), and now response processing, three of the four endpoint-specific pipeline stages are complete. The shared stages (worker selection, client acquisition, dispatch, execution) are reused from the existing pipeline.
How: Follows the same architecture as chat's response processing but adapted for Anthropic Message types:
- Reuses existing convert_message_tool_choice() from message_utils to bridge Messages ToolChoice → Chat ToolChoice for parse_json_schema_response
- Reuses ResponseProcessor's parse_tool_calls() for the model-predicted path
- Content blocks ordered per Anthropic convention: Thinking first, then Text, then ToolUse blocks
- Tool calls parsed as OpenAI ToolCall (via existing parsers) then converted to ContentBlock::ToolUse with JSON input
- Messages always n=1, no logprobs
- ThinkingConfig::Enabled check replaces separate_reasoning bool

Refs: #739, #741, #744

Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
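The StopReason mapping listed in the commit message can be sketched with stand-in enums (the tool-call precedence shown here is an assumption about the stage's rules, not confirmed by the commit text):

```rust
// Illustrative stand-ins for the backend finish reason and the Anthropic
// stop_reason values named in the commit message.
enum FinishReason {
    Stop,
    Length,
    StopSequence,
}

#[derive(Debug, PartialEq)]
enum StopReason {
    EndTurn,
    MaxTokens,
    StopSequence,
    ToolUse,
}

// Map a backend finish reason to an Anthropic stop_reason. This sketch
// assumes tool calls take precedence over the raw finish reason.
fn map_stop_reason(finish: &FinishReason, has_tool_calls: bool) -> StopReason {
    if has_tool_calls {
        return StopReason::ToolUse;
    }
    match finish {
        FinishReason::Stop => StopReason::EndTurn,
        FinishReason::Length => StopReason::MaxTokens,
        FinishReason::StopSequence => StopReason::StopSequence,
    }
}
```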
Summary
- `message_utils.rs` with conversion functions for Anthropic Messages API types → internal chat template format
- `MessagePreparationStage` (Stage 1) parallel to `ChatPreparationStage`
- `process_tool_call_arguments` made `pub(crate)` for reuse across message and chat paths

PR 2 in the Messages API gRPC pipeline series. PR 1 was #739 (type scaffolding).
What changed
New: `message_utils.rs`

Conversion utilities parallel to `chat_utils.rs` but for `CreateMessageRequest`/`InputMessage`:

- `process_messages()` — top-level orchestrator (parallel to `process_chat_messages()`)
- `process_message_content_format()` — converts `InputMessage[]` to `Vec<Value>` for the chat template
- `convert_user_message()` — user messages with ToolResult splitting into separate "tool" role messages
- `convert_assistant_message()` — extracts text, tool_calls, reasoning_content
- `extract_chat_tools()` / `convert_message_tool_choice()` — type adapters from Messages API to chat types
- `extract_tool_result_text()` — helper for ToolResult content extraction

New: `MessagePreparationStage`

Created via git-cp from `ChatPreparationStage` to preserve file history. Key differences:

- `messages_request_arc()` instead of `chat_request_arc()`
- `message_utils::process_messages()` instead of `utils::process_chat_messages()`
- `extract_chat_tools()` + `convert_message_tool_choice()` adapters
- `request.stop_sequences` (Messages API) instead of `request.stop` (Chat API)
- Multimodal processing postponed (marked `async` for a future `.await`)
- No `filtered_request` / `Cow<ChatCompletionRequest>` pattern

Modified

- `chat_utils.rs`: `process_tool_call_arguments` visibility → `pub(crate)`
- `stages/preparation.rs`: delegating stage uses `Display`-based error messages
- `context.rs`: removed stale `#[expect(dead_code)]` from `messages_request_arc`

How

Follows the same architecture as chat — reuses shared utilities (`resolve_tokenizer`, `filter_tools_by_tool_choice`, `generate_tool_constraints`, `create_stop_decoder`, `process_tool_call_arguments`) and only replaces the message-specific conversion layer (`process_content_format` → `process_message_content_format`).

Test plan
- `cargo clippy -p smg --all-targets --all-features -- -D warnings` — clean
- `cargo fmt -p smg -- --check` — clean
- `cargo test -p smg --lib -- message_utils` — 7/7 pass
- `cargo test -p smg -- message` — all pass

Refs: #738