
refactor: make model field required, remove resolve_model_id (#713)

Merged
slin1237 merged 5 commits into main from refactor/require-model-field
Mar 23, 2026

Conversation


@CatherineSue CatherineSue commented Mar 10, 2026

Description

Problem

The model field handling across SMG is broken in several ways:

  1. default_model() masks missing models — ChatCompletionRequest, ResponsesRequest, and RerankRequest default model to "unknown" via serde when clients omit it. This silently forwards "unknown" to backends, which either reject it (vLLM, TRT-LLM) or ignore it (SGLang), producing confusing errors instead of a clean 400.

  2. resolve_model_id is dead code — Only called for route_generate, route_chat, route_completion, and route_messages in IGW mode. Since serde defaults model to "unknown", model is never None, so the auto-selection logic never executes.

  3. Inconsistent across endpoints — Some route methods call resolve_model_id with if self.enable_igw branching; others (route_responses, route_interactions, route_realtime_*) don't. No principled reason for the difference.
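The masking in point 1 can be sketched with a minimal, stdlib-only Rust illustration. This is not the SMG source: `old_effective_model` and the constant are hypothetical stand-ins for what the old `#[serde(default = "default_model")]` annotation effectively did.

```rust
// Illustrative sketch only, not the SMG source. Models the effect of the old
// serde default: a missing `model` silently becomes the "unknown" sentinel.
const UNKNOWN_MODEL_ID: &str = "unknown";

/// Old behavior: clients that omit `model` get the sentinel injected.
fn old_effective_model(client_model: Option<&str>) -> String {
    client_model.unwrap_or(UNKNOWN_MODEL_ID).to_string()
}

fn main() {
    // Client omits `model` -> backend receives "unknown" instead of a 400.
    assert_eq!(old_effective_model(None), "unknown");
    // vLLM/TRT-LLM would then reject "unknown"; SGLang would ignore it.
    assert_eq!(old_effective_model(Some("llama-3")), "llama-3");
    println!("ok");
}
```

Because the fallback happens at deserialization time, `model` is never `None` downstream, which is exactly why the auto-selection logic in point 2 is unreachable.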

Solution

  • Remove serde default_model() annotations — model becomes a required JSON field. Missing model → serde deserialization error (400).
  • Remove resolve_model_id from RouterManager entirely.
  • Simplify all route methods — remove if self.enable_igw branching, pass model through directly.
  • Keep UNKNOWN_MODEL_ID constant only for metrics/hash ring fallback in genuinely optional cases (GenerateRequest, gRPC pipeline).
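The new required-field semantics can be sketched the same way (hypothetical `require_model` helper, stdlib only; in the real change serde itself rejects the missing field during deserialization, which the server surfaces as a 400):

```rust
// Illustrative sketch only, not the SMG source: after removing the serde
// default, a missing `model` must fail request parsing up front instead of
// forwarding a sentinel to the backend.
fn require_model(client_model: Option<&str>) -> Result<String, String> {
    match client_model {
        Some(m) if !m.is_empty() => Ok(m.to_string()),
        // In the real code this is serde's "missing field `model`" error,
        // surfaced to the client as HTTP 400.
        _ => Err("missing field `model`".to_string()),
    }
}

fn main() {
    assert!(require_model(None).is_err());
    assert!(require_model(Some("")).is_err());
    assert_eq!(require_model(Some("llama-3")).unwrap(), "llama-3");
}
```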

Changes

  • Protocol layer: Remove default_model(), default_model_name(), and their serde annotations from ChatCompletionRequest, ResponsesRequest, RerankRequest, TokenizeRequest, DetokenizeRequest. Fix InteractionsRequest Default impl to use model: None.
  • RouterManager: Delete resolve_model_id method (~48 lines). Simplify route_generate, route_chat, route_completion, route_messages — remove if self.enable_igw branching (~114 lines deleted total).
  • Tests: Update tokenize tests for required model field.

Breaking Changes

Clients that omit the model field from /v1/chat/completions, /v1/completions, /v1/responses, /v1/rerank, /v1/tokenize, or /v1/detokenize will now receive a 400 Bad Request instead of having "unknown" silently injected.

In practice, the only affected setup is non-IGW + HTTP + SGLang/vLLM:

  • gRPC routers — already effectively required model for tokenizer lookup (resolve_tokenizer fails without a valid model)
  • IGW mode — model is needed for router selection (get_router_for_model), so missing model already didn't work
  • OpenAI/Anthropic/Gemini routers — all upstream APIs require model
  • Non-IGW + HTTP + SGLang — the only case where omitting model "worked" end-to-end, because non-IGW ignores model for worker selection (homogeneous pool) and SGLang doesn't validate the model field. These clients just need to add "model": "<model-name>" to their requests.

This matches the OpenAI API spec where model is a required field.

Not affected:

  • /generate — GenerateRequest.model remains Option<String> (SGLang native endpoint)
  • /v1/interactions — InteractionsRequest.model remains Option<String> (model or agent fallback)
  • /v1/realtime/* — Already had explicit validation requiring non-empty model
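For the genuinely optional cases, the sentinel survives only as a label, never as a model name forwarded to a backend. A stdlib-only sketch (the `metrics_label` helper is hypothetical, not the SMG source):

```rust
// Illustrative sketch only, not the SMG source: where `model` stays optional
// (e.g. GenerateRequest), UNKNOWN_MODEL_ID is used purely as a metrics /
// hash-ring fallback label.
const UNKNOWN_MODEL_ID: &str = "unknown";

fn metrics_label(model: &Option<String>) -> &str {
    model.as_deref().unwrap_or(UNKNOWN_MODEL_ID)
}

fn main() {
    assert_eq!(metrics_label(&None), "unknown");
    assert_eq!(metrics_label(&Some("llama-3".into())), "llama-3");
}
```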

Test Plan

  • cargo test -p openai-protocol — all 36 tests pass
  • cargo test -p smg — 408 passed, 4 ignored (1 flaky perf test unrelated to changes)
  • cargo clippy --all-targets --all-features -- -D warnings — clean
Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated

Summary by CodeRabbit

  • Breaking Changes

    • Model identifier is now required for many endpoints; requests without it will be rejected (400/404 as applicable).
    • Implicit/default model fallbacks removed — supply an explicit model in requests.
  • Improvements

    • Routing and worker selection consistently use the provided model, improving metrics and error clarity.
  • Tests

    • Tests and fixtures updated to pass explicit model IDs and reflect stricter validation.
  • New Features

    • Mock worker now exposes stable server/model info endpoints for testing.

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@coderabbitai

coderabbitai Bot commented Mar 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Removed implicit/default model provisioning from protocols and made model identifiers required through the routing stack: protocol request serde defaults removed/adjusted, RequestContext/RequestInput model_id made concrete, many RouterTrait signatures changed from Option<&str> to &str, and worker selection/metrics/tokenizer lookups updated accordingly.

Changes

  • Protocol serde & request defaults (crates/protocols/src/chat.rs, crates/protocols/src/rerank.rs, crates/protocols/src/responses.rs, crates/protocols/src/tokenize.rs, crates/protocols/src/generate.rs, crates/protocols/src/interactions.rs): Removed serde defaults for model (many requests now require model or use empty-string defaults); GenerateRequest.model uses default_unknown_model, other defaults set to None or String::new().
  • Protocol helper rename/export (crates/protocols/src/common.rs): Removed crate-private default_model; added exported default_unknown_model() returning the unknown sentinel.
  • Router trait & implementations (model_gateway/src/routers/mod.rs, model_gateway/src/routers/router_manager.rs, model_gateway/src/routers/openai/..., model_gateway/src/routers/grpc/..., model_gateway/src/routers/http/..., model_gateway/src/routers/pd_router.rs, model_gateway/src/routers/grpc/router.rs): Changed many RouterTrait method signatures to require model_id: &str; removed IGW-mode resolve_model_id and UNKNOWN_MODEL_ID fallbacks; routers now forward the concrete model_id to pipeline/router internals.
  • Pipeline / context / stages (model_gateway/src/routers/grpc/context.rs, model_gateway/src/routers/grpc/pipeline.rs, model_gateway/src/routers/grpc/common/stages/..., model_gateway/src/routers/grpc/common/responses/utils.rs, model_gateway/src/routers/grpc/regular/...): RequestInput.model_id and related call contexts changed from Option<String> to String; worker selection, tokenizer lookup, and metrics use the concrete model_id; errors now return model_not_found(model) when appropriate.
  • HTTP server handlers & validation (model_gateway/src/server.rs): Handlers updated to pass &body.model directly; /v1/rerank now validates that model is present and returns 400 with code: "model_required" if empty.
  • Benchmarks, mocks & tests (model_gateway/benches/*, model_gateway/tests/*, model_gateway/tests/common/mock_worker.rs, model_gateway/tests/common/tls_mock_worker.rs): Bench defaults and tests updated to include explicit model strings (e.g., "mock-model" / "unknown"); mock worker endpoints/JSON adjusted; tests updated to call routers with concrete model ids.
  • Worker selection / PD / HTTP selection logic (model_gateway/src/routers/http/router.rs, model_gateway/src/routers/http/pd_router.rs, model_gateway/src/routers/grpc/common/stages/worker_selection.rs): Selection functions now accept model_id: &str; treat UNKNOWN_MODEL_ID as wildcard when required; selection errors distinguish model-not-found vs. all-workers-unavailable.
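The selection behavior summarized in the last entry (wildcard UNKNOWN_MODEL_ID, and model-not-found vs. all-workers-unavailable mapping to 404 vs. 503) can be sketched stdlib-only. All types and names here are hypothetical, not the SMG source:

```rust
// Illustrative sketch only, not the SMG source: worker selection that treats
// UNKNOWN_MODEL_ID as a wildcard and distinguishes "no worker advertises the
// model" (-> 404) from "workers advertise it but none is usable" (-> 503).
const UNKNOWN_MODEL_ID: &str = "unknown";

struct Worker {
    model: &'static str,
    healthy: bool,
}

enum SelectError {
    ModelNotFound,          // mapped to 404
    AllWorkersUnavailable,  // mapped to 503
}

fn select_worker<'a>(workers: &'a [Worker], model_id: &str) -> Result<&'a Worker, SelectError> {
    let candidates: Vec<&Worker> = workers
        .iter()
        .filter(|w| model_id == UNKNOWN_MODEL_ID || w.model == model_id)
        .collect();
    if candidates.is_empty() {
        // No worker advertises this model at all.
        return Err(SelectError::ModelNotFound);
    }
    // Model is advertised; fail differently if every match is unhealthy.
    candidates
        .into_iter()
        .find(|w| w.healthy)
        .ok_or(SelectError::AllWorkersUnavailable)
}

fn main() {
    let pool = [
        Worker { model: "mock-model", healthy: true },
        Worker { model: "other", healthy: false },
    ];
    assert!(select_worker(&pool, "mock-model").is_ok());
    assert!(select_worker(&pool, "unknown").is_ok()); // wildcard matches any worker
    assert!(matches!(select_worker(&pool, "absent"), Err(SelectError::ModelNotFound)));
    assert!(matches!(select_worker(&pool, "other"), Err(SelectError::AllWorkersUnavailable)));
}
```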

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as API Server
    participant RouterMgr as RouterManager
    participant Router as Router (gRPC/HTTP/OpenAI)
    participant Worker as Worker/Model

    Client->>API: POST /v1/... (body with "model")
    API->>RouterMgr: route_*(headers, body, model_id = &body.model)
    RouterMgr->>Router: select_router(model_id)
    Router->>Worker: forward request(model_id, payload)
    Worker-->>Router: response
    Router-->>RouterMgr: routed response
    RouterMgr-->>API: final response
    API-->>Client: HTTP response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • key4ng

Poem

🐇 I hopped through code with nimble feet,
Defaults trimmed back — now models must meet.
Strings travel straight from client to core,
Routers take names and stumble no more.
A carrot-hop cheer for the cleaner floor.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'refactor: make model field required, remove resolve_model_id' clearly and concisely describes the main changes: making model fields required and removing resolve_model_id functionality.
  • Docstring Coverage (✅ Passed): docstring coverage is 95.92%, which is sufficient (required threshold: 80.00%).


github-actions Bot added the model-gateway label (Model gateway crate changes) on Mar 10, 2026
@CatherineSue force-pushed the refactor/require-model-field branch from 0e1f6d9 to 2c6a5b2 on March 10, 2026 at 21:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e1f6d9d98

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread model_gateway/src/routers/router_manager.rs Outdated
Comment thread crates/protocols/src/rerank.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/protocols/src/responses.rs (1)

787-798: ⚠️ Potential issue | 🟡 Minor

ResponsesRequest::default() creates an empty model that bypasses routing safeguards.

The Default implementation sets model: String::new(), and GenerationRequest::get_model() returns Some("") for it. Validation does not reject empty models. While current production code explicitly constructs or carries model from upstream requests, the Default impl creates a latent path that could silently route with a blank model if future code uses it. Consider removing Default or adding model validation to prevent accidental misuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/protocols/src/responses.rs` around lines 787 - 798, The
ResponsesRequest::default() creates an empty model string which can bypass
routing; remove the Default impl for ResponsesRequest so callers must explicitly
set model, or if Default must exist change behavior so empty model is treated as
unset: update GenerationRequest::get_model() to return None when model is an
empty string (i.e., treat "" as no model) and/or change the default to a clearly
invalid sentinel (or make model an Option<String>) so validation rejects an
absent/empty model rather than silently routing with "".
crates/protocols/src/rerank.rs (1)

202-208: ⚠️ Potential issue | 🟠 Major

Empty model field from V1RerankReqInput conversion causes 404 failures in multi-router mode instead of failing fast.

When v1_rerank handler converts V1RerankReqInput to RerankRequest, it sets model: String::new(). In multi-router setups, select_router_for_request passes this empty string to get_router_for_model(""), which returns no workers. The handler then fails with 404 instead of rejecting the request at validation time.

Fix: Add model field to V1RerankReqInput (with appropriate default or validation) or validate RerankRequest.model for non-empty content before routing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/protocols/src/rerank.rs` around lines 202 - 208, The conversion impl
From<V1RerankReqInput> for RerankRequest is setting model to an empty string
(model: String::new()), which leads select_router_for_request ->
get_router_for_model("") to return no workers and a 404; update the conversion
to propagate a model value from V1RerankReqInput (add a model field to
V1RerankReqInput if missing) or, alternatively, add validation in v1_rerank
before routing to ensure RerankRequest.model is non-empty and return a
validation error; touch the From<V1RerankReqInput> for RerankRequest,
V1RerankReqInput struct, and the v1_rerank handler/select_router_for_request
validation paths when applying the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/protocols/src/tokenize.rs`:
- Around line 232-241: Add a negative deserialization test for DetokenizeRequest
similar to test_tokenize_request_requires_model: create a JSON string that omits
the "model" field (e.g. just "tokens": ...) and attempt
serde_json::from_str::<DetokenizeRequest>(); assert that it returns an Err (i.e.
deserialization fails). Place it as a new #[test] fn (e.g.
test_detokenize_request_requires_model) next to
test_detokenize_request_single/batch and reference DetokenizeRequest and
TokensInput to locate the struct to verify the behavior.

In `@model_gateway/src/routers/router_manager.rs`:
- Around line 406-409: The current call to select_router_for_request(headers,
model_id) before invoking router.route_generate(...) allows model-less /generate
requests to be routed arbitrarily; change the logic so /generate preserves
explicit model resolution: if model_id is Some, keep current behavior (call
select_router_for_request and route_generate); if model_id is None then run the
previous “single-model default or ambiguity failure” resolution (i.e., determine
the unique routable model or return an error if multiple models are routable)
and only then call route_generate with that resolved model_id; ensure you update
the branching around select_router_for_request and route_generate to reject
missing model_id when multiple models are routable.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f2a2cd4-55a2-4b7a-9835-a06421dc8980

📥 Commits

Reviewing files that changed from the base of the PR and between e431a1f and 2c6a5b2.

📒 Files selected for processing (7)
  • crates/protocols/src/chat.rs
  • crates/protocols/src/common.rs
  • crates/protocols/src/interactions.rs
  • crates/protocols/src/rerank.rs
  • crates/protocols/src/responses.rs
  • crates/protocols/src/tokenize.rs
  • model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
  • crates/protocols/src/common.rs

Comment thread crates/protocols/src/tokenize.rs
Comment thread model_gateway/src/routers/router_manager.rs Outdated
Signed-off-by: Chang Su <chang.s.su@oracle.com>
…hods

Now that model is a required field in protocol types, the resolve_model_id
method is no longer needed. Simplify route_generate, route_chat,
route_completion, and route_messages by removing the enable_igw branching
and passing model directly from request bodies.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
@slin1237 force-pushed the refactor/require-model-field branch from 2c6a5b2 to fd577d8 on March 21, 2026 at 18:18

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
crates/protocols/src/tokenize.rs (1)

231-245: 🧹 Nitpick | 🔵 Trivial

Add matching negative test for DetokenizeRequest.

The PR updated the detokenize tests to include the model field, but there's no corresponding negative test verifying that DetokenizeRequest also fails deserialization when model is omitted. This would ensure parity with the test_tokenize_request_requires_model test.

Proposed test
+    #[test]
+    fn test_detokenize_request_requires_model() {
+        let json = r#"{"tokens": [1, 2, 3]}"#;
+        let result = serde_json::from_str::<DetokenizeRequest>(json);
+        assert!(result.is_err(), "Should fail without model field");
+    }
+
     #[test]
     fn test_detokenize_request_single() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/protocols/src/tokenize.rs` around lines 231 - 245, Add a negative test
that verifies DetokenizeRequest deserialization fails when the required model
field is omitted: create a new test (e.g.,
test_detokenize_request_requires_model) that tries to deserialize a JSON string
like r#"{"tokens": [1,2,3]}"# (and another case for batch tokens if desired)
into DetokenizeRequest using serde_json::from_str and assert that it returns an
Err; reference the DetokenizeRequest type and TokensInput enum to mirror the
existing positive tests and ensure parity with
test_tokenize_request_requires_model.
model_gateway/src/routers/router_manager.rs (1)

509-526: ⚠️ Potential issue | 🟠 Major

Preserve explicit model resolution for /generate.

/generate is the one model-optional path per PR objectives, but with resolve_model_id removed, select_router_for_request(headers, model_id) with model_id=None will select a router based solely on headers and worker distribution scoring. In IGW mode, this means a request without model_id can now be sent to an arbitrary router instead of following the previous "single model default or ambiguity failure" behavior.

Consider either:

  1. Preserving the old inference logic for this endpoint (if only one model is routable, use it; otherwise fail)
  2. Rejecting missing model_id when multiple models are routable in IGW mode
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/router_manager.rs` around lines 509 - 526,
route_generate currently forwards requests with model_id=None to
select_router_for_request, which allows header-only selection and breaks the
previous explicit-model resolution behavior; change it so that when
model_id.is_none() we first attempt the old resolution logic (call
self.resolve_model_id(headers) or equivalent) and only then call
select_router_for_request with the resolved model id; if resolution yields
multiple routable models in IGW mode return a NOT_FOUND or explicit error
(reject missing model_id), and if resolution yields exactly one use that model
id to pick the router and proceed with router.route_generate(headers, body,
Some(resolved_model_id)).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dd0c9ef9-ff84-4eab-8b40-ac3a4663ca3e

📥 Commits

Reviewing files that changed from the base of the PR and between 2c6a5b2 and fd577d8.

📒 Files selected for processing (7)
  • crates/protocols/src/chat.rs
  • crates/protocols/src/common.rs
  • crates/protocols/src/interactions.rs
  • crates/protocols/src/rerank.rs
  • crates/protocols/src/responses.rs
  • crates/protocols/src/tokenize.rs
  • model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
  • crates/protocols/src/common.rs

… wrappers

Cascade the required model field through the entire routing layer:
- RouterTrait: all methods now take model_id: &str (no more Option)
- GenerateRequest.model: Option<String> → String with serde default "unknown"
- HTTP/gRPC routers: remove enable_igw model-filtering guard, always route by model
- select_worker_for_model: treats "unknown" as wildcard (any worker)
- Pipeline contexts: model_id: Option<String> → String
- Worker selection: always filters by model, no UNKNOWN_MODEL_ID fallbacks
- V1 rerank: resolves model from available workers when absent

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
Update mock worker and test assertions for the required model field:
- Mock worker: add served_model_name to server_info, add /server_info
  and /model_info route aliases, remove should_fail from server_info
  (metadata endpoints must always succeed for worker registration)
- Test payloads: use "mock-model" consistently (matching the mock worker)
- Router: distinguish 404 (model not found) from 503 (workers exist but
  unavailable) when select_worker returns None
- GenerateRequest.model defaults to "unknown" via serde; router treats
  "unknown" as wildcard (any worker)

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
github-actions Bot added the grpc (gRPC client and router changes), benchmarks (Benchmark changes), tests (Test changes), and openai (OpenAI router changes) labels on Mar 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96965b4756


Comment thread model_gateway/src/routers/grpc/common/stages/worker_selection.rs Outdated
Comment thread model_gateway/src/routers/http/pd_router.rs Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
model_gateway/tests/api/request_formats_test.rs (1)

70-99: 🧹 Nitpick | 🔵 Trivial

Add a missing-model assertion for the new 400 contract.

Lines 70-99 and Lines 118-151 only verify the happy path with model present. Since this PR changes /v1/chat/completions and /v1/completions from implicit fallback to model-required requests, this file should also pin the failure path; otherwise the old fallback can regress without tripping these smoke tests.

Also applies to: 118-151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/tests/api/request_formats_test.rs` around lines 70 - 99, Add a
failing-path assertion that ensures requests without "model" return the new 400
contract: create a payload that omits "model" (e.g., same messages/params but no
model field), call ctx.make_request("/v1/chat/completions", payload).await,
assert the result indicates a failure (or response.status == 400) and that the
error body matches the expected shape (e.g., contains an "error" field with
message/type). Repeat the same missing-model assertion for the "/v1/completions"
test block so both endpoints validate the new model-required behavior.
model_gateway/tests/spec/rerank.rs (1)

372-383: ⚠️ Potential issue | 🟠 Major

Don't normalize a model-less V1 rerank request into model == "".

Lines 35-55 now establish that RerankRequest is invalid without a model, but this test still blesses a V1RerankReqInput -> RerankRequest conversion that manufactures an empty model. That recreates the omitted-model fallback this PR is removing and lets the V1 path dodge the new invariant until much later in the flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/tests/spec/rerank.rs` around lines 372 - 383, The test
test_v1_to_rerank_request_conversion is incorrectly asserting that converting a
V1RerankReqInput yields a RerankRequest with model == ""; instead update the
test to assert the conversion is rejected because RerankRequest now requires a
model. Replace the direct Into conversion with a fallible conversion (use
TryFrom/TryInto or RerankRequest::try_from(V1RerankReqInput)) and assert that
the result is an Err (or that conversion panics if the codebase uses panics for
invalid input), and remove any assertion expecting an empty model; keep checks
for query/documents if relevant to the Err case.
model_gateway/src/server.rs (1)

200-208: ⚠️ Potential issue | 🟠 Major

Json<CompletionRequest> will return 422 (not 400) for a missing model field.

Axum's Json<T> extractor returns 422 Unprocessable Entity when required fields fail deserialization. If /v1/completions is meant to enforce the 400 status for missing models (as in other handlers), this needs a custom rejection mapping or a validated extractor that explicitly handles the missing model case. The same issue applies to the Json<T> tokenize/detokenize handlers elsewhere in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/server.rs` around lines 200 - 208, The v1_completions
handler (async fn v1_completions) currently uses axum's Json<CompletionRequest>
extractor which yields 422 on deserialization failures (e.g., missing model) but
you want a 400 for missing model; change the extractor to a validated/custom
extractor or map Json rejections so that when CompletionRequest deserializes but
the model field is absent you return a 400 before calling
state.router.route_completion(Some(&headers), &body, &body.model), and apply the
same change to the tokenize/detokenize handlers that also use Json<T> so they
likewise return 400 for missing model instead of 422.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 82-118: When select_single_worker() or select_pd_pair() returns
None in WorkerSelectionStage::execute, distinguish "model not advertised by any
worker" from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.

In `@model_gateway/src/routers/grpc/router.rs`:
- Around line 288-292: The early return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.

In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 734-761: The code currently falls back to global pools when
get_by_model(model_id) is empty, which can route an explicit unsupported model
to unrelated prefill/decode workers; change the logic in the prefill_workers and
decode_workers blocks so that if self.worker_registry.get_by_model(model_id)
returns empty and model_id != UNKNOWN_MODEL_ID you do not call
get_prefill_workers()/get_decode_workers() but instead return an error (or fail
the request) indicating unsupported model; only allow the global fallback when
model_id == UNKNOWN_MODEL_ID. Update the branches around prefill_workers and
decode_workers (and any surrounding control flow that consumes them) to enforce
this check and avoid mixing pools for partially matched models.

In `@model_gateway/src/routers/http/router.rs`:
- Around line 281-300: The current check calls
self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.

In `@model_gateway/src/server.rs`:
- Around line 227-234: The code currently fills an empty RerankRequest.model by
taking the first entry from state.context.worker_registry.get_models(); instead,
detect when rerank_body.model.is_empty() and return a 400 Bad Request with a
clear error message (e.g. "model is required for /v1/rerank") rather than
choosing an arbitrary model; update the handler that processes RerankRequest
(the branch that references rerank_body and RerankRequest and calls
state.context.worker_registry.get_models()) to produce the HTTP 400 response
using the same error response mechanism the server uses elsewhere.
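A minimal sketch of the suggested guard, assuming only that the handler can see the deserialized model string; the error-tuple shape and status plumbing are illustrative, not the server's real error mechanism:

```rust
// Reject an empty rerank model with 400 instead of silently picking the
// first registered model (message wording follows the review suggestion).
fn rerank_model_or_400(model: &str) -> Result<&str, (u16, &'static str)> {
    if model.is_empty() {
        Err((400, "model is required for /v1/rerank"))
    } else {
        Ok(model)
    }
}
```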

In `@model_gateway/tests/common/mock_worker.rs`:
- Around line 90-92: The TLS mock's route set and server_info payload diverge
from the plain mock; update model_gateway/tests/common/tls_mock_worker.rs so its
router registers the same endpoints as the plain mock (add routes for
"/server_info", "/model_info", and "/get_model_info" alongside
"/get_server_info") and make its server_info_handler return the same JSON
structure including the served_model_name field (match the payload shape
produced by the plain mock's server_info_handler) so TLS-path tests use the
identical discovery contract.

In `@model_gateway/tests/routing/load_balancing_test.rs`:
- Around line 319-323: The test is too loose using !resp.status().is_success();
tighten it to assert the exact no-workers status—change the assertion in
load_balancing_test.rs to expect HTTP 404 (StatusCode::NOT_FOUND) for the no
healthy-matching-worker path so it aligns with select_worker_for_model's
contract (503 reserved for circuit-breaker); update the failure message to
reflect the expected NOT_FOUND status and keep the check using resp.status() for
clarity.

---

Outside diff comments:
In `@model_gateway/src/server.rs`:
- Around line 200-208: The v1_completions handler (async fn v1_completions)
currently uses axum's Json<CompletionRequest> extractor which yields 422 on
deserialization failures (e.g., missing model) but you want a 400 for missing
model; change the extractor to a validated/custom extractor or map Json
rejections so that when CompletionRequest deserializes but the model field is
absent you return a 400 before calling
state.router.route_completion(Some(&headers), &body, &body.model), and apply the
same change to the tokenize/detokenize handlers that also use Json<T> so they
likewise return 400 for missing model instead of 422.
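The two-phase approach the comment hints at — deserialize with `model` left optional, then enforce presence before routing — can be sketched without axum; the struct shape and status codes here are assumptions, not the real `CompletionRequest`:

```rust
// Stand-in for a deserialized request body whose `model` field is optional.
struct RawCompletionRequest {
    model: Option<String>,
    prompt: String,
}

// Enforce the required model before routing: missing or empty -> 400.
fn validate_model(req: &RawCompletionRequest) -> Result<&str, u16> {
    match req.model.as_deref() {
        Some(m) if !m.is_empty() => Ok(m),
        _ => Err(400), // Bad Request, not the Json extractor's default 422
    }
}
```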

In `@model_gateway/tests/api/request_formats_test.rs`:
- Around line 70-99: Add a failing-path assertion that ensures requests without
"model" return the new 400 contract: create a payload that omits "model" (e.g.,
same messages/params but no model field), call
ctx.make_request("/v1/chat/completions", payload).await, assert the result
indicates a failure (or response.status == 400) and that the error body matches
the expected shape (e.g., contains an "error" field with message/type). Repeat
the same missing-model assertion for the "/v1/completions" test block so both
endpoints validate the new model-required behavior.

In `@model_gateway/tests/spec/rerank.rs`:
- Around line 372-383: The test test_v1_to_rerank_request_conversion is
incorrectly asserting that converting a V1RerankReqInput yields a RerankRequest
with model == ""; instead update the test to assert the conversion is rejected
because RerankRequest now requires a model. Replace the direct Into conversion
with a fallible conversion (use TryFrom/TryInto or
RerankRequest::try_from(V1RerankReqInput)) and assert that the result is an Err
(or that conversion panics if the codebase uses panics for invalid input), and
remove any assertion expecting an empty model; keep checks for query/documents
if relevant to the Err case.
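A fallible conversion along the lines the comment suggests might look like this; both struct shapes are simplified stand-ins for the real protocol types:

```rust
// Simplified stand-ins for V1RerankReqInput and RerankRequest.
struct V1RerankReqInput {
    model: Option<String>,
    query: String,
}

struct RerankRequest {
    model: String,
    query: String,
}

impl TryFrom<V1RerankReqInput> for RerankRequest {
    type Error = String;

    fn try_from(v: V1RerankReqInput) -> Result<Self, Self::Error> {
        // Reject a missing/empty model instead of defaulting to "".
        let model = v
            .model
            .filter(|m| !m.is_empty())
            .ok_or_else(|| "model is required".to_string())?;
        Ok(RerankRequest { model, query: v.query })
    }
}
```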

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aefc98e3-9162-43d8-a13e-ad2bd876d051

📥 Commits

Reviewing files that changed from the base of the PR and between fd577d8 and 96965b4.

📒 Files selected for processing (33)
  • crates/protocols/src/common.rs
  • crates/protocols/src/generate.rs
  • model_gateway/benches/request_processing.rs
  • model_gateway/benches/wasm_middleware_latency.rs
  • model_gateway/src/routers/grpc/common/stages/dispatch_metadata.rs
  • model_gateway/src/routers/grpc/common/stages/worker_selection.rs
  • model_gateway/src/routers/grpc/context.rs
  • model_gateway/src/routers/grpc/pd_router.rs
  • model_gateway/src/routers/grpc/pipeline.rs
  • model_gateway/src/routers/grpc/regular/responses/common.rs
  • model_gateway/src/routers/grpc/regular/responses/handlers.rs
  • model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
  • model_gateway/src/routers/grpc/router.rs
  • model_gateway/src/routers/grpc/utils/chat_utils.rs
  • model_gateway/src/routers/http/pd_router.rs
  • model_gateway/src/routers/http/router.rs
  • model_gateway/src/routers/mod.rs
  • model_gateway/src/routers/openai/chat.rs
  • model_gateway/src/routers/openai/responses/route.rs
  • model_gateway/src/routers/openai/router.rs
  • model_gateway/src/routers/router_manager.rs
  • model_gateway/src/server.rs
  • model_gateway/tests/api/api_endpoints_test.rs
  • model_gateway/tests/api/request_formats_test.rs
  • model_gateway/tests/api/responses_api_test.rs
  • model_gateway/tests/api/streaming_tests.rs
  • model_gateway/tests/common/mock_worker.rs
  • model_gateway/tests/otel_tracing_test.rs
  • model_gateway/tests/routing/header_forwarding_test.rs
  • model_gateway/tests/routing/load_balancing_test.rs
  • model_gateway/tests/routing/test_openai_routing.rs
  • model_gateway/tests/spec/rerank.rs

Comment on lines 82 to 118
+        let model_id = ctx.input.model_id.as_str();
         let workers = match self.mode {
             WorkerSelectionMode::Regular => {
-                match self.select_single_worker(
-                    ctx.input.model_id.as_deref(),
-                    text,
-                    tokens,
-                    headers,
-                ) {
+                match self.select_single_worker(model_id, text, tokens, headers) {
                     Some(w) => WorkerSelection::Single { worker: w },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "Regular",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available workers for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_workers",
-                            format!("No available workers for model: {model}"),
+                            format!("No available workers for model: {model_id}"),
                         ));
                     }
                 }
             }
             WorkerSelectionMode::PrefillDecode => {
-                match self.select_pd_pair(ctx.input.model_id.as_deref(), text, tokens, headers) {
+                match self.select_pd_pair(model_id, text, tokens, headers) {
                     Some((prefill, decode, runtime_type)) => WorkerSelection::Dual {
                         prefill,
                         decode,
                         runtime_type,
                     },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "PrefillDecode",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available PD worker pairs for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_pd_worker_pairs",
-                            format!("No available PD worker pairs for model: {model}"),
+                            format!("No available PD worker pairs for model: {model_id}"),
                         ));

⚠️ Potential issue | 🟠 Major

Split unsupported-model and unavailable-worker failures.

select_single_worker() / select_pd_pair() both collapse "no worker advertises this model" and "matching workers exist but none are currently usable" into None, and execute() always maps that to 503. Now that model_id is required, the first case should surface as 404, otherwise unsupported or typoed models look like transient outages. Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers, while absent models should fall through to 404.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around
lines 82 - 118, When select_single_worker() or select_pd_pair() returns None in
WorkerSelectionStage::execute, distinguish "model not advertised by any worker"
from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.
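The 404/503 split described above can be sketched in plain Rust; the booleans stand in for the results of `select_single_worker()`/`select_pd_pair()` and `any_external_worker_supports_model(model_id, headers, healthy_only = false)`, and the enum is illustrative, not SMG's real error helpers:

```rust
// Sketch of the suggested failure split (names are placeholders).
#[derive(Debug, PartialEq)]
enum SelectionFailure {
    UnsupportedModel,   // no worker advertises the model -> 404
    NoAvailableWorkers, // model advertised, but no worker usable -> 503
}

// `selected` models the selection helpers returning Some;
// `advertised` models any_external_worker_supports_model(..., false).
fn classify_failure(selected: bool, advertised: bool) -> Option<SelectionFailure> {
    if selected {
        None // a worker was chosen; nothing to report
    } else if !advertised {
        Some(SelectionFailure::UnsupportedModel)
    } else {
        Some(SelectionFailure::NoAvailableWorkers)
    }
}
```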

Comment on lines 288 to 292
         // 0. Fast worker validation (fail-fast before expensive operations)
-        let requested_model: Option<&str> = model_id.or(Some(body.model.as_str()));
-
-        if let Some(error_response) = requested_model
-            .and_then(|model| validate_worker_availability(&self.worker_registry, model))
+        if let Some(error_response) = validate_worker_availability(&self.worker_registry, model_id)
         {
             return error_response;
         }

⚠️ Potential issue | 🟠 Major

This early miss path still reports an unsupported model as 503.

validate_worker_availability() returns service_unavailable() when the model is absent, so /v1/responses on the gRPC router still treats a typoed or unsupported model as a transient outage. After making model mandatory, this short-circuit needs to return model_not_found/404 instead, either by remapping the helper result here or by teaching the helper to emit the right status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/router.rs` around lines 288 - 292, The early
return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.
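One way to realize the call-site remap described above, sketched with a stand-in outcome enum rather than the real `validate_worker_availability` return type (which produces an error response directly):

```rust
// Hypothetical availability outcome; names mirror the review comment.
#[derive(Debug, PartialEq)]
enum Availability {
    Ok,
    ModelAbsent, // remap to model_not_found (404)
    WorkersDown, // keep as service_unavailable (503)
}

// Translate the specific "model absent" case to 404, leaving transient
// failures as 503.
fn http_status(outcome: Availability) -> Option<u16> {
    match outcome {
        Availability::Ok => None, // proceed with routing
        Availability::ModelAbsent => Some(404),
        Availability::WorkersDown => Some(503),
    }
}
```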

Comment thread model_gateway/src/routers/http/router.rs
Comment on lines +281 to +300
// Distinguish "no workers for this model" from "workers exist but unavailable"
let model_filter = if model_id == crate::core::UNKNOWN_MODEL_ID {
None
} else {
Some(model_id)
};
let total = self.worker_registry.get_workers_filtered(
model_filter,
Some(WorkerType::Regular),
Some(ConnectionMode::Http),
None,
false,
);
return if total.is_empty() {
error::model_not_found(model_id)
} else {
error::service_unavailable(
"no_available_workers",
"All workers are unavailable (circuit breaker open or unhealthy)",
)

⚠️ Potential issue | 🟠 Major

Use a healthy-only existence check before returning 503.

get_workers_filtered(..., false) counts workers that are failing health checks, so a model with only stale/unhealthy registrations now goes down the 503 branch. That collapses “all healthy workers are circuit-broken” and “only unhealthy matches exist” into the same outcome and changes the 404/503 split.

Suggested change
-                let total = self.worker_registry.get_workers_filtered(
+                let healthy = self.worker_registry.get_workers_filtered(
                     model_filter,
                     Some(WorkerType::Regular),
                     Some(ConnectionMode::Http),
                     None,
-                    false,
+                    true,
                 );
-                return if total.is_empty() {
+                return if healthy.is_empty() {
                     error::model_not_found(model_id)
                 } else {

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/http/router.rs` around lines 281 - 300, The current
check calls self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.

Comment thread model_gateway/tests/common/mock_worker.rs Outdated
Comment on lines +90 to +92
.route("/server_info", get(server_info_handler))
.route("/get_server_info", get(server_info_handler))
.route("/model_info", get(model_info_handler))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

for f in model_gateway/tests/common/mock_worker.rs model_gateway/tests/common/tls_mock_worker.rs; do
  echo "=== $f ==="
  rg -n -C2 'route\("/server_info"|route\("/model_info"|route\("/get_server_info"|route\("/get_model_info"|served_model_name|model_path|should_fail\(&config\)' "$f"
done

Repository: lightseekorg/smg

Length of output: 3927


Keep the TLS mock's discovery contract in sync.

The plain mock exposes /server_info, /get_server_info, /model_info, and /get_model_info endpoints (lines 90–93), with server_info_handler returning a payload that includes served_model_name (line 237). The TLS mock only exposes /get_server_info and its server_info_handler returns a different payload structure omitting served_model_name. This route and payload divergence means TLS-path tests exercise a stale worker-discovery contract and can hide regressions in model matching or registration logic.

Align model_gateway/tests/common/tls_mock_worker.rs to expose the same metadata endpoints (/server_info, /model_info, /get_model_info) and return matching server_info payloads with served_model_name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/tests/common/mock_worker.rs` around lines 90 - 92, The TLS
mock's route set and server_info payload diverge from the plain mock; update
model_gateway/tests/common/tls_mock_worker.rs so its router registers the same
endpoints as the plain mock (add routes for "/server_info", "/model_info", and
"/get_model_info" alongside "/get_server_info") and make its server_info_handler
return the same JSON structure including the served_model_name field (match the
payload shape produced by the plain mock's server_info_handler) so TLS-path
tests use the identical discovery contract.

Comment on lines +319 to 323
         // Should return error when no workers available
         assert!(
-            resp.status() == StatusCode::SERVICE_UNAVAILABLE
-                || resp.status() == StatusCode::INTERNAL_SERVER_ERROR,
-            "Expected 503 or 500 when no workers available, got {}",
+            !resp.status().is_success(),
+            "Expected error status when no workers available, got {}",
             resp.status()

⚠️ Potential issue | 🟡 Minor

Don't collapse the no-workers assertion to !is_success().

/generate is still the exception to the required-model change, so this would also pass on unrelated 400/500 regressions and stop checking the intended worker-selection contract for the no-workers path.

🎯 Tighten the assertion back to the expected status
-        assert!(
-            !resp.status().is_success(),
-            "Expected error status when no workers available, got {}",
-            resp.status()
-        );
+        assert_eq!(
+            resp.status(),
+            StatusCode::NOT_FOUND,
+            "Expected 404 when no workers are registered"
+        );

Based on learnings, select_worker_for_model intentionally reserves 503 for the circuit-breaker case and lets no healthy matching workers fall through to 404.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/tests/routing/load_balancing_test.rs` around lines 319 - 323,
The test is too loose using !resp.status().is_success(); tighten it to assert
the exact no-workers status—change the assertion in load_balancing_test.rs to
expect HTTP 404 (StatusCode::NOT_FOUND) for the no healthy-matching-worker path
so it aligns with select_worker_for_model's contract (503 reserved for
circuit-breaker); update the failure message to reflect the expected NOT_FOUND
status and keep the check using resp.status() for clarity.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d13e64998

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"no_available_workers",
format!("No available workers for model: {model}"),
));
return Err(error::model_not_found(model_id));

P1 Badge Return 503 for unavailable gRPC workers

WorkerSelectionStage::execute now unconditionally converts a failed selection into error::model_not_found (404), but selection can fail not only when the model is absent but also when matching workers exist yet are temporarily unavailable after is_available() filtering. In those transient cases this incorrectly reports a client/model error and prevents retry handling (which is keyed to 5xx), so recoverable outages fail fast instead of retrying.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
model_gateway/src/routers/grpc/common/stages/worker_selection.rs (1)

85-95: ⚠️ Potential issue | 🟠 Major

Don't collapse wildcard and availability failures into model_not_found.

These branches now always return model_not_found, but select_single_worker() / select_pd_pair() also return None when /generate is using UNKNOWN_MODEL_ID as a wildcard or when matching healthy workers are temporarily unusable. That turns transient selection failures into false 404s and leaks "unknown" back to clients. Split true model misses from wildcard/unavailable-worker failures before choosing the response code.

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for healthy workers whose circuit breaker is open, while unhealthy workers should still fall through to 404.

Also applies to: 99-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around
lines 85 - 95, The current code conflates "model truly not present" with
"selection failed due to wildcard/temporary unavailability" by returning
model_not_found whenever select_single_worker/select_pd_pair return None;
instead, call any_external_worker_supports_model(model_id, healthy_only = true)
(and also with healthy_only = false if needed to detect wildcard
UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns
false then return error::model_not_found(model_id); if it returns true but
select_single_worker or select_pd_pair still returned None then return a 503
service/unavailable error (e.g., error::service_unavailable or similar) to
represent transient/unusable-worker failures; apply the same change to both the
single-worker branch (where select_single_worker is used) and the PD-pair branch
(where select_pd_pair is used), referencing WorkerSelectionStage::execute,
select_single_worker, select_pd_pair, and any_external_worker_supports_model to
locate the logic to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 726-765: In select_pd_pair: detect when
self.worker_registry.get_by_model(model_id) returns empty for an explicit
(non-wildcard) model_id (i.e. when is_unknown_model is false) and short-circuit
by returning error::model_not_found(model_id) (or the appropriate typed
non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.

---

Duplicate comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 85-95: The current code conflates "model truly not present" with
"selection failed due to wildcard/temporary unavailability" by returning
model_not_found whenever select_single_worker/select_pd_pair return None;
instead, call any_external_worker_supports_model(model_id, healthy_only = true)
(and also with healthy_only = false if needed to detect wildcard
UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns
false then return error::model_not_found(model_id); if it returns true but
select_single_worker or select_pd_pair still returned None then return a 503
service/unavailable error (e.g., error::service_unavailable or similar) to
represent transient/unusable-worker failures; apply the same change to both the
single-worker branch (where select_single_worker is used) and the PD-pair branch
(where select_pd_pair is used), referencing WorkerSelectionStage::execute,
select_single_worker, select_pd_pair, and any_external_worker_supports_model to
locate the logic to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e733463-3e57-412e-9a81-3e1a3cd1f13b

📥 Commits

Reviewing files that changed from the base of the PR and between 96965b4 and 4d13e64.

📒 Files selected for processing (7)
  • crates/protocols/src/tokenize.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/common/stages/worker_selection.rs
  • model_gateway/src/routers/http/pd_router.rs
  • model_gateway/src/server.rs
  • model_gateway/tests/api/api_endpoints_test.rs
  • model_gateway/tests/common/tls_mock_worker.rs

Comment on lines 726 to +765
     async fn select_pd_pair(
         &self,
         request_text: Option<&str>,
-        model_id: Option<&str>,
+        model_id: &str,
         headers: Option<&HeaderMap>,
     ) -> Result<(Arc<dyn Worker>, Arc<dyn Worker>), String> {
-        let effective_model_id = if self.enable_igw { model_id } else { None };
-
-        debug!(
-            "Selecting PD pair: enable_igw={}, model_id={:?}, effective_model_id={:?}",
-            self.enable_igw, model_id, effective_model_id
-        );
+        debug!("Selecting PD pair: model_id={:?}", model_id);
+
+        let is_unknown_model = model_id == UNKNOWN_MODEL_ID;
 
-        let prefill_workers = if let Some(model) = effective_model_id {
-            self.worker_registry
-                .get_by_model(model)
+        let prefill_workers = {
+            let by_model: Vec<_> = self
+                .worker_registry
+                .get_by_model(model_id)
                 .iter()
                 .filter(|w| matches!(w.worker_type(), WorkerType::Prefill))
                 .cloned()
-                .collect()
-        } else {
-            self.worker_registry.get_prefill_workers()
+                .collect();
+            if by_model.is_empty() && is_unknown_model {
+                // Only fall back to all workers when model is "unknown" (wildcard)
+                self.worker_registry.get_prefill_workers()
+            } else {
+                by_model
+            }
         };
 
-        let decode_workers = if let Some(model) = effective_model_id {
-            self.worker_registry
-                .get_by_model(model)
+        let decode_workers = {
+            let by_model: Vec<_> = self
+                .worker_registry
+                .get_by_model(model_id)
                 .iter()
                 .filter(|w| matches!(w.worker_type(), WorkerType::Decode))
                 .cloned()
-                .collect()
-        } else {
-            self.worker_registry.get_decode_workers()
+                .collect();
+            if by_model.is_empty() && is_unknown_model {
+                // Only fall back to all workers when model is "unknown" (wildcard)
+                self.worker_registry.get_decode_workers()
+            } else {
+                by_model
+            }

⚠️ Potential issue | 🟠 Major

Surface explicit PD model misses as model_not_found.

The new explicit-model filtering is good, but select_pd_pair() still reports failures as Err(String). When get_by_model(model_id) is empty for an explicit model, that bubbles out through handle_server_selection_error() as 503 server_selection_failed, so unsupported models are retried like transient outages. Short-circuit explicit misses as error::model_not_found(model_id) (or another typed non-retriable error) before falling into pick_worker_by_policy_arc().

Based on learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should fall through to 404.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/http/pd_router.rs` around lines 726 - 765, In
select_pd_pair: detect when self.worker_registry.get_by_model(model_id) returns
empty for an explicit (non-wildcard) model_id (i.e. when is_unknown_model is
false) and short-circuit by returning error::model_not_found(model_id) (or the
appropriate typed non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.
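The wildcard-vs-explicit-miss rule can be sketched as a pool selector; worker handles are plain strings here and the error text is illustrative, not the real typed error:

```rust
const UNKNOWN_MODEL_ID: &str = "unknown";

// Pick the candidate pool: explicit misses fail fast as model_not_found,
// while only the "unknown" wildcard may fall back to the global pool.
fn candidate_pool(
    by_model: Vec<String>,
    global: Vec<String>,
    model_id: &str,
) -> Result<Vec<String>, String> {
    if !by_model.is_empty() {
        Ok(by_model)
    } else if model_id == UNKNOWN_MODEL_ID {
        Ok(global) // wildcard: global fallback allowed
    } else {
        Err(format!("model_not_found: {model_id}")) // non-retriable 404
    }
}
```

The same helper would apply to both the prefill_workers and decode_workers branches, so an unsupported explicit model never mixes pools.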

@slin1237 slin1237 force-pushed the refactor/require-model-field branch from 4d13e64 to 59b4dc8 Compare March 23, 2026 00:04
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 59b4dc8d03


Comment on lines 93 to +94
         if !available_models.contains(&model.to_string()) {
-            return Some(error::service_unavailable(
-                "no_available_workers",
-                format!(
-                    "No workers available for model '{}'. Available models: {}",
-                    model,
-                    available_models.join(", ")
-                ),
-            ));
+            return Some(error::model_not_found(model));

P1 Badge Return 503 when responses workers are temporarily absent

validate_worker_availability now maps any missing model entry in the live registry to model_not_found (404), but this check runs before dispatch and cannot distinguish a true bad model from transient capacity loss (e.g., rolling restart, delayed registration, or full drain). In those outage windows /v1/responses will return a client error instead of a retriable 5xx, causing callers and gateways to stop retrying despite the model being valid.


Comment on lines +294 to +296
return if total.is_empty() {
error::model_not_found(model_id)
} else {

P2 Badge Avoid model-not-found for zero-worker HTTP outages

When worker selection fails, the new branch returns model_not_found if the filtered worker list is empty. This makes transient zero-worker conditions (startup race, discovery blip, or scale-to-zero window) look like a permanent client/model error, so clients that retry only on 5xx may fail fast instead of recovering once workers re-register.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
model_gateway/src/routers/http/router.rs (1)

281-301: ⚠️ Potential issue | 🟠 Major

Preserve the 404/503 split here, and don't surface the "unknown" sentinel.

This fallback still counts unhealthy/stale registrations, so a model that only exists on failed health checks can return 503 instead of the intended 404. Because UNKNOWN_MODEL_ID shares the same path, /generate without a model can also fall through to model_not_found("unknown"), which leaks an internal sentinel to clients.

Suggested fix
```diff
-                let total = self.worker_registry.get_workers_filtered(
+                let healthy = self.worker_registry.get_workers_filtered(
                     model_filter,
                     Some(WorkerType::Regular),
                     Some(ConnectionMode::Http),
                     None,
-                    false,
+                    true,
+                );
+                let total = self.worker_registry.get_workers_filtered(
+                    model_filter,
+                    Some(WorkerType::Regular),
+                    Some(ConnectionMode::Http),
+                    None,
+                    false,
                 );
-                return if total.is_empty() {
-                    error::model_not_found(model_id)
+                return if model_id == crate::core::UNKNOWN_MODEL_ID {
+                    if total.is_empty() {
+                        error::service_unavailable("no_workers", "No available workers")
+                    } else {
+                        error::service_unavailable(
+                            "no_available_workers",
+                            "All workers are unavailable",
+                        )
+                    }
+                } else if healthy.is_empty() {
+                    error::model_not_found(model_id)
                 } else {
                     error::service_unavailable(
                         "no_available_workers",
-                        "All workers are unavailable (circuit breaker open or unhealthy)",
+                        "All healthy workers are unavailable",
                     )
                 };
```

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/http/router.rs` around lines 281 - 301, The code
currently treats UNKNOWN_MODEL_ID as a filter and counts unhealthy/stale workers
(healthy_only = false), which leaks the "unknown" sentinel and can return 503
for models that only have unhealthy registrations; fix by early-returning a 404
when model_id == crate::core::UNKNOWN_MODEL_ID (do not continue to the worker
lookup) and when querying workers use
self.worker_registry.get_workers_filtered(..., true) i.e. set the healthy_only
flag to true so the 404/503 split is preserved; refer to model_id,
UNKNOWN_MODEL_ID, get_workers_filtered, and any_external_worker_supports_model
to locate the logic to change.
model_gateway/src/routers/http/pd_router.rs (1)

736-766: ⚠️ Potential issue | 🟠 Major

Explicit model misses still return 503 instead of 404.

The fallback logic is now correctly restricted to UNKNOWN_MODEL_ID, which addresses the first past review concern. However, when get_by_model(model_id) returns empty for an explicit model, the error path flows through pick_worker_by_policy_arc → handle_server_selection_error → 503 service_unavailable.

Per retrieved learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should return 404. Consider short-circuiting explicit model misses before calling pick_worker_by_policy_arc:

Suggested approach
```diff
         let prefill_workers = {
             let by_model: Vec<_> = self
                 .worker_registry
                 .get_by_model(model_id)
                 .iter()
                 .filter(|w| matches!(w.worker_type(), WorkerType::Prefill))
                 .cloned()
                 .collect();
             if by_model.is_empty() && is_unknown_model {
                 // "auto" means pick any — fall back to all prefill workers
                 self.worker_registry.get_prefill_workers()
+            } else if by_model.is_empty() {
+                // Explicit model not found — fail fast with 404-style error
+                return Err(format!("model_not_found:{model_id}"));
             } else {
                 by_model
             }
         };
```

Then in execute_dual_dispatch, check for the model_not_found: prefix and return error::model_not_found(model_id) instead of calling handle_server_selection_error.
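The tagged-error pattern suggested above can be sketched in isolation. Everything here is a hypothetical stand-in for the pd_router logic (select_prefill, dispatch_status, and the boolean inputs are illustrative, not the real signatures): selection returns a prefixed string error, and the dispatch layer pattern-matches that prefix to choose a 404 over the generic 503 path.

```rust
// Hedged sketch of the prefix-tagged error pattern (all names hypothetical):
// only a miss on an explicitly requested model is tagged model_not_found;
// every other selection failure keeps the retriable 503 path.

fn select_prefill(
    model_id: &str,
    is_unknown_model: bool,
    by_model_is_empty: bool,
) -> Result<&'static str, String> {
    if by_model_is_empty && is_unknown_model {
        Ok("fallback: all prefill workers") // "auto" may pick any worker
    } else if by_model_is_empty {
        Err(format!("model_not_found:{model_id}")) // explicit miss: fail fast
    } else {
        Ok("workers registered for this model")
    }
}

fn dispatch_status(result: Result<&'static str, String>) -> u16 {
    match result {
        Ok(_) => 200,
        Err(e) if e.starts_with("model_not_found:") => 404,
        Err(_) => 503, // circuit-breaker and other selection failures stay retriable
    }
}

fn main() {
    assert_eq!(dispatch_status(select_prefill("llama-3", false, true)), 404);
    assert_eq!(dispatch_status(select_prefill("unknown", true, true)), 200);
    println!("ok");
}
```

Encoding the distinction in the error value itself keeps the selection function's signature unchanged while still letting the dispatch site recover enough context to choose the right status.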


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/http/pd_router.rs` around lines 736 - 766, When an
explicit model lookup yields no workers the code currently falls through to
pick_worker_by_policy_arc and ultimately returns 503; instead short-circuit
explicit misses and return 404. After computing prefill_workers and
decode_workers (the blocks using worker_registry.get_by_model in pd_router.rs)
or at the start of execute_dual_dispatch, add a check: if model_id !=
UNKNOWN_MODEL_ID and (prefill_workers.is_empty() || decode_workers.is_empty())
then immediately return error::model_not_found(model_id) rather than calling
pick_worker_by_policy_arc / handle_server_selection_error; this ensures explicit
model-not-found returns a 404.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Line 288: The assignment let model = model_id; in worker_selection.rs is a
redundant rebinding; remove that line and use model_id directly in subsequent
metric calls (the same pattern used in select_single_worker) to avoid the no-op
variable and keep consistency — update references in the surrounding function
(worker selection logic) to use model_id instead of model.

In `@model_gateway/src/server.rs`:
- Around line 227-239: The /v1/rerank compatibility path converts the incoming
body into RerankRequest and only checks model.is_empty(), which skips the full
validation applied elsewhere (e.g., ValidatedJson<RerankRequest>) and emits a
non-standard error shape; instead, take the converted rerank_body and run it
through the same validation/error path used for ValidatedJson (validate fields
like query, documents, top_k, etc.), returning the shared bad-request JSON
structure on failure; locate the conversion of body.into() into rerank_body and
replace the lone model check by invoking the existing RerankRequest validation
routine or the same validator used earlier so all validation errors and shapes
remain consistent across /v1/rerank.
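A minimal sketch of the shared-validator idea, with hypothetical field checks standing in for whatever ValidatedJson<RerankRequest> actually enforces; only the struct name mirrors the codebase, the rest is assumed for illustration.

```rust
// Hedged sketch: one validation routine for RerankRequest so every entry
// point emits the same bad-request shape. The specific checks here are
// illustrative assumptions, not the real validator.

struct RerankRequest {
    model: String,
    query: String,
    documents: Vec<String>,
}

fn validate_rerank(req: &RerankRequest) -> Result<(), String> {
    if req.model.is_empty() {
        return Err("model_required".to_string());
    }
    if req.query.is_empty() {
        return Err("query_required".to_string());
    }
    if req.documents.is_empty() {
        return Err("documents_required".to_string());
    }
    Ok(())
}

fn main() {
    let bad = RerankRequest {
        model: String::new(),
        query: "what is SMG?".to_string(),
        documents: vec!["doc".to_string()],
    };
    // Same error shape whether the body arrived via /v1/rerank or /rerank.
    assert_eq!(validate_rerank(&bad), Err("model_required".to_string()));
    println!("ok");
}
```

Routing both endpoints through one function means a new field constraint only needs to be added once, and the error JSON stays byte-identical across compatibility paths.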


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9847e147-6bb2-4a3c-909a-25291993e68a

📥 Commits

Reviewing files that changed from the base of the PR and between 4d13e64 and 59b4dc8.

📒 Files selected for processing (9)
  • crates/protocols/src/common.rs
  • crates/protocols/src/tokenize.rs
  • model_gateway/src/routers/grpc/common/responses/utils.rs
  • model_gateway/src/routers/grpc/common/stages/worker_selection.rs
  • model_gateway/src/routers/http/pd_router.rs
  • model_gateway/src/routers/http/router.rs
  • model_gateway/src/server.rs
  • model_gateway/tests/api/api_endpoints_test.rs
  • model_gateway/tests/common/tls_mock_worker.rs

```diff
 let decode_idx = policy.select_worker(&available_decode, &info)?;

-let model = model_id.unwrap_or(UNKNOWN_MODEL_ID);
+let model = model_id;
```

🧹 Nitpick | 🔵 Trivial

Minor: Redundant variable rebinding.

let model = model_id; is a no-op rebinding. Consider using model_id directly in the metrics calls below for consistency with select_single_worker (line 181).

♻️ Suggested simplification
```diff
-        let model = model_id;
         let policy_name = policy.name();

         // Record worker selection metrics for both prefill and decode
         Metrics::record_worker_selection(
             metrics_labels::WORKER_PREFILL,
             metrics_labels::CONNECTION_GRPC,
-            model,
+            model_id,
             policy_name,
         );
         Metrics::record_worker_selection(
             metrics_labels::WORKER_DECODE,
             metrics_labels::CONNECTION_GRPC,
-            model,
+            model_id,
             policy_name,
         );
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
-let model = model_id;
 let policy_name = policy.name();

 // Record worker selection metrics for both prefill and decode
 Metrics::record_worker_selection(
     metrics_labels::WORKER_PREFILL,
     metrics_labels::CONNECTION_GRPC,
     model_id,
     policy_name,
 );
 Metrics::record_worker_selection(
     metrics_labels::WORKER_DECODE,
     metrics_labels::CONNECTION_GRPC,
     model_id,
     policy_name,
 );
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` at line
288, The assignment let model = model_id; in worker_selection.rs is a redundant
rebinding; remove that line and use model_id directly in subsequent metric calls
(the same pattern used in select_single_worker) to avoid the no-op variable and
keep consistency — update references in the surrounding function (worker
selection logic) to use model_id instead of model.

Comment thread model_gateway/src/server.rs Outdated
- gRPC worker_selection: treat UNKNOWN_MODEL_ID as wildcard (any worker),
  return 404 model_not_found instead of 503 when no workers serve model
- gRPC router: validate_worker_availability returns 404 for unsupported model
- PD router: don't fall back to all workers for explicit model — only
  UNKNOWN_MODEL_ID triggers cross-model fallback
- V1 rerank: return 400 "model_required" instead of silently picking
  first available model
- HTTP router: distinguish 404 (model not found) from 503 (workers exist
  but all unavailable) using same UNKNOWN_MODEL_ID wildcard pattern
- tokenize: add test_detokenize_request_requires_model
- TLS mock worker: add served_model_name and /server_info route alias

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
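The rerank bullet in the commit message above ("return 400 model_required instead of silently picking first available model") amounts to replacing an implicit fallback with an explicit client error. A hypothetical before/after sketch, with all names invented for illustration:

```rust
// Hedged sketch of the behavior change (names hypothetical): a missing or
// empty model is now a client error instead of being silently resolved to
// the first available model.

fn resolve_model(
    requested: Option<&str>,
    _available: &[&str],
) -> Result<String, &'static str> {
    match requested {
        Some(m) if !m.is_empty() => Ok(m.to_string()),
        // Old behavior: _available.first() fallback. New behavior: 400-style error.
        _ => Err("model_required"),
    }
}

fn main() {
    assert_eq!(
        resolve_model(Some("qwen-3"), &["qwen-3"]),
        Ok("qwen-3".to_string())
    );
    assert_eq!(resolve_model(None, &["qwen-3"]), Err("model_required"));
    println!("ok");
}
```

Failing fast here also keeps metrics and hash-ring labels honest, since no request is ever attributed to a model the client never asked for.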
@slin1237 slin1237 force-pushed the refactor/require-model-field branch from 59b4dc8 to 8de21dc Compare March 23, 2026 00:17
@slin1237 slin1237 merged commit 7c518b2 into main Mar 23, 2026
37 checks passed
@slin1237 slin1237 deleted the refactor/require-model-field branch March 23, 2026 01:26

Labels

benchmarks Benchmark changes grpc gRPC client and router changes model-gateway Model gateway crate changes openai OpenAI router changes tests Test changes


2 participants