refactor: make model field required, remove resolve_model_id #713
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Removed implicit/default model provisioning from protocols and made model identifiers required throughout the routing stack: protocol request serde defaults removed or adjusted, RequestContext/RequestInput model_id made concrete, many RouterTrait signatures changed from Option<&str> to &str, and worker selection, metrics, and tokenizer lookups updated accordingly.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Server
    participant RouterMgr as RouterManager
    participant Router as Router (gRPC/HTTP/OpenAI)
    participant Worker as Worker/Model
    Client->>API: POST /v1/... (body with "model")
    API->>RouterMgr: route_*(headers, body, model_id = &body.model)
    RouterMgr->>Router: select_router(model_id)
    Router->>Worker: forward request(model_id, payload)
    Worker-->>Router: response
    Router-->>RouterMgr: routed response
    RouterMgr-->>API: final response
    API-->>Client: HTTP response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Force-pushed from 0e1f6d9 to 2c6a5b2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0e1f6d9d98
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/protocols/src/responses.rs (1)
787-798: ⚠️ Potential issue | 🟡 Minor

ResponsesRequest::default() creates an empty model that bypasses routing safeguards.

The Default implementation sets model: String::new(), and GenerationRequest::get_model() returns Some("") for it. Validation does not reject empty models. While current production code explicitly constructs or carries model from upstream requests, the Default impl creates a latent path that could silently route with a blank model if future code uses it. Consider removing Default or adding model validation to prevent accidental misuse.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/responses.rs` around lines 787-798: The ResponsesRequest::default() creates an empty model string which can bypass routing; remove the Default impl for ResponsesRequest so callers must explicitly set model, or if Default must exist, change behavior so an empty model is treated as unset: update GenerationRequest::get_model() to return None when model is an empty string (i.e., treat "" as no model) and/or change the default to a clearly invalid sentinel (or make model an Option<String>) so validation rejects an absent/empty model rather than silently routing with "".

crates/protocols/src/rerank.rs (1)
202-208: ⚠️ Potential issue | 🟠 Major

Empty model field from V1RerankReqInput conversion causes 404 failures in multi-router mode instead of failing fast.

When the v1_rerank handler converts V1RerankReqInput to RerankRequest, it sets model: String::new(). In multi-router setups, select_router_for_request passes this empty string to get_router_for_model(""), which returns no workers. The handler then fails with 404 instead of rejecting the request at validation time.

Fix: Add a model field to V1RerankReqInput (with appropriate default or validation) or validate RerankRequest.model for non-empty content before routing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/rerank.rs` around lines 202-208: The conversion impl From<V1RerankReqInput> for RerankRequest is setting model to an empty string (model: String::new()), which leads select_router_for_request -> get_router_for_model("") to return no workers and a 404; update the conversion to propagate a model value from V1RerankReqInput (add a model field to V1RerankReqInput if missing) or, alternatively, add validation in v1_rerank before routing to ensure RerankRequest.model is non-empty and return a validation error; touch the From<V1RerankReqInput> for RerankRequest, the V1RerankReqInput struct, and the v1_rerank handler/select_router_for_request validation paths when applying the fix.
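The fallible-conversion option suggested in the rerank comment above could be sketched roughly as below. The structs are stripped down to the fields needed for the sketch, and the `model: Option<String>` shape on the V1 input is an assumption, not the crate's actual definition:

```rust
// Hypothetical sketch: reject a missing/empty model at conversion time instead of
// letting an empty string flow into routing. Struct fields are simplified.
#[derive(Debug)]
struct V1RerankReqInput {
    model: Option<String>,
    query: String,
}

#[derive(Debug, PartialEq)]
struct RerankRequest {
    model: String,
    query: String,
}

impl TryFrom<V1RerankReqInput> for RerankRequest {
    type Error = String;

    fn try_from(v1: V1RerankReqInput) -> Result<Self, Self::Error> {
        // Treat both a missing and an empty model as invalid, failing fast
        // rather than producing a request that routes to zero workers.
        match v1.model.filter(|m| !m.is_empty()) {
            Some(model) => Ok(RerankRequest { model, query: v1.query }),
            None => Err("model is required for /v1/rerank".to_string()),
        }
    }
}

fn main() {
    let ok = RerankRequest::try_from(V1RerankReqInput {
        model: Some("mock-model".into()),
        query: "q".into(),
    });
    assert!(ok.is_ok());

    let missing = RerankRequest::try_from(V1RerankReqInput { model: None, query: "q".into() });
    assert!(missing.is_err());
}
```

With TryFrom instead of From, the V1 path cannot dodge the new "model is required" invariant: a caller is forced to handle the error at the conversion boundary rather than 404-ing much later in routing.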
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/protocols/src/tokenize.rs`:
- Around line 232-241: Add a negative deserialization test for DetokenizeRequest
similar to test_tokenize_request_requires_model: create a JSON string that omits
the "model" field (e.g. just "tokens": ...) and attempt
serde_json::from_str::<DetokenizeRequest>(); assert that it returns an Err (i.e.
deserialization fails). Place it as a new #[test] fn (e.g.
test_detokenize_request_requires_model) next to
test_detokenize_request_single/batch and reference DetokenizeRequest and
TokensInput to locate the struct to verify the behavior.
In `@model_gateway/src/routers/router_manager.rs`:
- Around line 406-409: The current call to select_router_for_request(headers,
model_id) before invoking router.route_generate(...) allows model-less /generate
requests to be routed arbitrarily; change the logic so /generate preserves
explicit model resolution: if model_id is Some, keep current behavior (call
select_router_for_request and route_generate); if model_id is None then run the
previous “single-model default or ambiguity failure” resolution (i.e., determine
the unique routable model or return an error if multiple models are routable)
and only then call route_generate with that resolved model_id; ensure you update
the branching around select_router_for_request and route_generate to reject
missing model_id when multiple models are routable.
---
Outside diff comments:
In `@crates/protocols/src/rerank.rs`:
- Around line 202-208: The conversion impl From<V1RerankReqInput> for
RerankRequest is setting model to an empty string (model: String::new()), which
leads select_router_for_request -> get_router_for_model("") to return no workers
and a 404; update the conversion to propagate a model value from
V1RerankReqInput (add a model field to V1RerankReqInput if missing) or,
alternatively, add validation in v1_rerank before routing to ensure
RerankRequest.model is non-empty and return a validation error; touch the
From<V1RerankReqInput> for RerankRequest, V1RerankReqInput struct, and the
v1_rerank handler/select_router_for_request validation paths when applying the
fix.
In `@crates/protocols/src/responses.rs`:
- Around line 787-798: The ResponsesRequest::default() creates an empty model
string which can bypass routing; remove the Default impl for ResponsesRequest so
callers must explicitly set model, or if Default must exist change behavior so
empty model is treated as unset: update GenerationRequest::get_model() to return
None when model is an empty string (i.e., treat "" as no model) and/or change
the default to a clearly invalid sentinel (or make model an Option<String>) so
validation rejects an absent/empty model rather than silently routing with "".
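One of the options in the responses.rs comment above (treating an empty model as unset) could look like the following; the structs are stand-ins for the real protocol types, reduced to what the sketch needs:

```rust
// Hypothetical sketch: GenerationRequest::get_model() returning None for an
// empty model string, so an accidental Default::default() cannot route with "".
#[derive(Default)]
struct ResponsesRequest {
    model: String, // Default yields String::new()
}

trait GenerationRequest {
    fn get_model(&self) -> Option<&str>;
}

impl GenerationRequest for ResponsesRequest {
    fn get_model(&self) -> Option<&str> {
        // "" is treated as "no model", so downstream validation rejects it
        // instead of silently routing with a blank identifier.
        if self.model.is_empty() {
            None
        } else {
            Some(&self.model)
        }
    }
}

fn main() {
    let default_req = ResponsesRequest::default();
    assert_eq!(default_req.get_model(), None);

    let req = ResponsesRequest { model: "mock-model".to_string() };
    assert_eq!(req.get_model(), Some("mock-model"));
}
```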
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8f2a2cd4-55a2-4b7a-9835-a06421dc8980
📒 Files selected for processing (7)
- crates/protocols/src/chat.rs
- crates/protocols/src/common.rs
- crates/protocols/src/interactions.rs
- crates/protocols/src/rerank.rs
- crates/protocols/src/responses.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
- crates/protocols/src/common.rs
Signed-off-by: Chang Su <chang.s.su@oracle.com>
…hods

Now that model is a required field in protocol types, the resolve_model_id method is no longer needed. Simplify route_generate, route_chat, route_completion, and route_messages by removing the enable_igw branching and passing model directly from request bodies.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
Force-pushed from 2c6a5b2 to fd577d8
♻️ Duplicate comments (2)
crates/protocols/src/tokenize.rs (1)
231-245: 🧹 Nitpick | 🔵 Trivial

Add a matching negative test for DetokenizeRequest.

The PR updated the detokenize tests to include the model field, but there's no corresponding negative test verifying that DetokenizeRequest also fails deserialization when model is omitted. This would ensure parity with the test_tokenize_request_requires_model test.

Proposed test:

```diff
+    #[test]
+    fn test_detokenize_request_requires_model() {
+        let json = r#"{"tokens": [1, 2, 3]}"#;
+        let result = serde_json::from_str::<DetokenizeRequest>(json);
+        assert!(result.is_err(), "Should fail without model field");
+    }
+
     #[test]
     fn test_detokenize_request_single() {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/tokenize.rs` around lines 231-245: Add a negative test that verifies DetokenizeRequest deserialization fails when the required model field is omitted: create a new test (e.g., test_detokenize_request_requires_model) that tries to deserialize a JSON string like r#"{"tokens": [1,2,3]}"# (and another case for batch tokens if desired) into DetokenizeRequest using serde_json::from_str and assert that it returns an Err; reference the DetokenizeRequest type and TokensInput enum to mirror the existing positive tests and ensure parity with test_tokenize_request_requires_model.

model_gateway/src/routers/router_manager.rs (1)
509-526: ⚠️ Potential issue | 🟠 Major

Preserve explicit model resolution for /generate.

/generate is the one model-optional path per the PR objectives, but with resolve_model_id removed, select_router_for_request(headers, model_id) with model_id = None will select a router based solely on headers and worker distribution scoring. In IGW mode, this means a request without model_id can now be sent to an arbitrary router instead of following the previous "single model default or ambiguity failure" behavior.

Consider either:

- Preserving the old inference logic for this endpoint (if only one model is routable, use it; otherwise fail)
- Rejecting a missing model_id when multiple models are routable in IGW mode

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/router_manager.rs` around lines 509-526: route_generate currently forwards requests with model_id=None to select_router_for_request, which allows header-only selection and breaks the previous explicit-model resolution behavior; change it so that when model_id.is_none() we first attempt the old resolution logic (call self.resolve_model_id(headers) or equivalent) and only then call select_router_for_request with the resolved model id; if resolution yields multiple routable models in IGW mode return a NOT_FOUND or explicit error (reject missing model_id), and if resolution yields exactly one, use that model id to pick the router and proceed with router.route_generate(headers, body, Some(resolved_model_id)).
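The "single-model default or ambiguity failure" behavior this comment asks to preserve amounts to a small resolution helper along these lines; the function name, signature, and error shape are illustrative, not the gateway's actual API:

```rust
// Hypothetical sketch: resolve an omitted model_id for /generate by falling back
// to the unique routable model, and failing when the choice would be ambiguous.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NoModels,
    Ambiguous(usize), // how many models were routable
}

fn resolve_model_id<'a>(
    model_id: Option<&'a str>,
    routable_models: &'a [String],
) -> Result<&'a str, ResolveError> {
    if let Some(id) = model_id {
        return Ok(id); // explicit model: keep current behavior
    }
    match routable_models {
        [] => Err(ResolveError::NoModels),
        [only] => Ok(only.as_str()), // single-model default
        many => Err(ResolveError::Ambiguous(many.len())), // reject, don't guess
    }
}

fn main() {
    let one = vec!["llama".to_string()];
    let two = vec!["llama".to_string(), "qwen".to_string()];
    assert_eq!(resolve_model_id(None, &one), Ok("llama"));
    assert_eq!(resolve_model_id(Some("qwen"), &two), Ok("qwen"));
    assert_eq!(resolve_model_id(None, &two), Err(ResolveError::Ambiguous(2)));
}
```

Only after this resolution succeeds would the router call select_router_for_request with a concrete model id, so a model-less /generate request can never be routed arbitrarily by header scoring alone.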
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/protocols/src/tokenize.rs`:
- Around line 231-245: Add a negative test that verifies DetokenizeRequest
deserialization fails when the required model field is omitted: create a new
test (e.g., test_detokenize_request_requires_model) that tries to deserialize a
JSON string like r#"{"tokens": [1,2,3]}"# (and another case for batch tokens if
desired) into DetokenizeRequest using serde_json::from_str and assert that it
returns an Err; reference the DetokenizeRequest type and TokensInput enum to
mirror the existing positive tests and ensure parity with
test_tokenize_request_requires_model.
In `@model_gateway/src/routers/router_manager.rs`:
- Around line 509-526: route_generate currently forwards requests with
model_id=None to select_router_for_request, which allows header-only selection
and breaks the previous explicit-model resolution behavior; change it so that
when model_id.is_none() we first attempt the old resolution logic (call
self.resolve_model_id(headers) or equivalent) and only then call
select_router_for_request with the resolved model id; if resolution yields
multiple routable models in IGW mode return a NOT_FOUND or explicit error
(reject missing model_id), and if resolution yields exactly one use that model
id to pick the router and proceed with router.route_generate(headers, body,
Some(resolved_model_id)).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd0c9ef9-ff84-4eab-8b40-ac3a4663ca3e
📒 Files selected for processing (7)
- crates/protocols/src/chat.rs
- crates/protocols/src/common.rs
- crates/protocols/src/interactions.rs
- crates/protocols/src/rerank.rs
- crates/protocols/src/responses.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
- crates/protocols/src/common.rs
… wrappers

Cascade the required model field through the entire routing layer:

- RouterTrait: all methods now take model_id: &str (no more Option)
- GenerateRequest.model: Option<String> → String with serde default "unknown"
- HTTP/gRPC routers: remove enable_igw model-filtering guard, always route by model
- select_worker_for_model: treats "unknown" as wildcard (any worker)
- Pipeline contexts: model_id: Option<String> → String
- Worker selection: always filters by model, no UNKNOWN_MODEL_ID fallbacks
- V1 rerank: resolves model from available workers when absent

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
Update mock worker and test assertions for the required model field:

- Mock worker: add served_model_name to server_info, add /server_info and /model_info route aliases, remove should_fail from server_info (metadata endpoints must always succeed for worker registration)
- Test payloads: use "mock-model" consistently (matching the mock worker)
- Router: distinguish 404 (model not found) from 503 (workers exist but unavailable) when select_worker returns None
- GenerateRequest.model defaults to "unknown" via serde; router treats "unknown" as wildcard (any worker)

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96965b4756
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
model_gateway/tests/api/request_formats_test.rs (1)
70-99: 🧹 Nitpick | 🔵 Trivial

Add a missing-model assertion for the new 400 contract.

Lines 70-99 and 118-151 only verify the happy path with model present. Since this PR changes /v1/chat/completions and /v1/completions from implicit fallback to model-required requests, this file should also pin the failure path; otherwise the old fallback can regress without tripping these smoke tests.

Also applies to: 118-151
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/tests/api/request_formats_test.rs` around lines 70-99: Add a failing-path assertion that ensures requests without "model" return the new 400 contract: create a payload that omits "model" (e.g., same messages/params but no model field), call ctx.make_request("/v1/chat/completions", payload).await, assert the result indicates a failure (or response.status == 400) and that the error body matches the expected shape (e.g., contains an "error" field with message/type). Repeat the same missing-model assertion for the "/v1/completions" test block so both endpoints validate the new model-required behavior.

model_gateway/tests/spec/rerank.rs (1)
372-383: ⚠️ Potential issue | 🟠 Major

Don't normalize a model-less V1 rerank request into model == "".

Lines 35-55 now establish that RerankRequest is invalid without a model, but this test still blesses a V1RerankReqInput -> RerankRequest conversion that manufactures an empty model. That recreates the omitted-model fallback this PR is removing and lets the V1 path dodge the new invariant until much later in the flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/tests/spec/rerank.rs` around lines 372-383: The test test_v1_to_rerank_request_conversion is incorrectly asserting that converting a V1RerankReqInput yields a RerankRequest with model == ""; instead update the test to assert the conversion is rejected because RerankRequest now requires a model. Replace the direct Into conversion with a fallible conversion (use TryFrom/TryInto or RerankRequest::try_from(V1RerankReqInput)) and assert that the result is an Err (or that conversion panics if the codebase uses panics for invalid input), and remove any assertion expecting an empty model; keep checks for query/documents if relevant to the Err case.

model_gateway/src/server.rs (1)
200-208: ⚠️ Potential issue | 🟠 Major

Json<CompletionRequest> will return 422 (not 400) for a missing model field.

Axum's Json<T> extractor returns 422 Unprocessable Entity when required fields fail deserialization. If /v1/completions is meant to enforce the 400 status for missing models (as in other handlers), this needs a custom rejection mapping or a validated extractor that explicitly handles the missing model case. The same issue applies to the Json<T> tokenize/detokenize handlers elsewhere in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/server.rs` around lines 200 - 208, The v1_completions handler (async fn v1_completions) currently uses axum's Json<CompletionRequest> extractor which yields 422 on deserialization failures (e.g., missing model) but you want a 400 for missing model; change the extractor to a validated/custom extractor or map Json rejections so that when CompletionRequest deserializes but the model field is absent you return a 400 before calling state.router.route_completion(Some(&headers), &body, &body.model), and apply the same change to the tokenize/detokenize handlers that also use Json<T> so they likewise return 400 for missing model instead of 422.
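The status remapping this comment asks for can be shown framework-free: classify the body before (or instead of) the normal Json<T> path, so a missing model maps to 400 while other deserialization failures keep axum's default 422. The string-based parsing below is a deliberately naive stand-in for serde/axum's real rejection types:

```rust
// Hypothetical sketch: decide the HTTP status for a /v1/completions body so a
// missing "model" field yields 400 rather than axum's default 422.
#[derive(Debug, PartialEq)]
enum BodyStatus {
    Ok,           // deserializes with a model present
    MissingModel, // -> 400 Bad Request
    Malformed,    // -> 422 Unprocessable Entity (axum's Json<T> default)
}

fn classify_body(body: &str) -> BodyStatus {
    // Naive JSON shape check standing in for serde deserialization.
    let looks_like_json = body.trim_start().starts_with('{') && body.trim_end().ends_with('}');
    if !looks_like_json {
        return BodyStatus::Malformed;
    }
    if body.contains("\"model\"") {
        BodyStatus::Ok
    } else {
        BodyStatus::MissingModel
    }
}

fn status_code(s: BodyStatus) -> u16 {
    match s {
        BodyStatus::Ok => 200,
        BodyStatus::MissingModel => 400,
        BodyStatus::Malformed => 422,
    }
}

fn main() {
    assert_eq!(status_code(classify_body(r#"{"model": "mock-model", "prompt": "hi"}"#)), 200);
    assert_eq!(status_code(classify_body(r#"{"prompt": "hi"}"#)), 400);
    assert_eq!(status_code(classify_body("not json")), 422);
}
```

In a real handler the same three-way split would come from a custom extractor (or a mapped Json rejection) rather than string inspection; only the 400/422 decision is the point here.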
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 82-118: When select_single_worker() or select_pd_pair() returns
None in WorkerSelectionStage::execute, distinguish "model not advertised by any
worker" from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.
In `@model_gateway/src/routers/grpc/router.rs`:
- Around line 288-292: The early return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 734-761: The code currently falls back to global pools when
get_by_model(model_id) is empty, which can route an explicit unsupported model
to unrelated prefill/decode workers; change the logic in the prefill_workers and
decode_workers blocks so that if self.worker_registry.get_by_model(model_id)
returns empty and model_id != UNKNOWN_MODEL_ID you do not call
get_prefill_workers()/get_decode_workers() but instead return an error (or fail
the request) indicating unsupported model; only allow the global fallback when
model_id == UNKNOWN_MODEL_ID. Update the branches around prefill_workers and
decode_workers (and any surrounding control flow that consumes them) to enforce
this check and avoid mixing pools for partially matched models.
In `@model_gateway/src/routers/http/router.rs`:
- Around line 281-300: The current check calls
self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.
In `@model_gateway/src/server.rs`:
- Around line 227-234: The code currently fills an empty RerankRequest.model by
taking the first entry from state.context.worker_registry.get_models(); instead,
detect when rerank_body.model.is_empty() and return a 400 Bad Request with a
clear error message (e.g. "model is required for /v1/rerank") rather than
choosing an arbitrary model; update the handler that processes RerankRequest
(the branch that references rerank_body and RerankRequest and calls
state.context.worker_registry.get_models()) to produce the HTTP 400 response
using the same error response mechanism the server uses elsewhere.
In `@model_gateway/tests/common/mock_worker.rs`:
- Around line 90-92: The TLS mock's route set and server_info payload diverge
from the plain mock; update model_gateway/tests/common/tls_mock_worker.rs so its
router registers the same endpoints as the plain mock (add routes for
"/server_info", "/model_info", and "/get_model_info" alongside
"/get_server_info") and make its server_info_handler return the same JSON
structure including the served_model_name field (match the payload shape
produced by the plain mock's server_info_handler) so TLS-path tests use the
identical discovery contract.
In `@model_gateway/tests/routing/load_balancing_test.rs`:
- Around line 319-323: The test is too loose using !resp.status().is_success();
tighten it to assert the exact no-workers status—change the assertion in
load_balancing_test.rs to expect HTTP 404 (StatusCode::NOT_FOUND) for the no
healthy-matching-worker path so it aligns with select_worker_for_model's
contract (503 reserved for circuit-breaker); update the failure message to
reflect the expected NOT_FOUND status and keep the check using resp.status() for
clarity.
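The pd_router fallback gate requested above (global pools only for the unknown-model wildcard, error for an explicit but unknown model) reduces to a small decision; the registry is mocked with plain vectors and the names are assumptions:

```rust
// Hypothetical sketch: only fall back to the global prefill/decode pools when the
// request carries the wildcard model; an explicit but unknown model is an error.
const UNKNOWN_MODEL_ID: &str = "unknown";

fn pick_pool<'a>(
    model_id: &str,
    by_model: &'a [String], // stand-in for worker_registry.get_by_model(model_id)
    global: &'a [String],   // stand-in for get_prefill_workers()/get_decode_workers()
) -> Result<&'a [String], String> {
    if !by_model.is_empty() {
        Ok(by_model)
    } else if model_id == UNKNOWN_MODEL_ID {
        Ok(global) // wildcard: any worker may serve the request
    } else {
        Err(format!("no workers advertise model: {model_id}"))
    }
}

fn main() {
    let global = vec!["w1".to_string(), "w2".to_string()];
    let none: Vec<String> = vec![];
    assert_eq!(pick_pool("unknown", &none, &global).unwrap().len(), 2);
    assert!(pick_pool("typo-model", &none, &global).is_err());
}
```

Applying the same gate to both the prefill and decode pools prevents a partially matched model from silently mixing model-specific and unrelated workers.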
---
Outside diff comments:
In `@model_gateway/src/server.rs`:
- Around line 200-208: The v1_completions handler (async fn v1_completions)
currently uses axum's Json<CompletionRequest> extractor which yields 422 on
deserialization failures (e.g., missing model) but you want a 400 for missing
model; change the extractor to a validated/custom extractor or map Json
rejections so that when CompletionRequest deserializes but the model field is
absent you return a 400 before calling
state.router.route_completion(Some(&headers), &body, &body.model), and apply the
same change to the tokenize/detokenize handlers that also use Json<T> so they
likewise return 400 for missing model instead of 422.
In `@model_gateway/tests/api/request_formats_test.rs`:
- Around line 70-99: Add a failing-path assertion that ensures requests without
"model" return the new 400 contract: create a payload that omits "model" (e.g.,
same messages/params but no model field), call
ctx.make_request("/v1/chat/completions", payload).await, assert the result
indicates a failure (or response.status == 400) and that the error body matches
the expected shape (e.g., contains an "error" field with message/type). Repeat
the same missing-model assertion for the "/v1/completions" test block so both
endpoints validate the new model-required behavior.
In `@model_gateway/tests/spec/rerank.rs`:
- Around line 372-383: The test test_v1_to_rerank_request_conversion is
incorrectly asserting that converting a V1RerankReqInput yields a RerankRequest
with model == ""; instead update the test to assert the conversion is rejected
because RerankRequest now requires a model. Replace the direct Into conversion
with a fallible conversion (use TryFrom/TryInto or
RerankRequest::try_from(V1RerankReqInput)) and assert that the result is an Err
(or that conversion panics if the codebase uses panics for invalid input), and
remove any assertion expecting an empty model; keep checks for query/documents
if relevant to the Err case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aefc98e3-9162-43d8-a13e-ad2bd876d051
📒 Files selected for processing (33)
- crates/protocols/src/common.rs
- crates/protocols/src/generate.rs
- model_gateway/benches/request_processing.rs
- model_gateway/benches/wasm_middleware_latency.rs
- model_gateway/src/routers/grpc/common/stages/dispatch_metadata.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/grpc/context.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/pipeline.rs
- model_gateway/src/routers/grpc/regular/responses/common.rs
- model_gateway/src/routers/grpc/regular/responses/handlers.rs
- model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
- model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
- model_gateway/src/routers/grpc/router.rs
- model_gateway/src/routers/grpc/utils/chat_utils.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/routers/http/router.rs
- model_gateway/src/routers/mod.rs
- model_gateway/src/routers/openai/chat.rs
- model_gateway/src/routers/openai/responses/route.rs
- model_gateway/src/routers/openai/router.rs
- model_gateway/src/routers/router_manager.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/api/request_formats_test.rs
- model_gateway/tests/api/responses_api_test.rs
- model_gateway/tests/api/streaming_tests.rs
- model_gateway/tests/common/mock_worker.rs
- model_gateway/tests/otel_tracing_test.rs
- model_gateway/tests/routing/header_forwarding_test.rs
- model_gateway/tests/routing/load_balancing_test.rs
- model_gateway/tests/routing/test_openai_routing.rs
- model_gateway/tests/spec/rerank.rs
```diff
+        let model_id = ctx.input.model_id.as_str();
         let workers = match self.mode {
             WorkerSelectionMode::Regular => {
-                match self.select_single_worker(
-                    ctx.input.model_id.as_deref(),
-                    text,
-                    tokens,
-                    headers,
-                ) {
+                match self.select_single_worker(model_id, text, tokens, headers) {
                     Some(w) => WorkerSelection::Single { worker: w },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "Regular",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available workers for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_workers",
-                            format!("No available workers for model: {model}"),
+                            format!("No available workers for model: {model_id}"),
                         ));
                     }
                 }
             }
             WorkerSelectionMode::PrefillDecode => {
-                match self.select_pd_pair(ctx.input.model_id.as_deref(), text, tokens, headers) {
+                match self.select_pd_pair(model_id, text, tokens, headers) {
                     Some((prefill, decode, runtime_type)) => WorkerSelection::Dual {
                         prefill,
                         decode,
                         runtime_type,
                     },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "PrefillDecode",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available PD worker pairs for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_pd_worker_pairs",
-                            format!("No available PD worker pairs for model: {model_id}"),
+                            format!("No available PD worker pairs for model: {model_id}"),
                         ));
```
Split unsupported-model and unavailable-worker failures.
select_single_worker() / select_pd_pair() both collapse "no worker advertises this model" and "matching workers exist but none are currently usable" into None, and execute() always maps that to 503. Now that model_id is required, the first case should surface as 404, otherwise unsupported or typoed models look like transient outages. Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers, while absent models should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around
lines 82 - 118, When select_single_worker() or select_pd_pair() returns None in
WorkerSelectionStage::execute, distinguish "model not advertised by any worker"
from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.
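The 404-vs-503 split requested above boils down to one extra check after selection fails: ask whether any worker advertises the model at all. The function and status names below are illustrative, not the router's actual error helpers:

```rust
// Hypothetical sketch: when worker selection returns None, a typoed or
// unregistered model should surface as 404 rather than looking like a
// transient 503 outage of existing workers.
fn failure_status(any_worker_advertises_model: bool) -> (u16, &'static str) {
    if !any_worker_advertises_model {
        (404, "unsupported_model") // model never registered: client error
    } else {
        (503, "no_available_workers") // model exists, but nothing usable right now
    }
}

// Wrapper showing where the check slots into the selection flow.
fn select_or_error<'a>(
    selected: Option<&'a str>,
    any_worker_advertises_model: bool,
) -> Result<&'a str, (u16, &'static str)> {
    match selected {
        Some(worker) => Ok(worker),
        None => Err(failure_status(any_worker_advertises_model)),
    }
}

fn main() {
    assert_eq!(select_or_error(Some("w1"), true), Ok("w1"));
    assert_eq!(select_or_error(None, false), Err((404, "unsupported_model")));
    assert_eq!(select_or_error(None, true), Err((503, "no_available_workers")));
}
```

The same mapping applies to both the Regular and PrefillDecode arms, so circuit-broken healthy workers keep producing 503 while absent models fall through to 404.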
```diff
         // 0. Fast worker validation (fail-fast before expensive operations)
-        let requested_model: Option<&str> = model_id.or(Some(body.model.as_str()));
-
-        if let Some(error_response) = requested_model
-            .and_then(|model| validate_worker_availability(&self.worker_registry, model))
+        if let Some(error_response) = validate_worker_availability(&self.worker_registry, model_id)
         {
             return error_response;
         }
```
This early miss path still reports an unsupported model as 503.
validate_worker_availability() returns service_unavailable() when the model is absent, so /v1/responses on the gRPC router still treats a typoed or unsupported model as a transient outage. After making model mandatory, this short-circuit needs to return model_not_found/404 instead, either by remapping the helper result here or by teaching the helper to emit the right status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/router.rs` around lines 288 - 292, The early
return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.
| // Distinguish "no workers for this model" from "workers exist but unavailable" | ||
| let model_filter = if model_id == crate::core::UNKNOWN_MODEL_ID { | ||
| None | ||
| } else { | ||
| Some(model_id) | ||
| }; | ||
| let total = self.worker_registry.get_workers_filtered( | ||
| model_filter, | ||
| Some(WorkerType::Regular), | ||
| Some(ConnectionMode::Http), | ||
| None, | ||
| false, | ||
| ); | ||
| return if total.is_empty() { | ||
| error::model_not_found(model_id) | ||
| } else { | ||
| error::service_unavailable( | ||
| "no_available_workers", | ||
| "All workers are unavailable (circuit breaker open or unhealthy)", | ||
| ) |
Use a healthy-only existence check before returning 503.
get_workers_filtered(..., false) counts workers that are failing health checks, so a model with only stale/unhealthy registrations now goes down the 503 branch. That collapses “all healthy workers are circuit-broken” and “only unhealthy matches exist” into the same outcome and changes the 404/503 split.
Suggested change

```diff
- let total = self.worker_registry.get_workers_filtered(
+ let healthy = self.worker_registry.get_workers_filtered(
      model_filter,
      Some(WorkerType::Regular),
      Some(ConnectionMode::Http),
      None,
-     false,
+     true,
  );
- return if total.is_empty() {
+ return if healthy.is_empty() {
      error::model_not_found(model_id)
  } else {
```

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/http/router.rs` around lines 281 - 300, The current
check calls self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.
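The wildcard-plus-health filtering being debated above can be sketched against a toy registry. Entries, the `workers_for` helper, and the URLs are hypothetical; only the filtering idea mirrors the `healthy_only` flag under discussion:

```rust
const UNKNOWN_MODEL_ID: &str = "unknown";

// Toy registry entries: (served model, worker URL, passing health checks).
// The "unknown" sentinel acts as a wildcard that matches every model.
fn workers_for(
    model_id: &str,
    registry: &[(&str, &str, bool)],
    healthy_only: bool,
) -> Vec<String> {
    registry
        .iter()
        .filter(|(model, _url, healthy)| {
            (model_id == UNKNOWN_MODEL_ID || *model == model_id)
                && (!healthy_only || *healthy)
        })
        .map(|(_model, url, _healthy)| url.to_string())
        .collect()
}

fn main() {
    let registry = [
        ("llama-3", "http://w1:8000", true),
        ("llama-3", "http://w2:8000", false),
        ("qwen-2", "http://w3:8000", true),
    ];
    // The wildcard sees every healthy worker; an explicit model sees only its own.
    assert_eq!(workers_for(UNKNOWN_MODEL_ID, &registry, true).len(), 2);
    assert_eq!(workers_for("llama-3", &registry, true), vec!["http://w1:8000"]);
    // healthy_only = false also counts the failing registration, which is
    // exactly how a model with only stale workers can dodge the 404 branch.
    assert_eq!(workers_for("llama-3", &registry, false).len(), 2);
}
```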
| .route("/server_info", get(server_info_handler)) | ||
| .route("/get_server_info", get(server_info_handler)) | ||
| .route("/model_info", get(model_info_handler)) |
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
for f in model_gateway/tests/common/mock_worker.rs model_gateway/tests/common/tls_mock_worker.rs; do
  echo "=== $f ==="
  rg -n -C2 'route\("/server_info"|route\("/model_info"|route\("/get_server_info"|route\("/get_model_info"|served_model_name|model_path|should_fail\(&config\)' "$f"
done
```

Repository: lightseekorg/smg
Length of output: 3927
Keep the TLS mock's discovery contract in sync.
The plain mock exposes /server_info, /get_server_info, /model_info, and /get_model_info endpoints (lines 90–93), with server_info_handler returning a payload that includes served_model_name (line 237). The TLS mock only exposes /get_server_info and its server_info_handler returns a different payload structure omitting served_model_name. This route and payload divergence means TLS-path tests exercise a stale worker-discovery contract and can hide regressions in model matching or registration logic.
Align model_gateway/tests/common/tls_mock_worker.rs to expose the same metadata endpoints (/server_info, /model_info, /get_model_info) and return matching server_info payloads with served_model_name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/tests/common/mock_worker.rs` around lines 90 - 92, The TLS
mock's route set and server_info payload diverge from the plain mock; update
model_gateway/tests/common/tls_mock_worker.rs so its router registers the same
endpoints as the plain mock (add routes for "/server_info", "/model_info", and
"/get_model_info" alongside "/get_server_info") and make its server_info_handler
return the same JSON structure including the served_model_name field (match the
payload shape produced by the plain mock's server_info_handler) so TLS-path
tests use the identical discovery contract.
```diff
  // Should return error when no workers available
  assert!(
-     resp.status() == StatusCode::SERVICE_UNAVAILABLE
-         || resp.status() == StatusCode::INTERNAL_SERVER_ERROR,
-     "Expected 503 or 500 when no workers available, got {}",
+     !resp.status().is_success(),
+     "Expected error status when no workers available, got {}",
      resp.status()
```
Don't collapse the no-workers assertion to !is_success().
/generate is still the exception to the required-model change, so this would also pass on unrelated 400/500 regressions and stop checking the intended worker-selection contract for the no-workers path.
🎯 Tighten the assertion back to the expected status

```diff
- assert!(
-     !resp.status().is_success(),
-     "Expected error status when no workers available, got {}",
-     resp.status()
- );
+ assert_eq!(
+     resp.status(),
+     StatusCode::NOT_FOUND,
+     "Expected 404 when no workers are registered"
+ );
```

Based on learnings, select_worker_for_model intentionally reserves 503 for the circuit-breaker case and lets no healthy matching workers fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/tests/routing/load_balancing_test.rs` around lines 319 - 323,
The test is too loose using !resp.status().is_success(); tighten it to assert
the exact no-workers status—change the assertion in load_balancing_test.rs to
expect HTTP 404 (StatusCode::NOT_FOUND) for the no healthy-matching-worker path
so it aligns with select_worker_for_model's contract (503 reserved for
circuit-breaker); update the failure message to reflect the expected NOT_FOUND
status and keep the check using resp.status() for clarity.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d13e64998
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "no_available_workers", | ||
| format!("No available workers for model: {model}"), | ||
| )); | ||
| return Err(error::model_not_found(model_id)); |
Return 503 for unavailable gRPC workers
WorkerSelectionStage::execute now unconditionally converts a failed selection into error::model_not_found (404), but selection can fail not only when the model is absent but also when matching workers exist yet are temporarily unavailable after is_available() filtering. In those transient cases this incorrectly reports a client/model error and prevents retry handling (which is keyed to 5xx), so recoverable outages fail fast instead of retrying.
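The retry interaction described here can be shown with a minimal predicate. This is a hypothetical sketch of a 5xx-keyed retry policy, not SMG's actual retry code:

```rust
// Typical gateway retry policy: only server-side (5xx) statuses are retried.
fn is_retriable(status: u16) -> bool {
    (500..=599).contains(&status)
}

fn main() {
    // A transient outage reported as 503 is retried...
    assert!(is_retriable(503));
    // ...but the same outage misreported as 404 fails fast with no retry.
    assert!(!is_retriable(404));
}
```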
Actionable comments posted: 1
♻️ Duplicate comments (1)
model_gateway/src/routers/grpc/common/stages/worker_selection.rs (1)
85-95: ⚠️ Potential issue | 🟠 Major — Don't collapse wildcard and availability failures into model_not_found.

These branches now always return model_not_found, but select_single_worker() / select_pd_pair() also return None when /generate is using UNKNOWN_MODEL_ID as a wildcard or when matching healthy workers are temporarily unusable. That turns transient selection failures into false 404s and leaks "unknown" back to clients. Split true model misses from wildcard/unavailable-worker failures before choosing the response code.

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for healthy workers whose circuit breaker is open, while unhealthy workers should still fall through to 404.

Also applies to: 99-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around lines 85 - 95, The current code conflates "model truly not present" with "selection failed due to wildcard/temporary unavailability" by returning model_not_found whenever select_single_worker/select_pd_pair return None; instead, call any_external_worker_supports_model(model_id, healthy_only = true) (and also with healthy_only = false if needed to detect wildcard UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns false then return error::model_not_found(model_id); if it returns true but select_single_worker or select_pd_pair still returned None then return a 503 service/unavailable error (e.g., error::service_unavailable or similar) to represent transient/unusable-worker failures; apply the same change to both the single-worker branch (where select_single_worker is used) and the PD-pair branch (where select_pd_pair is used), referencing WorkerSelectionStage::execute, select_single_worker, select_pd_pair, and any_external_worker_supports_model to locate the logic to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 726-765: In select_pd_pair: detect when
self.worker_registry.get_by_model(model_id) returns empty for an explicit
(non-wildcard) model_id (i.e. when is_unknown_model is false) and short-circuit
by returning error::model_not_found(model_id) (or the appropriate typed
non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.
---
Duplicate comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 85-95: The current code conflates "model truly not present" with
"selection failed due to wildcard/temporary unavailability" by returning
model_not_found whenever select_single_worker/select_pd_pair return None;
instead, call any_external_worker_supports_model(model_id, healthy_only = true)
(and also with healthy_only = false if needed to detect wildcard
UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns
false then return error::model_not_found(model_id); if it returns true but
select_single_worker or select_pd_pair still returned None then return a 503
service/unavailable error (e.g., error::service_unavailable or similar) to
represent transient/unusable-worker failures; apply the same change to both the
single-worker branch (where select_single_worker is used) and the PD-pair branch
(where select_pd_pair is used), referencing WorkerSelectionStage::execute,
select_single_worker, select_pd_pair, and any_external_worker_supports_model to
locate the logic to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e733463-3e57-412e-9a81-3e1a3cd1f13b
📒 Files selected for processing (7)
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/grpc/common/responses/utils.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/common/tls_mock_worker.rs
```diff
  async fn select_pd_pair(
      &self,
      request_text: Option<&str>,
-     model_id: Option<&str>,
+     model_id: &str,
      headers: Option<&HeaderMap>,
  ) -> Result<(Arc<dyn Worker>, Arc<dyn Worker>), String> {
-     let effective_model_id = if self.enable_igw { model_id } else { None };
-
-     debug!(
-         "Selecting PD pair: enable_igw={}, model_id={:?}, effective_model_id={:?}",
-         self.enable_igw, model_id, effective_model_id
-     );
+     debug!("Selecting PD pair: model_id={:?}", model_id);
+
+     let is_unknown_model = model_id == UNKNOWN_MODEL_ID;

-     let prefill_workers = if let Some(model) = effective_model_id {
-         self.worker_registry
-             .get_by_model(model)
+     let prefill_workers = {
+         let by_model: Vec<_> = self
+             .worker_registry
+             .get_by_model(model_id)
              .iter()
              .filter(|w| matches!(w.worker_type(), WorkerType::Prefill))
              .cloned()
-             .collect()
-     } else {
-         self.worker_registry.get_prefill_workers()
+             .collect();
+         if by_model.is_empty() && is_unknown_model {
+             // Only fall back to all workers when model is "unknown" (wildcard)
+             self.worker_registry.get_prefill_workers()
+         } else {
+             by_model
+         }
      };

-     let decode_workers = if let Some(model) = effective_model_id {
-         self.worker_registry
-             .get_by_model(model)
+     let decode_workers = {
+         let by_model: Vec<_> = self
+             .worker_registry
+             .get_by_model(model_id)
              .iter()
              .filter(|w| matches!(w.worker_type(), WorkerType::Decode))
              .cloned()
-             .collect()
-     } else {
-         self.worker_registry.get_decode_workers()
+             .collect();
+         if by_model.is_empty() && is_unknown_model {
+             // Only fall back to all workers when model is "unknown" (wildcard)
+             self.worker_registry.get_decode_workers()
+         } else {
+             by_model
+         }
```
Surface explicit PD model misses as model_not_found.
The new explicit-model filtering is good, but select_pd_pair() still reports failures as Err(String). When get_by_model(model_id) is empty for an explicit model, that bubbles out through handle_server_selection_error() as 503 server_selection_failed, so unsupported models are retried like transient outages. Short-circuit explicit misses as error::model_not_found(model_id) (or another typed non-retriable error) before falling into pick_worker_by_policy_arc().
Based on learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/http/pd_router.rs` around lines 726 - 765, In
select_pd_pair: detect when self.worker_registry.get_by_model(model_id) returns
empty for an explicit (non-wildcard) model_id (i.e. when is_unknown_model is
false) and short-circuit by returning error::model_not_found(model_id) (or the
appropriate typed non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.
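The prefill/decode selection pattern under discussion, an explicit model filter with a wildcard-only fallback, can be sketched with toy types. The `W` struct, `Role` enum, and `select` helper are hypothetical, not the real `WorkerRegistry` API:

```rust
const UNKNOWN_MODEL_ID: &str = "unknown";

#[derive(Clone, Copy, PartialEq)]
enum Role { Prefill, Decode }

#[derive(Clone, Copy)]
struct W { model: &'static str, role: Role, id: u32 }

// Pick workers of `role` for `model_id`; only the "unknown" wildcard may fall
// back to every worker of that role. An explicit miss returns an empty list,
// which the caller should surface as model_not_found rather than retry as 503.
fn select(workers: &[W], model_id: &str, role: Role) -> Vec<u32> {
    let by_model: Vec<u32> = workers
        .iter()
        .filter(|w| w.role == role && w.model == model_id)
        .map(|w| w.id)
        .collect();
    if by_model.is_empty() && model_id == UNKNOWN_MODEL_ID {
        workers.iter().filter(|w| w.role == role).map(|w| w.id).collect()
    } else {
        by_model
    }
}

fn main() {
    let ws = [
        W { model: "llama-3", role: Role::Prefill, id: 1 },
        W { model: "llama-3", role: Role::Decode, id: 2 },
        W { model: "qwen-2", role: Role::Prefill, id: 3 },
    ];
    assert_eq!(select(&ws, "llama-3", Role::Prefill), vec![1]);
    assert_eq!(select(&ws, UNKNOWN_MODEL_ID, Role::Prefill), vec![1, 3]);
    assert!(select(&ws, "typo", Role::Decode).is_empty()); // explicit miss
}
```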
Force-pushed from 4d13e64 to 59b4dc8 (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 59b4dc8d03
```diff
  if !available_models.contains(&model.to_string()) {
-     return Some(error::service_unavailable(
-         "no_available_workers",
-         format!(
-             "No workers available for model '{}'. Available models: {}",
-             model,
-             available_models.join(", ")
-         ),
-     ));
+     return Some(error::model_not_found(model));
```
Return 503 when responses workers are temporarily absent
validate_worker_availability now maps any missing model entry in the live registry to model_not_found (404), but this check runs before dispatch and cannot distinguish a true bad model from transient capacity loss (e.g., rolling restart, delayed registration, or full drain). In those outage windows /v1/responses will return a client error instead of a retriable 5xx, causing callers and gateways to stop retrying despite the model being valid.
```rust
return if total.is_empty() {
    error::model_not_found(model_id)
} else {
```
Avoid model-not-found for zero-worker HTTP outages
When worker selection fails, the new branch returns model_not_found if the filtered worker list is empty. This makes transient zero-worker conditions (startup race, discovery blip, or scale-to-zero window) look like a permanent client/model error, so clients that retry only on 5xx may fail fast instead of recovering once workers re-register.
Actionable comments posted: 2
♻️ Duplicate comments (2)
model_gateway/src/routers/http/router.rs (1)
281-301: ⚠️ Potential issue | 🟠 Major — Preserve the 404/503 split here, and don't surface the "unknown" sentinel.

This fallback still counts unhealthy/stale registrations, so a model that only exists on failed health checks can return 503 instead of the intended 404. Because UNKNOWN_MODEL_ID shares the same path, /generate without a model can also fall through to model_not_found("unknown"), which leaks an internal sentinel to clients.

Suggested fix
```diff
- let total = self.worker_registry.get_workers_filtered(
+ let healthy = self.worker_registry.get_workers_filtered(
      model_filter,
      Some(WorkerType::Regular),
      Some(ConnectionMode::Http),
      None,
-     false,
+     true,
+ );
+ let total = self.worker_registry.get_workers_filtered(
+     model_filter,
+     Some(WorkerType::Regular),
+     Some(ConnectionMode::Http),
+     None,
+     false,
  );
- return if total.is_empty() {
-     error::model_not_found(model_id)
+ return if model_id == crate::core::UNKNOWN_MODEL_ID {
+     if total.is_empty() {
+         error::service_unavailable("no_workers", "No available workers")
+     } else {
+         error::service_unavailable(
+             "no_available_workers",
+             "All workers are unavailable",
+         )
+     }
+ } else if healthy.is_empty() {
+     error::model_not_found(model_id)
  } else {
      error::service_unavailable(
          "no_available_workers",
-         "All workers are unavailable (circuit breaker open or unhealthy)",
+         "All healthy workers are unavailable",
      )
  };
```

Based on learnings,
any_external_worker_supports_modelintentionally useshealthy_only = trueso 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/http/router.rs` around lines 281 - 301, The code currently treats UNKNOWN_MODEL_ID as a filter and counts unhealthy/stale workers (healthy_only = false), which leaks the "unknown" sentinel and can return 503 for models that only have unhealthy registrations; fix by early-returning a 404 when model_id == crate::core::UNKNOWN_MODEL_ID (do not continue to the worker lookup) and when querying workers use self.worker_registry.get_workers_filtered(..., true) i.e. set the healthy_only flag to true so the 404/503 split is preserved; refer to model_id, UNKNOWN_MODEL_ID, get_workers_filtered, and any_external_worker_supports_model to locate the logic to change.model_gateway/src/routers/http/pd_router.rs (1)
736-766: ⚠️ Potential issue | 🟠 Major — Explicit model misses still return 503 instead of 404.

The fallback logic is now correctly restricted to UNKNOWN_MODEL_ID, which addresses the first past review concern. However, when get_by_model(model_id) returns empty for an explicit model, the error path flows through pick_worker_by_policy_arc → handle_server_selection_error → 503 service_unavailable.

Per retrieved learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should return 404. Consider short-circuiting explicit model misses before calling pick_worker_by_policy_arc:

Suggested approach
pick_worker_by_policy_arc:Suggested approach
let prefill_workers = { let by_model: Vec<_> = self .worker_registry .get_by_model(model_id) .iter() .filter(|w| matches!(w.worker_type(), WorkerType::Prefill)) .cloned() .collect(); if by_model.is_empty() && is_unknown_model { // "auto" means pick any — fall back to all prefill workers self.worker_registry.get_prefill_workers() + } else if by_model.is_empty() { + // Explicit model not found — fail fast with 404-style error + return Err(format!("model_not_found:{model_id}")); } else { by_model } };Then in
execute_dual_dispatch, check for themodel_not_found:prefix and returnerror::model_not_found(model_id)instead of callinghandle_server_selection_error.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/http/pd_router.rs` around lines 736 - 766, When an explicit model lookup yields no workers the code currently falls through to pick_worker_by_policy_arc and ultimately returns 503; instead short-circuit explicit misses and return 404. After computing prefill_workers and decode_workers (the blocks using worker_registry.get_by_model in pd_router.rs) or at the start of execute_dual_dispatch, add a check: if model_id != UNKNOWN_MODEL_ID and (prefill_workers.is_empty() || decode_workers.is_empty()) then immediately return error::model_not_found(model_id) rather than calling pick_worker_by_policy_arc / handle_server_selection_error; this ensures explicit model-not-found returns a 404.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Line 288: The assignment let model = model_id; in worker_selection.rs is a
redundant rebinding; remove that line and use model_id directly in subsequent
metric calls (the same pattern used in select_single_worker) to avoid the no-op
variable and keep consistency — update references in the surrounding function
(worker selection logic) to use model_id instead of model.
In `@model_gateway/src/server.rs`:
- Around line 227-239: The /v1/rerank compatibility path converts the incoming
body into RerankRequest and only checks model.is_empty(), which skips the full
validation applied elsewhere (e.g., ValidatedJson<RerankRequest>) and emits a
non-standard error shape; instead, take the converted rerank_body and run it
through the same validation/error path used for ValidatedJson (validate fields
like query, documents, top_k, etc.), returning the shared bad-request JSON
structure on failure; locate the conversion of body.into() into rerank_body and
replace the lone model check by invoking the existing RerankRequest validation
routine or the same validator used earlier so all validation errors and shapes
remain consistent across /v1/rerank.
---
Duplicate comments:
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 736-766: When an explicit model lookup yields no workers the code
currently falls through to pick_worker_by_policy_arc and ultimately returns 503;
instead short-circuit explicit misses and return 404. After computing
prefill_workers and decode_workers (the blocks using
worker_registry.get_by_model in pd_router.rs) or at the start of
execute_dual_dispatch, add a check: if model_id != UNKNOWN_MODEL_ID and
(prefill_workers.is_empty() || decode_workers.is_empty()) then immediately
return error::model_not_found(model_id) rather than calling
pick_worker_by_policy_arc / handle_server_selection_error; this ensures explicit
model-not-found returns a 404.
In `@model_gateway/src/routers/http/router.rs`:
- Around line 281-301: The code currently treats UNKNOWN_MODEL_ID as a filter
and counts unhealthy/stale workers (healthy_only = false), which leaks the
"unknown" sentinel and can return 503 for models that only have unhealthy
registrations; fix by early-returning a 404 when model_id ==
crate::core::UNKNOWN_MODEL_ID (do not continue to the worker lookup) and when
querying workers use self.worker_registry.get_workers_filtered(..., true) i.e.
set the healthy_only flag to true so the 404/503 split is preserved; refer to
model_id, UNKNOWN_MODEL_ID, get_workers_filtered, and
any_external_worker_supports_model to locate the logic to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9847e147-6bb2-4a3c-909a-25291993e68a
📒 Files selected for processing (9)
- crates/protocols/src/common.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/grpc/common/responses/utils.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/routers/http/router.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/common/tls_mock_worker.rs
```diff
  let decode_idx = policy.select_worker(&available_decode, &info)?;

- let model = model_id.unwrap_or(UNKNOWN_MODEL_ID);
+ let model = model_id;
```
🧹 Nitpick | 🔵 Trivial
Minor: Redundant variable rebinding.
let model = model_id; is a no-op rebinding. Consider using model_id directly in the metrics calls below for consistency with select_single_worker (line 181).
♻️ Suggested simplification
- let model = model_id;
let policy_name = policy.name();
// Record worker selection metrics for both prefill and decode
Metrics::record_worker_selection(
metrics_labels::WORKER_PREFILL,
metrics_labels::CONNECTION_GRPC,
- model,
+ model_id,
policy_name,
);
Metrics::record_worker_selection(
metrics_labels::WORKER_DECODE,
metrics_labels::CONNECTION_GRPC,
- model,
+ model_id,
policy_name,
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let model = model_id;
let policy_name = policy.name();
// Record worker selection metrics for both prefill and decode
Metrics::record_worker_selection(
    metrics_labels::WORKER_PREFILL,
    metrics_labels::CONNECTION_GRPC,
    model_id,
    policy_name,
);
Metrics::record_worker_selection(
    metrics_labels::WORKER_DECODE,
    metrics_labels::CONNECTION_GRPC,
    model_id,
    policy_name,
);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` at line
288, The assignment let model = model_id; in worker_selection.rs is a redundant
rebinding; remove that line and use model_id directly in subsequent metric calls
(the same pattern used in select_single_worker) to avoid the no-op variable and
keep consistency — update references in the surrounding function (worker
selection logic) to use model_id instead of model.
- gRPC worker_selection: treat UNKNOWN_MODEL_ID as wildcard (any worker), return 404 model_not_found instead of 503 when no workers serve model
- gRPC router: validate_worker_availability returns 404 for unsupported model
- PD router: don't fall back to all workers for explicit model — only UNKNOWN_MODEL_ID triggers cross-model fallback
- V1 rerank: return 400 "model_required" instead of silently picking first available model
- HTTP router: distinguish 404 (model not found) from 503 (workers exist but all unavailable) using same UNKNOWN_MODEL_ID wildcard pattern
- tokenize: add test_detokenize_request_requires_model
- TLS mock worker: add served_model_name and /server_info route alias

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
Force-pushed from 59b4dc8 to 8de21dc (Compare)
Description
Problem
The model field handling across SMG is broken in several ways:
`default_model()` masks missing models — `ChatCompletionRequest`, `ResponsesRequest`, and `RerankRequest` default model to `"unknown"` via serde when clients omit it. This silently forwards `"unknown"` to backends, which either reject it (vLLM, TRT-LLM) or ignore it (SGLang), producing confusing errors instead of a clean 400.
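The masking problem can be reproduced in miniature without serde: a defaulting lookup turns an absent field into a sentinel, while a required field surfaces a clean error. The names below are illustrative, not the actual protocol types:

```rust
use std::collections::HashMap;

// Old behavior: an absent "model" silently becomes the "unknown" sentinel.
fn model_with_default(body: &HashMap<&str, &str>) -> String {
    body.get("model").unwrap_or(&"unknown").to_string()
}

// New behavior: an absent or empty "model" is a 400-style error at the
// gateway, instead of a confusing backend rejection of the literal "unknown".
fn model_required(body: &HashMap<&str, &str>) -> Result<String, &'static str> {
    match body.get("model") {
        Some(m) if !m.is_empty() => Ok(m.to_string()),
        _ => Err("400: `model` is a required field"),
    }
}

fn main() {
    let empty: HashMap<&str, &str> = HashMap::new();
    assert_eq!(model_with_default(&empty), "unknown"); // masked failure
    assert!(model_required(&empty).is_err()); // explicit 400

    let mut body = HashMap::new();
    body.insert("model", "llama-3");
    assert_eq!(model_required(&body).unwrap(), "llama-3");
}
```

In the real protocol crate the same effect comes from dropping the `#[serde(default = ...)]` attribute, so deserialization itself rejects a body without `model`.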
resolve_model_idwithif self.enable_igwbranching; others (route_responses,route_interactions,route_realtime_*) don't. No principled reason for the difference.Solution
default_model()annotations — model becomes a required JSON field. Missing model → serde deserialization error (400).resolve_model_idfrom RouterManager entirely.if self.enable_igwbranching, pass model through directly.UNKNOWN_MODEL_IDconstant only for metrics/hash ring fallback in genuinely optional cases (GenerateRequest, gRPC pipeline).Changes
default_model(),default_model_name(), and their serde annotations fromChatCompletionRequest,ResponsesRequest,RerankRequest,TokenizeRequest,DetokenizeRequest. FixInteractionsRequestDefault impl to usemodel: None.resolve_model_idmethod (~48 lines). Simplifyroute_generate,route_chat,route_completion,route_messages— removeif self.enable_igwbranching (~114 lines deleted total).Breaking Changes
Clients that omit the
modelfield from/v1/chat/completions,/v1/completions,/v1/responses,/v1/rerank,/v1/tokenize, or/v1/detokenizewill now receive a 400 Bad Request instead of having"unknown"silently injected.In practice, the only affected setup is non-IGW + HTTP + SGLang/vLLM:
resolve_tokenizerfails without a valid model)get_router_for_model), so missing model already didn't work"model": "<model-name>"to their requests.This matches the OpenAI API spec where model is a required field.
Not affected:
/generate—GenerateRequest.modelremainsOption<String>(SGLang native endpoint)/v1/interactions—InteractionsRequest.modelremainsOption<String>(model or agent fallback)/v1/realtime/*— Already had explicit validation requiring non-empty modelTest Plan
cargo test -p openai-protocol— all 36 tests passcargo test -p smg— 408 passed, 4 ignored (1 flaky perf test unrelated to changes)cargo clippy --all-targets --all-features -- -D warnings— cleanChecklist
cargo +nightly fmtpassescargo clippy --all-targets --all-features -- -D warningspassesSummary by CodeRabbit
Breaking Changes
Improvements
Tests
New Features