refactor: make model field required, remove resolve_model_id #713
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Removed implicit/default model provisioning from protocols and made model identifiers required throughout the routing stack: protocol request serde defaults removed or adjusted, RequestContext/RequestInput model_id made concrete, many RouterTrait signatures changed from Option<&str> to &str, and worker selection, metrics, and tokenizer lookups updated accordingly.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as API Server
    participant RouterMgr as RouterManager
    participant Router as Router (gRPC/HTTP/OpenAI)
    participant Worker as Worker/Model
    Client->>API: POST /v1/... (body with "model")
    API->>RouterMgr: route_*(headers, body, model_id = &body.model)
    RouterMgr->>Router: select_router(model_id)
    Router->>Worker: forward request(model_id, payload)
    Worker-->>Router: response
    Router-->>RouterMgr: routed response
    RouterMgr-->>API: final response
    API-->>Client: HTTP response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Force-pushed from 0e1f6d9 to 2c6a5b2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0e1f6d9d98
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/protocols/src/responses.rs (1)
787-798: ⚠️ Potential issue | 🟡 Minor

ResponsesRequest::default() creates an empty model that bypasses routing safeguards.

The Default implementation sets model: String::new(), and GenerationRequest::get_model() returns Some("") for it. Validation does not reject empty models. While current production code explicitly constructs or carries model from upstream requests, the Default impl creates a latent path that could silently route with a blank model if future code uses it. Consider removing Default or adding model validation to prevent accidental misuse.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/responses.rs` around lines 787-798: The ResponsesRequest::default() creates an empty model string which can bypass routing; remove the Default impl for ResponsesRequest so callers must explicitly set model, or if Default must exist, change behavior so an empty model is treated as unset: update GenerationRequest::get_model() to return None when model is an empty string (i.e., treat "" as no model) and/or change the default to a clearly invalid sentinel (or make model an Option<String>) so validation rejects an absent/empty model rather than silently routing with "".

crates/protocols/src/rerank.rs (1)
202-208: ⚠️ Potential issue | 🟠 Major

Empty model field from V1RerankReqInput conversion causes 404 failures in multi-router mode instead of failing fast.

When the v1_rerank handler converts V1RerankReqInput to RerankRequest, it sets model: String::new(). In multi-router setups, select_router_for_request passes this empty string to get_router_for_model(""), which returns no workers. The handler then fails with 404 instead of rejecting the request at validation time.

Fix: Add a model field to V1RerankReqInput (with appropriate default or validation) or validate RerankRequest.model for non-empty content before routing.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/rerank.rs` around lines 202-208: The conversion impl From<V1RerankReqInput> for RerankRequest is setting model to an empty string (model: String::new()), which leads select_router_for_request -> get_router_for_model("") to return no workers and a 404; update the conversion to propagate a model value from V1RerankReqInput (add a model field to V1RerankReqInput if missing) or, alternatively, add validation in v1_rerank before routing to ensure RerankRequest.model is non-empty and return a validation error; touch the From<V1RerankReqInput> for RerankRequest, the V1RerankReqInput struct, and the v1_rerank handler/select_router_for_request validation paths when applying the fix.
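The fallible-conversion option suggested in the rerank comment above could be sketched roughly as below. The structs are stripped down to the fields needed for the sketch, and the `model: Option<String>` shape on the V1 input is an assumption, not the crate's actual definition:

```rust
// Hypothetical sketch: reject a missing/empty model at conversion time instead of
// letting an empty string flow into routing. Struct fields are simplified.
#[derive(Debug)]
struct V1RerankReqInput {
    model: Option<String>,
    query: String,
}

#[derive(Debug, PartialEq)]
struct RerankRequest {
    model: String,
    query: String,
}

impl TryFrom<V1RerankReqInput> for RerankRequest {
    type Error = String;

    fn try_from(v1: V1RerankReqInput) -> Result<Self, Self::Error> {
        // Treat both a missing and an empty model as invalid, failing fast
        // rather than producing a request that routes to zero workers.
        match v1.model.filter(|m| !m.is_empty()) {
            Some(model) => Ok(RerankRequest { model, query: v1.query }),
            None => Err("model is required for /v1/rerank".to_string()),
        }
    }
}

fn main() {
    let ok = RerankRequest::try_from(V1RerankReqInput {
        model: Some("mock-model".into()),
        query: "q".into(),
    });
    assert!(ok.is_ok());

    let missing = RerankRequest::try_from(V1RerankReqInput { model: None, query: "q".into() });
    assert!(missing.is_err());
}
```

With TryFrom instead of From, the V1 path cannot dodge the new "model is required" invariant: a caller is forced to handle the error at the conversion boundary rather than 404-ing much later in routing.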
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/protocols/src/tokenize.rs`:
- Around line 232-241: Add a negative deserialization test for DetokenizeRequest
similar to test_tokenize_request_requires_model: create a JSON string that omits
the "model" field (e.g. just "tokens": ...) and attempt
serde_json::from_str::<DetokenizeRequest>(); assert that it returns an Err (i.e.
deserialization fails). Place it as a new #[test] fn (e.g.
test_detokenize_request_requires_model) next to
test_detokenize_request_single/batch and reference DetokenizeRequest and
TokensInput to locate the struct to verify the behavior.
In `@model_gateway/src/routers/router_manager.rs`:
- Around line 406-409: The current call to select_router_for_request(headers,
model_id) before invoking router.route_generate(...) allows model-less /generate
requests to be routed arbitrarily; change the logic so /generate preserves
explicit model resolution: if model_id is Some, keep current behavior (call
select_router_for_request and route_generate); if model_id is None then run the
previous “single-model default or ambiguity failure” resolution (i.e., determine
the unique routable model or return an error if multiple models are routable)
and only then call route_generate with that resolved model_id; ensure you update
the branching around select_router_for_request and route_generate to reject
missing model_id when multiple models are routable.
---
Outside diff comments:
In `@crates/protocols/src/rerank.rs`:
- Around line 202-208: The conversion impl From<V1RerankReqInput> for
RerankRequest is setting model to an empty string (model: String::new()), which
leads select_router_for_request -> get_router_for_model("") to return no workers
and a 404; update the conversion to propagate a model value from
V1RerankReqInput (add a model field to V1RerankReqInput if missing) or,
alternatively, add validation in v1_rerank before routing to ensure
RerankRequest.model is non-empty and return a validation error; touch the
From<V1RerankReqInput> for RerankRequest, V1RerankReqInput struct, and the
v1_rerank handler/select_router_for_request validation paths when applying the
fix.
In `@crates/protocols/src/responses.rs`:
- Around line 787-798: The ResponsesRequest::default() creates an empty model
string which can bypass routing; remove the Default impl for ResponsesRequest so
callers must explicitly set model, or if Default must exist change behavior so
empty model is treated as unset: update GenerationRequest::get_model() to return
None when model is an empty string (i.e., treat "" as no model) and/or change
the default to a clearly invalid sentinel (or make model an Option<String>) so
validation rejects an absent/empty model rather than silently routing with "".
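One of the options in the responses.rs comment above (treating an empty model as unset) could look like the following; the structs are stand-ins for the real protocol types, reduced to what the sketch needs:

```rust
// Hypothetical sketch: GenerationRequest::get_model() returning None for an
// empty model string, so an accidental Default::default() cannot route with "".
#[derive(Default)]
struct ResponsesRequest {
    model: String, // Default yields String::new()
}

trait GenerationRequest {
    fn get_model(&self) -> Option<&str>;
}

impl GenerationRequest for ResponsesRequest {
    fn get_model(&self) -> Option<&str> {
        // "" is treated as "no model", so downstream validation rejects it
        // instead of silently routing with a blank identifier.
        if self.model.is_empty() {
            None
        } else {
            Some(&self.model)
        }
    }
}

fn main() {
    let default_req = ResponsesRequest::default();
    assert_eq!(default_req.get_model(), None);

    let req = ResponsesRequest { model: "mock-model".to_string() };
    assert_eq!(req.get_model(), Some("mock-model"));
}
```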
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8f2a2cd4-55a2-4b7a-9835-a06421dc8980
📒 Files selected for processing (7)
- crates/protocols/src/chat.rs
- crates/protocols/src/common.rs
- crates/protocols/src/interactions.rs
- crates/protocols/src/rerank.rs
- crates/protocols/src/responses.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
- crates/protocols/src/common.rs
Signed-off-by: Chang Su <chang.s.su@oracle.com>
…hods

Now that model is a required field in protocol types, the resolve_model_id method is no longer needed. Simplify route_generate, route_chat, route_completion, and route_messages by removing the enable_igw branching and passing model directly from request bodies.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
Force-pushed from 2c6a5b2 to fd577d8
♻️ Duplicate comments (2)
crates/protocols/src/tokenize.rs (1)
231-245: 🧹 Nitpick | 🔵 Trivial

Add a matching negative test for DetokenizeRequest.

The PR updated the detokenize tests to include the model field, but there's no corresponding negative test verifying that DetokenizeRequest also fails deserialization when model is omitted. This would ensure parity with the test_tokenize_request_requires_model test.

Proposed test:

```diff
+    #[test]
+    fn test_detokenize_request_requires_model() {
+        let json = r#"{"tokens": [1, 2, 3]}"#;
+        let result = serde_json::from_str::<DetokenizeRequest>(json);
+        assert!(result.is_err(), "Should fail without model field");
+    }
+
     #[test]
     fn test_detokenize_request_single() {
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/protocols/src/tokenize.rs` around lines 231-245: Add a negative test that verifies DetokenizeRequest deserialization fails when the required model field is omitted: create a new test (e.g., test_detokenize_request_requires_model) that tries to deserialize a JSON string like r#"{"tokens": [1,2,3]}"# (and another case for batch tokens if desired) into DetokenizeRequest using serde_json::from_str and assert that it returns an Err; reference the DetokenizeRequest type and TokensInput enum to mirror the existing positive tests and ensure parity with test_tokenize_request_requires_model.

model_gateway/src/routers/router_manager.rs (1)
509-526: ⚠️ Potential issue | 🟠 Major

Preserve explicit model resolution for /generate.

/generate is the one model-optional path per the PR objectives, but with resolve_model_id removed, select_router_for_request(headers, model_id) with model_id = None will select a router based solely on headers and worker distribution scoring. In IGW mode, this means a request without model_id can now be sent to an arbitrary router instead of following the previous "single model default or ambiguity failure" behavior.

Consider either:

- Preserving the old inference logic for this endpoint (if only one model is routable, use it; otherwise fail)
- Rejecting a missing model_id when multiple models are routable in IGW mode

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/router_manager.rs` around lines 509-526: route_generate currently forwards requests with model_id=None to select_router_for_request, which allows header-only selection and breaks the previous explicit-model resolution behavior; change it so that when model_id.is_none() we first attempt the old resolution logic (call self.resolve_model_id(headers) or equivalent) and only then call select_router_for_request with the resolved model id; if resolution yields multiple routable models in IGW mode return a NOT_FOUND or explicit error (reject missing model_id), and if resolution yields exactly one, use that model id to pick the router and proceed with router.route_generate(headers, body, Some(resolved_model_id)).
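The "single-model default or ambiguity failure" behavior this comment asks to preserve amounts to a small resolution helper along these lines; the function name, signature, and error shape are illustrative, not the gateway's actual API:

```rust
// Hypothetical sketch: resolve an omitted model_id for /generate by falling back
// to the unique routable model, and failing when the choice would be ambiguous.
#[derive(Debug, PartialEq)]
enum ResolveError {
    NoModels,
    Ambiguous(usize), // how many models were routable
}

fn resolve_model_id<'a>(
    model_id: Option<&'a str>,
    routable_models: &'a [String],
) -> Result<&'a str, ResolveError> {
    if let Some(id) = model_id {
        return Ok(id); // explicit model: keep current behavior
    }
    match routable_models {
        [] => Err(ResolveError::NoModels),
        [only] => Ok(only.as_str()), // single-model default
        many => Err(ResolveError::Ambiguous(many.len())), // reject, don't guess
    }
}

fn main() {
    let one = vec!["llama".to_string()];
    let two = vec!["llama".to_string(), "qwen".to_string()];
    assert_eq!(resolve_model_id(None, &one), Ok("llama"));
    assert_eq!(resolve_model_id(Some("qwen"), &two), Ok("qwen"));
    assert_eq!(resolve_model_id(None, &two), Err(ResolveError::Ambiguous(2)));
}
```

Only after this resolution succeeds would the router call select_router_for_request with a concrete model id, so a model-less /generate request can never be routed arbitrarily by header scoring alone.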
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@crates/protocols/src/tokenize.rs`:
- Around line 231-245: Add a negative test that verifies DetokenizeRequest
deserialization fails when the required model field is omitted: create a new
test (e.g., test_detokenize_request_requires_model) that tries to deserialize a
JSON string like r#"{"tokens": [1,2,3]}"# (and another case for batch tokens if
desired) into DetokenizeRequest using serde_json::from_str and assert that it
returns an Err; reference the DetokenizeRequest type and TokensInput enum to
mirror the existing positive tests and ensure parity with
test_tokenize_request_requires_model.
In `@model_gateway/src/routers/router_manager.rs`:
- Around line 509-526: route_generate currently forwards requests with
model_id=None to select_router_for_request, which allows header-only selection
and breaks the previous explicit-model resolution behavior; change it so that
when model_id.is_none() we first attempt the old resolution logic (call
self.resolve_model_id(headers) or equivalent) and only then call
select_router_for_request with the resolved model id; if resolution yields
multiple routable models in IGW mode return a NOT_FOUND or explicit error
(reject missing model_id), and if resolution yields exactly one use that model
id to pick the router and proceed with router.route_generate(headers, body,
Some(resolved_model_id)).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd0c9ef9-ff84-4eab-8b40-ac3a4663ca3e
📒 Files selected for processing (7)
- crates/protocols/src/chat.rs
- crates/protocols/src/common.rs
- crates/protocols/src/interactions.rs
- crates/protocols/src/rerank.rs
- crates/protocols/src/responses.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/router_manager.rs
💤 Files with no reviewable changes (1)
- crates/protocols/src/common.rs
… wrappers

Cascade the required model field through the entire routing layer:

- RouterTrait: all methods now take model_id: &str (no more Option)
- GenerateRequest.model: Option<String> → String with serde default "unknown"
- HTTP/gRPC routers: remove enable_igw model-filtering guard, always route by model
- select_worker_for_model: treats "unknown" as wildcard (any worker)
- Pipeline contexts: model_id: Option<String> → String
- Worker selection: always filters by model, no UNKNOWN_MODEL_ID fallbacks
- V1 rerank: resolves model from available workers when absent

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
Update mock worker and test assertions for the required model field:

- Mock worker: add served_model_name to server_info, add /server_info and /model_info route aliases, remove should_fail from server_info (metadata endpoints must always succeed for worker registration)
- Test payloads: use "mock-model" consistently (matching the mock worker)
- Router: distinguish 404 (model not found) from 503 (workers exist but unavailable) when select_worker returns None
- GenerateRequest.model defaults to "unknown" via serde; router treats "unknown" as wildcard (any worker)

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96965b4756
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
model_gateway/tests/api/request_formats_test.rs (1)
70-99: 🧹 Nitpick | 🔵 Trivial

Add a missing-model assertion for the new 400 contract.

Lines 70-99 and 118-151 only verify the happy path with model present. Since this PR changes /v1/chat/completions and /v1/completions from implicit fallback to model-required requests, this file should also pin the failure path; otherwise the old fallback can regress without tripping these smoke tests.

Also applies to: 118-151
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/tests/api/request_formats_test.rs` around lines 70-99: Add a failing-path assertion that ensures requests without "model" return the new 400 contract: create a payload that omits "model" (e.g., same messages/params but no model field), call ctx.make_request("/v1/chat/completions", payload).await, assert the result indicates a failure (or response.status == 400) and that the error body matches the expected shape (e.g., contains an "error" field with message/type). Repeat the same missing-model assertion for the "/v1/completions" test block so both endpoints validate the new model-required behavior.

model_gateway/tests/spec/rerank.rs (1)
372-383: ⚠️ Potential issue | 🟠 Major

Don't normalize a model-less V1 rerank request into model == "".

Lines 35-55 now establish that RerankRequest is invalid without a model, but this test still blesses a V1RerankReqInput -> RerankRequest conversion that manufactures an empty model. That recreates the omitted-model fallback this PR is removing and lets the V1 path dodge the new invariant until much later in the flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/tests/spec/rerank.rs` around lines 372-383: The test test_v1_to_rerank_request_conversion is incorrectly asserting that converting a V1RerankReqInput yields a RerankRequest with model == ""; instead update the test to assert the conversion is rejected because RerankRequest now requires a model. Replace the direct Into conversion with a fallible conversion (use TryFrom/TryInto or RerankRequest::try_from(V1RerankReqInput)) and assert that the result is an Err (or that conversion panics if the codebase uses panics for invalid input), and remove any assertion expecting an empty model; keep checks for query/documents if relevant to the Err case.

model_gateway/src/server.rs (1)
200-208: ⚠️ Potential issue | 🟠 Major

Json<CompletionRequest> will return 422 (not 400) for a missing model field.

Axum's Json<T> extractor returns 422 Unprocessable Entity when required fields fail deserialization. If /v1/completions is meant to enforce the 400 status for missing models (as in other handlers), this needs a custom rejection mapping or a validated extractor that explicitly handles the missing model case. The same issue applies to the Json<T> tokenize/detokenize handlers elsewhere in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/server.rs` around lines 200 - 208, The v1_completions handler (async fn v1_completions) currently uses axum's Json<CompletionRequest> extractor which yields 422 on deserialization failures (e.g., missing model) but you want a 400 for missing model; change the extractor to a validated/custom extractor or map Json rejections so that when CompletionRequest deserializes but the model field is absent you return a 400 before calling state.router.route_completion(Some(&headers), &body, &body.model), and apply the same change to the tokenize/detokenize handlers that also use Json<T> so they likewise return 400 for missing model instead of 422.
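The status remapping this comment asks for can be shown framework-free: classify the body before (or instead of) the normal Json<T> path, so a missing model maps to 400 while other deserialization failures keep axum's default 422. The string-based parsing below is a deliberately naive stand-in for serde/axum's real rejection types:

```rust
// Hypothetical sketch: decide the HTTP status for a /v1/completions body so a
// missing "model" field yields 400 rather than axum's default 422.
#[derive(Debug, PartialEq)]
enum BodyStatus {
    Ok,           // deserializes with a model present
    MissingModel, // -> 400 Bad Request
    Malformed,    // -> 422 Unprocessable Entity (axum's Json<T> default)
}

fn classify_body(body: &str) -> BodyStatus {
    // Naive JSON shape check standing in for serde deserialization.
    let looks_like_json = body.trim_start().starts_with('{') && body.trim_end().ends_with('}');
    if !looks_like_json {
        return BodyStatus::Malformed;
    }
    if body.contains("\"model\"") {
        BodyStatus::Ok
    } else {
        BodyStatus::MissingModel
    }
}

fn status_code(s: BodyStatus) -> u16 {
    match s {
        BodyStatus::Ok => 200,
        BodyStatus::MissingModel => 400,
        BodyStatus::Malformed => 422,
    }
}

fn main() {
    assert_eq!(status_code(classify_body(r#"{"model": "mock-model", "prompt": "hi"}"#)), 200);
    assert_eq!(status_code(classify_body(r#"{"prompt": "hi"}"#)), 400);
    assert_eq!(status_code(classify_body("not json")), 422);
}
```

In a real handler the same three-way split would come from a custom extractor (or a mapped Json rejection) rather than string inspection; only the 400/422 decision is the point here.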
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 82-118: When select_single_worker() or select_pd_pair() returns
None in WorkerSelectionStage::execute, distinguish "model not advertised by any
worker" from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.
In `@model_gateway/src/routers/grpc/router.rs`:
- Around line 288-292: The early return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 734-761: The code currently falls back to global pools when
get_by_model(model_id) is empty, which can route an explicit unsupported model
to unrelated prefill/decode workers; change the logic in the prefill_workers and
decode_workers blocks so that if self.worker_registry.get_by_model(model_id)
returns empty and model_id != UNKNOWN_MODEL_ID you do not call
get_prefill_workers()/get_decode_workers() but instead return an error (or fail
the request) indicating unsupported model; only allow the global fallback when
model_id == UNKNOWN_MODEL_ID. Update the branches around prefill_workers and
decode_workers (and any surrounding control flow that consumes them) to enforce
this check and avoid mixing pools for partially matched models.
In `@model_gateway/src/routers/http/router.rs`:
- Around line 281-300: The current check calls
self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.
In `@model_gateway/src/server.rs`:
- Around line 227-234: The code currently fills an empty RerankRequest.model by
taking the first entry from state.context.worker_registry.get_models(); instead,
detect when rerank_body.model.is_empty() and return a 400 Bad Request with a
clear error message (e.g. "model is required for /v1/rerank") rather than
choosing an arbitrary model; update the handler that processes RerankRequest
(the branch that references rerank_body and RerankRequest and calls
state.context.worker_registry.get_models()) to produce the HTTP 400 response
using the same error response mechanism the server uses elsewhere.
In `@model_gateway/tests/common/mock_worker.rs`:
- Around line 90-92: The TLS mock's route set and server_info payload diverge
from the plain mock; update model_gateway/tests/common/tls_mock_worker.rs so its
router registers the same endpoints as the plain mock (add routes for
"/server_info", "/model_info", and "/get_model_info" alongside
"/get_server_info") and make its server_info_handler return the same JSON
structure including the served_model_name field (match the payload shape
produced by the plain mock's server_info_handler) so TLS-path tests use the
identical discovery contract.
In `@model_gateway/tests/routing/load_balancing_test.rs`:
- Around line 319-323: The test is too loose using !resp.status().is_success();
tighten it to assert the exact no-workers status—change the assertion in
load_balancing_test.rs to expect HTTP 404 (StatusCode::NOT_FOUND) for the no
healthy-matching-worker path so it aligns with select_worker_for_model's
contract (503 reserved for circuit-breaker); update the failure message to
reflect the expected NOT_FOUND status and keep the check using resp.status() for
clarity.
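The pd_router fallback gate requested above (global pools only for the unknown-model wildcard, error for an explicit but unknown model) reduces to a small decision; the registry is mocked with plain vectors and the names are assumptions:

```rust
// Hypothetical sketch: only fall back to the global prefill/decode pools when the
// request carries the wildcard model; an explicit but unknown model is an error.
const UNKNOWN_MODEL_ID: &str = "unknown";

fn pick_pool<'a>(
    model_id: &str,
    by_model: &'a [String], // stand-in for worker_registry.get_by_model(model_id)
    global: &'a [String],   // stand-in for get_prefill_workers()/get_decode_workers()
) -> Result<&'a [String], String> {
    if !by_model.is_empty() {
        Ok(by_model)
    } else if model_id == UNKNOWN_MODEL_ID {
        Ok(global) // wildcard: any worker may serve the request
    } else {
        Err(format!("no workers advertise model: {model_id}"))
    }
}

fn main() {
    let global = vec!["w1".to_string(), "w2".to_string()];
    let none: Vec<String> = vec![];
    assert_eq!(pick_pool("unknown", &none, &global).unwrap().len(), 2);
    assert!(pick_pool("typo-model", &none, &global).is_err());
}
```

Applying the same gate to both the prefill and decode pools prevents a partially matched model from silently mixing model-specific and unrelated workers.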
---
Outside diff comments:
In `@model_gateway/src/server.rs`:
- Around line 200-208: The v1_completions handler (async fn v1_completions)
currently uses axum's Json<CompletionRequest> extractor which yields 422 on
deserialization failures (e.g., missing model) but you want a 400 for missing
model; change the extractor to a validated/custom extractor or map Json
rejections so that when CompletionRequest deserializes but the model field is
absent you return a 400 before calling
state.router.route_completion(Some(&headers), &body, &body.model), and apply the
same change to the tokenize/detokenize handlers that also use Json<T> so they
likewise return 400 for missing model instead of 422.
In `@model_gateway/tests/api/request_formats_test.rs`:
- Around line 70-99: Add a failing-path assertion that ensures requests without
"model" return the new 400 contract: create a payload that omits "model" (e.g.,
same messages/params but no model field), call
ctx.make_request("/v1/chat/completions", payload).await, assert the result
indicates a failure (or response.status == 400) and that the error body matches
the expected shape (e.g., contains an "error" field with message/type). Repeat
the same missing-model assertion for the "/v1/completions" test block so both
endpoints validate the new model-required behavior.
In `@model_gateway/tests/spec/rerank.rs`:
- Around line 372-383: The test test_v1_to_rerank_request_conversion is
incorrectly asserting that converting a V1RerankReqInput yields a RerankRequest
with model == ""; instead update the test to assert the conversion is rejected
because RerankRequest now requires a model. Replace the direct Into conversion
with a fallible conversion (use TryFrom/TryInto or
RerankRequest::try_from(V1RerankReqInput)) and assert that the result is an Err
(or that conversion panics if the codebase uses panics for invalid input), and
remove any assertion expecting an empty model; keep checks for query/documents
if relevant to the Err case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aefc98e3-9162-43d8-a13e-ad2bd876d051
📒 Files selected for processing (33)
- crates/protocols/src/common.rs
- crates/protocols/src/generate.rs
- model_gateway/benches/request_processing.rs
- model_gateway/benches/wasm_middleware_latency.rs
- model_gateway/src/routers/grpc/common/stages/dispatch_metadata.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/grpc/context.rs
- model_gateway/src/routers/grpc/pd_router.rs
- model_gateway/src/routers/grpc/pipeline.rs
- model_gateway/src/routers/grpc/regular/responses/common.rs
- model_gateway/src/routers/grpc/regular/responses/handlers.rs
- model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
- model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
- model_gateway/src/routers/grpc/router.rs
- model_gateway/src/routers/grpc/utils/chat_utils.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/routers/http/router.rs
- model_gateway/src/routers/mod.rs
- model_gateway/src/routers/openai/chat.rs
- model_gateway/src/routers/openai/responses/route.rs
- model_gateway/src/routers/openai/router.rs
- model_gateway/src/routers/router_manager.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/api/request_formats_test.rs
- model_gateway/tests/api/responses_api_test.rs
- model_gateway/tests/api/streaming_tests.rs
- model_gateway/tests/common/mock_worker.rs
- model_gateway/tests/otel_tracing_test.rs
- model_gateway/tests/routing/header_forwarding_test.rs
- model_gateway/tests/routing/load_balancing_test.rs
- model_gateway/tests/routing/test_openai_routing.rs
- model_gateway/tests/spec/rerank.rs
```diff
+        let model_id = ctx.input.model_id.as_str();
         let workers = match self.mode {
             WorkerSelectionMode::Regular => {
-                match self.select_single_worker(
-                    ctx.input.model_id.as_deref(),
-                    text,
-                    tokens,
-                    headers,
-                ) {
+                match self.select_single_worker(model_id, text, tokens, headers) {
                     Some(w) => WorkerSelection::Single { worker: w },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "Regular",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available workers for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_workers",
-                            format!("No available workers for model: {model}"),
+                            format!("No available workers for model: {model_id}"),
                         ));
                     }
                 }
             }
             WorkerSelectionMode::PrefillDecode => {
-                match self.select_pd_pair(ctx.input.model_id.as_deref(), text, tokens, headers) {
+                match self.select_pd_pair(model_id, text, tokens, headers) {
                     Some((prefill, decode, runtime_type)) => WorkerSelection::Dual {
                         prefill,
                         decode,
                         runtime_type,
                     },
                     None => {
-                        let model = ctx.input.model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID);
                         error!(
                             function = "WorkerSelectionStage::execute",
                             mode = "PrefillDecode",
-                            model_id = %model,
+                            model_id = %model_id,
                             "No available PD worker pairs for model"
                         );
                         return Err(error::service_unavailable(
                             "no_available_pd_worker_pairs",
-                            format!("No available PD worker pairs for model: {model_id}"),
+                            format!("No available PD worker pairs for model: {model_id}"),
                         ));
```
Split unsupported-model and unavailable-worker failures.
select_single_worker() / select_pd_pair() both collapse "no worker advertises this model" and "matching workers exist but none are currently usable" into None, and execute() always maps that to 503. Now that model_id is required, the first case should surface as 404, otherwise unsupported or typoed models look like transient outages. Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers, while absent models should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around
lines 82 - 118, When select_single_worker() or select_pd_pair() returns None in
WorkerSelectionStage::execute, distinguish "model not advertised by any worker"
from "workers advertise model but none usable": call
any_external_worker_supports_model(model_id, headers, healthy_only = false) to
detect if any worker advertises the model at all; if that returns false return a
404 ("unsupported_model") otherwise return the existing 503
("no_available_workers"/"no_available_pd_worker_pairs"). Update the None
branches in the Regular and PrefillDecode arms to perform this check and map to
404 vs 503 accordingly, referencing select_single_worker, select_pd_pair,
any_external_worker_supports_model and WorkerSelectionStage::execute.
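The 404-vs-503 split requested above boils down to one extra check after selection fails: ask whether any worker advertises the model at all. The function and status names below are illustrative, not the router's actual error helpers:

```rust
// Hypothetical sketch: when worker selection returns None, a typoed or
// unregistered model should surface as 404 rather than looking like a
// transient 503 outage of existing workers.
fn failure_status(any_worker_advertises_model: bool) -> (u16, &'static str) {
    if !any_worker_advertises_model {
        (404, "unsupported_model") // model never registered: client error
    } else {
        (503, "no_available_workers") // model exists, but nothing usable right now
    }
}

// Wrapper showing where the check slots into the selection flow.
fn select_or_error<'a>(
    selected: Option<&'a str>,
    any_worker_advertises_model: bool,
) -> Result<&'a str, (u16, &'static str)> {
    match selected {
        Some(worker) => Ok(worker),
        None => Err(failure_status(any_worker_advertises_model)),
    }
}

fn main() {
    assert_eq!(select_or_error(Some("w1"), true), Ok("w1"));
    assert_eq!(select_or_error(None, false), Err((404, "unsupported_model")));
    assert_eq!(select_or_error(None, true), Err((503, "no_available_workers")));
}
```

The same mapping applies to both the Regular and PrefillDecode arms, so circuit-broken healthy workers keep producing 503 while absent models fall through to 404.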
```diff
         // 0. Fast worker validation (fail-fast before expensive operations)
-        let requested_model: Option<&str> = model_id.or(Some(body.model.as_str()));
-
-        if let Some(error_response) = requested_model
-            .and_then(|model| validate_worker_availability(&self.worker_registry, model))
+        if let Some(error_response) = validate_worker_availability(&self.worker_registry, model_id)
         {
             return error_response;
         }
```
This early miss path still reports an unsupported model as 503.
validate_worker_availability() returns service_unavailable() when the model is absent, so /v1/responses on the gRPC router still treats a typoed or unsupported model as a transient outage. After making model mandatory, this short-circuit needs to return model_not_found/404 instead, either by remapping the helper result here or by teaching the helper to emit the right status.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/router.rs` around lines 288 - 292, The early
return in the gRPC router uses
validate_worker_availability(&self.worker_registry, model_id) which currently
emits service_unavailable() for missing/unsupported models; change the behavior
so a missing model returns model_not_found() (404) instead of 503. Either update
validate_worker_availability to detect the "model absent" case and return
model_not_found() there, or remap the result at the call site in router.rs
(where validate_worker_availability is invoked) to translate the specific "model
absent" error into model_not_found() while leaving other transient errors as
service_unavailable(); use the existing helper names
validate_worker_availability, service_unavailable, and model_not_found to locate
and implement the change.
| // Distinguish "no workers for this model" from "workers exist but unavailable" | ||
| let model_filter = if model_id == crate::core::UNKNOWN_MODEL_ID { | ||
| None | ||
| } else { | ||
| Some(model_id) | ||
| }; | ||
| let total = self.worker_registry.get_workers_filtered( | ||
| model_filter, | ||
| Some(WorkerType::Regular), | ||
| Some(ConnectionMode::Http), | ||
| None, | ||
| false, | ||
| ); | ||
| return if total.is_empty() { | ||
| error::model_not_found(model_id) | ||
| } else { | ||
| error::service_unavailable( | ||
| "no_available_workers", | ||
| "All workers are unavailable (circuit breaker open or unhealthy)", | ||
| ) |
Use a healthy-only existence check before returning 503.
get_workers_filtered(..., false) counts workers that are failing health checks, so a model with only stale/unhealthy registrations now goes down the 503 branch. That collapses “all healthy workers are circuit-broken” and “only unhealthy matches exist” into the same outcome and changes the 404/503 split.
Suggested change

```diff
- let total = self.worker_registry.get_workers_filtered(
+ let healthy = self.worker_registry.get_workers_filtered(
      model_filter,
      Some(WorkerType::Regular),
      Some(ConnectionMode::Http),
      None,
-     false,
+     true,
  );
- return if total.is_empty() {
+ return if healthy.is_empty() {
      error::model_not_found(model_id)
  } else {
```

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/http/router.rs` around lines 281 - 300, The current
check calls self.worker_registry.get_workers_filtered(model_filter,
Some(WorkerType::Regular), Some(ConnectionMode::Http), None, false) which counts
unhealthy/stale workers and causes unhealthy-only matches to trigger a 503;
change the call to use healthy_only = true (match how
any_external_worker_supports_model does) so you only treat the absence of
healthy workers as service_unavailable while leaving unhealthy-only matches to
fall through to model_not_found/404.
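The wildcard-plus-health filtering being debated above can be sketched against a toy registry. Entries, the `workers_for` helper, and the URLs are hypothetical; only the filtering idea mirrors the `healthy_only` flag under discussion:

```rust
const UNKNOWN_MODEL_ID: &str = "unknown";

// Toy registry entries: (served model, worker URL, passing health checks).
// The "unknown" sentinel acts as a wildcard that matches every model.
fn workers_for(
    model_id: &str,
    registry: &[(&str, &str, bool)],
    healthy_only: bool,
) -> Vec<String> {
    registry
        .iter()
        .filter(|(model, _url, healthy)| {
            (model_id == UNKNOWN_MODEL_ID || *model == model_id)
                && (!healthy_only || *healthy)
        })
        .map(|(_model, url, _healthy)| url.to_string())
        .collect()
}

fn main() {
    let registry = [
        ("llama-3", "http://w1:8000", true),
        ("llama-3", "http://w2:8000", false),
        ("qwen-2", "http://w3:8000", true),
    ];
    // The wildcard sees every healthy worker; an explicit model sees only its own.
    assert_eq!(workers_for(UNKNOWN_MODEL_ID, &registry, true).len(), 2);
    assert_eq!(workers_for("llama-3", &registry, true), vec!["http://w1:8000"]);
    // healthy_only = false also counts the failing registration, which is
    // exactly how a model with only stale workers can dodge the 404 branch.
    assert_eq!(workers_for("llama-3", &registry, false).len(), 2);
}
```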
| .route("/server_info", get(server_info_handler)) | ||
| .route("/get_server_info", get(server_info_handler)) | ||
| .route("/model_info", get(model_info_handler)) |
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
for f in model_gateway/tests/common/mock_worker.rs model_gateway/tests/common/tls_mock_worker.rs; do
  echo "=== $f ==="
  rg -n -C2 'route\("/server_info"|route\("/model_info"|route\("/get_server_info"|route\("/get_model_info"|served_model_name|model_path|should_fail\(&config\)' "$f"
done
```

Repository: lightseekorg/smg
Length of output: 3927
Keep the TLS mock's discovery contract in sync.
The plain mock exposes /server_info, /get_server_info, /model_info, and /get_model_info endpoints (lines 90–93), with server_info_handler returning a payload that includes served_model_name (line 237). The TLS mock only exposes /get_server_info and its server_info_handler returns a different payload structure omitting served_model_name. This route and payload divergence means TLS-path tests exercise a stale worker-discovery contract and can hide regressions in model matching or registration logic.
Align model_gateway/tests/common/tls_mock_worker.rs to expose the same metadata endpoints (/server_info, /model_info, /get_model_info) and return matching server_info payloads with served_model_name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/tests/common/mock_worker.rs` around lines 90 - 92, The TLS
mock's route set and server_info payload diverge from the plain mock; update
model_gateway/tests/common/tls_mock_worker.rs so its router registers the same
endpoints as the plain mock (add routes for "/server_info", "/model_info", and
"/get_model_info" alongside "/get_server_info") and make its server_info_handler
return the same JSON structure including the served_model_name field (match the
payload shape produced by the plain mock's server_info_handler) so TLS-path
tests use the identical discovery contract.
```diff
  // Should return error when no workers available
  assert!(
-     resp.status() == StatusCode::SERVICE_UNAVAILABLE
-         || resp.status() == StatusCode::INTERNAL_SERVER_ERROR,
-     "Expected 503 or 500 when no workers available, got {}",
+     !resp.status().is_success(),
+     "Expected error status when no workers available, got {}",
      resp.status()
```
Don't collapse the no-workers assertion to !is_success().
/generate is still the exception to the required-model change, so this would also pass on unrelated 400/500 regressions and stop checking the intended worker-selection contract for the no-workers path.
🎯 Tighten the assertion back to the expected status

```diff
- assert!(
-     !resp.status().is_success(),
-     "Expected error status when no workers available, got {}",
-     resp.status()
- );
+ assert_eq!(
+     resp.status(),
+     StatusCode::NOT_FOUND,
+     "Expected 404 when no workers are registered"
+ );
```

Based on learnings, select_worker_for_model intentionally reserves 503 for the circuit-breaker case and lets no healthy matching workers fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/tests/routing/load_balancing_test.rs` around lines 319 - 323,
The test is too loose using !resp.status().is_success(); tighten it to assert
the exact no-workers status—change the assertion in load_balancing_test.rs to
expect HTTP 404 (StatusCode::NOT_FOUND) for the no healthy-matching-worker path
so it aligns with select_worker_for_model's contract (503 reserved for
circuit-breaker); update the failure message to reflect the expected NOT_FOUND
status and keep the check using resp.status() for clarity.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d13e64998
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "no_available_workers", | ||
| format!("No available workers for model: {model}"), | ||
| )); | ||
| return Err(error::model_not_found(model_id)); |
Return 503 for unavailable gRPC workers
WorkerSelectionStage::execute now unconditionally converts a failed selection into error::model_not_found (404), but selection can fail not only when the model is absent but also when matching workers exist yet are temporarily unavailable after is_available() filtering. In those transient cases this incorrectly reports a client/model error and prevents retry handling (which is keyed to 5xx), so recoverable outages fail fast instead of retrying.
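The retry interaction described here can be shown with a minimal predicate. This is a hypothetical sketch of a 5xx-keyed retry policy, not SMG's actual retry code:

```rust
// Typical gateway retry policy: only server-side (5xx) statuses are retried.
fn is_retriable(status: u16) -> bool {
    (500..=599).contains(&status)
}

fn main() {
    // A transient outage reported as 503 is retried...
    assert!(is_retriable(503));
    // ...but the same outage misreported as 404 fails fast with no retry.
    assert!(!is_retriable(404));
}
```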
Actionable comments posted: 1
♻️ Duplicate comments (1)
model_gateway/src/routers/grpc/common/stages/worker_selection.rs (1)
85-95: ⚠️ Potential issue | 🟠 Major — Don't collapse wildcard and availability failures into model_not_found.

These branches now always return model_not_found, but select_single_worker() / select_pd_pair() also return None when /generate is using UNKNOWN_MODEL_ID as a wildcard or when matching healthy workers are temporarily unusable. That turns transient selection failures into false 404s and leaks "unknown" back to clients. Split true model misses from wildcard/unavailable-worker failures before choosing the response code.

Based on learnings, any_external_worker_supports_model intentionally uses healthy_only = true so 503 is reserved for healthy workers whose circuit breaker is open, while unhealthy workers should still fall through to 404.

Also applies to: 99-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` around lines 85 - 95, The current code conflates "model truly not present" with "selection failed due to wildcard/temporary unavailability" by returning model_not_found whenever select_single_worker/select_pd_pair return None; instead, call any_external_worker_supports_model(model_id, healthy_only = true) (and also with healthy_only = false if needed to detect wildcard UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns false then return error::model_not_found(model_id); if it returns true but select_single_worker or select_pd_pair still returned None then return a 503 service/unavailable error (e.g., error::service_unavailable or similar) to represent transient/unusable-worker failures; apply the same change to both the single-worker branch (where select_single_worker is used) and the PD-pair branch (where select_pd_pair is used), referencing WorkerSelectionStage::execute, select_single_worker, select_pd_pair, and any_external_worker_supports_model to locate the logic to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 726-765: In select_pd_pair: detect when
self.worker_registry.get_by_model(model_id) returns empty for an explicit
(non-wildcard) model_id (i.e. when is_unknown_model is false) and short-circuit
by returning error::model_not_found(model_id) (or the appropriate typed
non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.
---
Duplicate comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Around line 85-95: The current code conflates "model truly not present" with
"selection failed due to wildcard/temporary unavailability" by returning
model_not_found whenever select_single_worker/select_pd_pair return None;
instead, call any_external_worker_supports_model(model_id, healthy_only = true)
(and also with healthy_only = false if needed to detect wildcard
UNKNOWN_MODEL_ID) to distinguish: if any_external_worker_supports_model returns
false then return error::model_not_found(model_id); if it returns true but
select_single_worker or select_pd_pair still returned None then return a 503
service/unavailable error (e.g., error::service_unavailable or similar) to
represent transient/unusable-worker failures; apply the same change to both the
single-worker branch (where select_single_worker is used) and the PD-pair branch
(where select_pd_pair is used), referencing WorkerSelectionStage::execute,
select_single_worker, select_pd_pair, and any_external_worker_supports_model to
locate the logic to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e733463-3e57-412e-9a81-3e1a3cd1f13b
📒 Files selected for processing (7)
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/grpc/common/responses/utils.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/common/tls_mock_worker.rs
```diff
  async fn select_pd_pair(
      &self,
      request_text: Option<&str>,
-     model_id: Option<&str>,
+     model_id: &str,
      headers: Option<&HeaderMap>,
  ) -> Result<(Arc<dyn Worker>, Arc<dyn Worker>), String> {
-     let effective_model_id = if self.enable_igw { model_id } else { None };
-
-     debug!(
-         "Selecting PD pair: enable_igw={}, model_id={:?}, effective_model_id={:?}",
-         self.enable_igw, model_id, effective_model_id
-     );
+     debug!("Selecting PD pair: model_id={:?}", model_id);
+
+     let is_unknown_model = model_id == UNKNOWN_MODEL_ID;

-     let prefill_workers = if let Some(model) = effective_model_id {
-         self.worker_registry
-             .get_by_model(model)
+     let prefill_workers = {
+         let by_model: Vec<_> = self
+             .worker_registry
+             .get_by_model(model_id)
              .iter()
              .filter(|w| matches!(w.worker_type(), WorkerType::Prefill))
              .cloned()
-             .collect()
-     } else {
-         self.worker_registry.get_prefill_workers()
+             .collect();
+         if by_model.is_empty() && is_unknown_model {
+             // Only fall back to all workers when model is "unknown" (wildcard)
+             self.worker_registry.get_prefill_workers()
+         } else {
+             by_model
+         }
      };

-     let decode_workers = if let Some(model) = effective_model_id {
-         self.worker_registry
-             .get_by_model(model)
+     let decode_workers = {
+         let by_model: Vec<_> = self
+             .worker_registry
+             .get_by_model(model_id)
              .iter()
              .filter(|w| matches!(w.worker_type(), WorkerType::Decode))
              .cloned()
-             .collect()
-     } else {
-         self.worker_registry.get_decode_workers()
+             .collect();
+         if by_model.is_empty() && is_unknown_model {
+             // Only fall back to all workers when model is "unknown" (wildcard)
+             self.worker_registry.get_decode_workers()
+         } else {
+             by_model
+         }
```
Surface explicit PD model misses as model_not_found.
The new explicit-model filtering is good, but select_pd_pair() still reports failures as Err(String). When get_by_model(model_id) is empty for an explicit model, that bubbles out through handle_server_selection_error() as 503 server_selection_failed, so unsupported models are retried like transient outages. Short-circuit explicit misses as error::model_not_found(model_id) (or another typed non-retriable error) before falling into pick_worker_by_policy_arc().
Based on learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should fall through to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/http/pd_router.rs` around lines 726 - 765, In
select_pd_pair: detect when self.worker_registry.get_by_model(model_id) returns
empty for an explicit (non-wildcard) model_id (i.e. when is_unknown_model is
false) and short-circuit by returning error::model_not_found(model_id) (or the
appropriate typed non-retriable error) instead of continuing to
pick_worker_by_policy_arc()/handle_server_selection_error; apply this check for
both the prefill_workers and decode_workers branches so unsupported explicit
models return a model_not_found error rather than bubbling as a 503.
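The prefill/decode selection pattern under discussion, an explicit model filter with a wildcard-only fallback, can be sketched with toy types. The `W` struct, `Role` enum, and `select` helper are hypothetical, not the real `WorkerRegistry` API:

```rust
const UNKNOWN_MODEL_ID: &str = "unknown";

#[derive(Clone, Copy, PartialEq)]
enum Role { Prefill, Decode }

#[derive(Clone, Copy)]
struct W { model: &'static str, role: Role, id: u32 }

// Pick workers of `role` for `model_id`; only the "unknown" wildcard may fall
// back to every worker of that role. An explicit miss returns an empty list,
// which the caller should surface as model_not_found rather than retry as 503.
fn select(workers: &[W], model_id: &str, role: Role) -> Vec<u32> {
    let by_model: Vec<u32> = workers
        .iter()
        .filter(|w| w.role == role && w.model == model_id)
        .map(|w| w.id)
        .collect();
    if by_model.is_empty() && model_id == UNKNOWN_MODEL_ID {
        workers.iter().filter(|w| w.role == role).map(|w| w.id).collect()
    } else {
        by_model
    }
}

fn main() {
    let ws = [
        W { model: "llama-3", role: Role::Prefill, id: 1 },
        W { model: "llama-3", role: Role::Decode, id: 2 },
        W { model: "qwen-2", role: Role::Prefill, id: 3 },
    ];
    assert_eq!(select(&ws, "llama-3", Role::Prefill), vec![1]);
    assert_eq!(select(&ws, UNKNOWN_MODEL_ID, Role::Prefill), vec![1, 3]);
    assert!(select(&ws, "typo", Role::Decode).is_empty()); // explicit miss
}
```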
Force-pushed from 4d13e64 to 59b4dc8 (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 59b4dc8d03
```diff
  if !available_models.contains(&model.to_string()) {
-     return Some(error::service_unavailable(
-         "no_available_workers",
-         format!(
-             "No workers available for model '{}'. Available models: {}",
-             model,
-             available_models.join(", ")
-         ),
-     ));
+     return Some(error::model_not_found(model));
```
Return 503 when responses workers are temporarily absent
validate_worker_availability now maps any missing model entry in the live registry to model_not_found (404), but this check runs before dispatch and cannot distinguish a true bad model from transient capacity loss (e.g., rolling restart, delayed registration, or full drain). In those outage windows /v1/responses will return a client error instead of a retriable 5xx, causing callers and gateways to stop retrying despite the model being valid.
```rust
return if total.is_empty() {
    error::model_not_found(model_id)
} else {
```
Avoid model-not-found for zero-worker HTTP outages
When worker selection fails, the new branch returns model_not_found if the filtered worker list is empty. This makes transient zero-worker conditions (startup race, discovery blip, or scale-to-zero window) look like a permanent client/model error, so clients that retry only on 5xx may fail fast instead of recovering once workers re-register.
Actionable comments posted: 2
♻️ Duplicate comments (2)
model_gateway/src/routers/http/router.rs (1)
281-301: ⚠️ Potential issue | 🟠 Major — Preserve the 404/503 split here, and don't surface the "unknown" sentinel.

This fallback still counts unhealthy/stale registrations, so a model that only exists on failed health checks can return 503 instead of the intended 404. Because UNKNOWN_MODEL_ID shares the same path, /generate without a model can also fall through to model_not_found("unknown"), which leaks an internal sentinel to clients.

Suggested fix
```diff
- let total = self.worker_registry.get_workers_filtered(
+ let healthy = self.worker_registry.get_workers_filtered(
      model_filter,
      Some(WorkerType::Regular),
      Some(ConnectionMode::Http),
      None,
-     false,
+     true,
+ );
+ let total = self.worker_registry.get_workers_filtered(
+     model_filter,
+     Some(WorkerType::Regular),
+     Some(ConnectionMode::Http),
+     None,
+     false,
  );
- return if total.is_empty() {
-     error::model_not_found(model_id)
+ return if model_id == crate::core::UNKNOWN_MODEL_ID {
+     if total.is_empty() {
+         error::service_unavailable("no_workers", "No available workers")
+     } else {
+         error::service_unavailable(
+             "no_available_workers",
+             "All workers are unavailable",
+         )
+     }
+ } else if healthy.is_empty() {
+     error::model_not_found(model_id)
  } else {
      error::service_unavailable(
          "no_available_workers",
-         "All workers are unavailable (circuit breaker open or unhealthy)",
+         "All healthy workers are unavailable",
      )
  };
```

Based on learnings,
any_external_worker_supports_modelintentionally useshealthy_only = trueso 503 is reserved for circuit-broken healthy workers while unhealthy-only matches should fall through to 404.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/http/router.rs` around lines 281 - 301, The code currently treats UNKNOWN_MODEL_ID as a filter and counts unhealthy/stale workers (healthy_only = false), which leaks the "unknown" sentinel and can return 503 for models that only have unhealthy registrations; fix by early-returning a 404 when model_id == crate::core::UNKNOWN_MODEL_ID (do not continue to the worker lookup) and when querying workers use self.worker_registry.get_workers_filtered(..., true) i.e. set the healthy_only flag to true so the 404/503 split is preserved; refer to model_id, UNKNOWN_MODEL_ID, get_workers_filtered, and any_external_worker_supports_model to locate the logic to change.model_gateway/src/routers/http/pd_router.rs (1)
736-766: ⚠️ Potential issue | 🟠 Major — Explicit model misses still return 503 instead of 404.

The fallback logic is now correctly restricted to UNKNOWN_MODEL_ID, which addresses the first past review concern. However, when get_by_model(model_id) returns empty for an explicit model, the error path flows through pick_worker_by_policy_arc → handle_server_selection_error → 503 service_unavailable.

Per retrieved learnings, 503 is reserved for healthy workers whose circuit breaker is open, while unsupported models should return 404. Consider short-circuiting explicit model misses before calling pick_worker_by_policy_arc:

Suggested approach
pick_worker_by_policy_arc:Suggested approach
let prefill_workers = { let by_model: Vec<_> = self .worker_registry .get_by_model(model_id) .iter() .filter(|w| matches!(w.worker_type(), WorkerType::Prefill)) .cloned() .collect(); if by_model.is_empty() && is_unknown_model { // "auto" means pick any — fall back to all prefill workers self.worker_registry.get_prefill_workers() + } else if by_model.is_empty() { + // Explicit model not found — fail fast with 404-style error + return Err(format!("model_not_found:{model_id}")); } else { by_model } };Then in
execute_dual_dispatch, check for themodel_not_found:prefix and returnerror::model_not_found(model_id)instead of callinghandle_server_selection_error.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/http/pd_router.rs` around lines 736 - 766, When an explicit model lookup yields no workers the code currently falls through to pick_worker_by_policy_arc and ultimately returns 503; instead short-circuit explicit misses and return 404. After computing prefill_workers and decode_workers (the blocks using worker_registry.get_by_model in pd_router.rs) or at the start of execute_dual_dispatch, add a check: if model_id != UNKNOWN_MODEL_ID and (prefill_workers.is_empty() || decode_workers.is_empty()) then immediately return error::model_not_found(model_id) rather than calling pick_worker_by_policy_arc / handle_server_selection_error; this ensures explicit model-not-found returns a 404.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs`:
- Line 288: The assignment let model = model_id; in worker_selection.rs is a
redundant rebinding; remove that line and use model_id directly in subsequent
metric calls (the same pattern used in select_single_worker) to avoid the no-op
variable and keep consistency — update references in the surrounding function
(worker selection logic) to use model_id instead of model.
In `@model_gateway/src/server.rs`:
- Around line 227-239: The /v1/rerank compatibility path converts the incoming
body into RerankRequest and only checks model.is_empty(), which skips the full
validation applied elsewhere (e.g., ValidatedJson<RerankRequest>) and emits a
non-standard error shape; instead, take the converted rerank_body and run it
through the same validation/error path used for ValidatedJson (validate fields
like query, documents, top_k, etc.), returning the shared bad-request JSON
structure on failure; locate the conversion of body.into() into rerank_body and
replace the lone model check by invoking the existing RerankRequest validation
routine or the same validator used earlier so all validation errors and shapes
remain consistent across /v1/rerank.
---
Duplicate comments:
In `@model_gateway/src/routers/http/pd_router.rs`:
- Around line 736-766: When an explicit model lookup yields no workers the code
currently falls through to pick_worker_by_policy_arc and ultimately returns 503;
instead short-circuit explicit misses and return 404. After computing
prefill_workers and decode_workers (the blocks using
worker_registry.get_by_model in pd_router.rs) or at the start of
execute_dual_dispatch, add a check: if model_id != UNKNOWN_MODEL_ID and
(prefill_workers.is_empty() || decode_workers.is_empty()) then immediately
return error::model_not_found(model_id) rather than calling
pick_worker_by_policy_arc / handle_server_selection_error; this ensures explicit
model-not-found returns a 404.
In `@model_gateway/src/routers/http/router.rs`:
- Around line 281-301: The code currently treats UNKNOWN_MODEL_ID as a filter
and counts unhealthy/stale workers (healthy_only = false), which leaks the
"unknown" sentinel and can return 503 for models that only have unhealthy
registrations; fix by early-returning a 404 when model_id ==
crate::core::UNKNOWN_MODEL_ID (do not continue to the worker lookup) and when
querying workers use self.worker_registry.get_workers_filtered(..., true) i.e.
set the healthy_only flag to true so the 404/503 split is preserved; refer to
model_id, UNKNOWN_MODEL_ID, get_workers_filtered, and
any_external_worker_supports_model to locate the logic to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9847e147-6bb2-4a3c-909a-25291993e68a
📒 Files selected for processing (9)
- crates/protocols/src/common.rs
- crates/protocols/src/tokenize.rs
- model_gateway/src/routers/grpc/common/responses/utils.rs
- model_gateway/src/routers/grpc/common/stages/worker_selection.rs
- model_gateway/src/routers/http/pd_router.rs
- model_gateway/src/routers/http/router.rs
- model_gateway/src/server.rs
- model_gateway/tests/api/api_endpoints_test.rs
- model_gateway/tests/common/tls_mock_worker.rs
```diff
  let decode_idx = policy.select_worker(&available_decode, &info)?;

- let model = model_id.unwrap_or(UNKNOWN_MODEL_ID);
+ let model = model_id;
```
🧹 Nitpick | 🔵 Trivial
Minor: Redundant variable rebinding.
let model = model_id; is a no-op rebinding. Consider using model_id directly in the metrics calls below for consistency with select_single_worker (line 181).
♻️ Suggested simplification
- let model = model_id;
let policy_name = policy.name();
// Record worker selection metrics for both prefill and decode
Metrics::record_worker_selection(
metrics_labels::WORKER_PREFILL,
metrics_labels::CONNECTION_GRPC,
- model,
+ model_id,
policy_name,
);
Metrics::record_worker_selection(
metrics_labels::WORKER_DECODE,
metrics_labels::CONNECTION_GRPC,
- model,
+ model_id,
policy_name,
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
let model = model_id;
let policy_name = policy.name();
// Record worker selection metrics for both prefill and decode
Metrics::record_worker_selection(
    metrics_labels::WORKER_PREFILL,
    metrics_labels::CONNECTION_GRPC,
    model_id,
    policy_name,
);
Metrics::record_worker_selection(
    metrics_labels::WORKER_DECODE,
    metrics_labels::CONNECTION_GRPC,
    model_id,
    policy_name,
);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@model_gateway/src/routers/grpc/common/stages/worker_selection.rs` at line
288, The assignment let model = model_id; in worker_selection.rs is a redundant
rebinding; remove that line and use model_id directly in subsequent metric calls
(the same pattern used in select_single_worker) to avoid the no-op variable and
keep consistency — update references in the surrounding function (worker
selection logic) to use model_id instead of model.
- gRPC worker_selection: treat UNKNOWN_MODEL_ID as wildcard (any worker), return 404 model_not_found instead of 503 when no workers serve model
- gRPC router: validate_worker_availability returns 404 for unsupported model
- PD router: don't fall back to all workers for explicit model — only UNKNOWN_MODEL_ID triggers cross-model fallback
- V1 rerank: return 400 "model_required" instead of silently picking first available model
- HTTP router: distinguish 404 (model not found) from 503 (workers exist but all unavailable) using same UNKNOWN_MODEL_ID wildcard pattern
- tokenize: add test_detokenize_request_requires_model
- TLS mock worker: add served_model_name and /server_info route alias

Signed-off-by: Simon Lin <simonslin@gmail.com>
Signed-off-by: Simo Lin <linsimo.mark@gmail.com>
Force-pushed from 59b4dc8 to 8de21dc (Compare)
Description
Problem
The model field handling across SMG is broken in several ways:
`default_model()` masks missing models — `ChatCompletionRequest`, `ResponsesRequest`, and `RerankRequest` default model to `"unknown"` via serde when clients omit it. This silently forwards `"unknown"` to backends, which either reject it (vLLM, TRT-LLM) or ignore it (SGLang), producing confusing errors instead of a clean 400.
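The masking problem can be reproduced in miniature without serde: a defaulting lookup turns an absent field into a sentinel, while a required field surfaces a clean error. The names below are illustrative, not the actual protocol types:

```rust
use std::collections::HashMap;

// Old behavior: an absent "model" silently becomes the "unknown" sentinel.
fn model_with_default(body: &HashMap<&str, &str>) -> String {
    body.get("model").unwrap_or(&"unknown").to_string()
}

// New behavior: an absent or empty "model" is a 400-style error at the
// gateway, instead of a confusing backend rejection of the literal "unknown".
fn model_required(body: &HashMap<&str, &str>) -> Result<String, &'static str> {
    match body.get("model") {
        Some(m) if !m.is_empty() => Ok(m.to_string()),
        _ => Err("400: `model` is a required field"),
    }
}

fn main() {
    let empty: HashMap<&str, &str> = HashMap::new();
    assert_eq!(model_with_default(&empty), "unknown"); // masked failure
    assert!(model_required(&empty).is_err()); // explicit 400

    let mut body = HashMap::new();
    body.insert("model", "llama-3");
    assert_eq!(model_required(&body).unwrap(), "llama-3");
}
```

In the real protocol crate the same effect comes from dropping the `#[serde(default = ...)]` attribute, so deserialization itself rejects a body without `model`.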
resolve_model_idwithif self.enable_igwbranching; others (route_responses,route_interactions,route_realtime_*) don't. No principled reason for the difference.Solution
default_model()annotations — model becomes a required JSON field. Missing model → serde deserialization error (400).resolve_model_idfrom RouterManager entirely.if self.enable_igwbranching, pass model through directly.UNKNOWN_MODEL_IDconstant only for metrics/hash ring fallback in genuinely optional cases (GenerateRequest, gRPC pipeline).Changes
default_model(),default_model_name(), and their serde annotations fromChatCompletionRequest,ResponsesRequest,RerankRequest,TokenizeRequest,DetokenizeRequest. FixInteractionsRequestDefault impl to usemodel: None.resolve_model_idmethod (~48 lines). Simplifyroute_generate,route_chat,route_completion,route_messages— removeif self.enable_igwbranching (~114 lines deleted total).Breaking Changes
Clients that omit the
modelfield from/v1/chat/completions,/v1/completions,/v1/responses,/v1/rerank,/v1/tokenize, or/v1/detokenizewill now receive a 400 Bad Request instead of having"unknown"silently injected.In practice, the only affected setup is non-IGW + HTTP + SGLang/vLLM:
resolve_tokenizerfails without a valid model)get_router_for_model), so missing model already didn't work"model": "<model-name>"to their requests.This matches the OpenAI API spec where model is a required field.
Not affected:
/generate—GenerateRequest.modelremainsOption<String>(SGLang native endpoint)/v1/interactions—InteractionsRequest.modelremainsOption<String>(model or agent fallback)/v1/realtime/*— Already had explicit validation requiring non-empty modelTest Plan
cargo test -p openai-protocol— all 36 tests passcargo test -p smg— 408 passed, 4 ignored (1 flaky perf test unrelated to changes)cargo clippy --all-targets --all-features -- -D warnings— cleanChecklist
cargo +nightly fmtpassescargo clippy --all-targets --all-features -- -D warningspassesSummary by CodeRabbit
Breaking Changes
Improvements
Tests
New Features