
feat(core): wire per-worker resilience and HTTP client into BasicWorker #803

Merged
CatherineSue merged 2 commits into main from
feat/per-worker-resilience-wiring
Mar 18, 2026

CatherineSue (Collaborator) commented Mar 18, 2026

Description

Problem

PR #799 added the foundational types (ResolvedResilience, HttpPoolConfig, ResilienceUpdate, build_worker_http_client) but workers don't use them yet. BasicWorker has no per-worker HTTP client or resilience config — all workers still share the global client and router-level retry/CB settings.

Solution

Wire the new types into BasicWorker, the Worker trait, and all worker creation paths so that every worker constructed via the registration API gets its own resolved resilience config and isolated HTTP connection pool at construction time.

Changes

  • Add resilience(), http_client(), is_retryable() methods to the Worker trait
  • Add http_client and resilience fields to BasicWorker struct
  • Add http_client() and resilience() builder methods to BasicWorkerBuilder
  • Wire resolve_resilience() and build_worker_http_client() into local worker creation (CreateLocalWorkerStep)
  • Wire same into external worker creation (CreateExternalWorkersStep)
  • Preserve resilience/http_client during worker property updates (UpdateWorkerPropertiesStep)
  • Update GrpcWorker in golang bindings to implement new trait methods (add reqwest dep)
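The trait extension listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `HttpClient` is a std-only stand-in for `reqwest::Client`, the fields of `ResolvedResilience` are hypothetical, and the `is_retryable(status)` signature is an assumption. Only the method names come from the change list.

```rust
use std::time::Duration;

/// Stand-in for the resolved per-worker settings; the fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedResilience {
    pub max_retries: u32,
    pub retryable_status_codes: Vec<u16>,
}

/// Stand-in for `reqwest::Client` so the sketch compiles with std only.
#[derive(Debug, Clone)]
pub struct HttpClient {
    pub timeout: Duration,
}

pub trait Worker {
    /// Resilience config resolved for this worker.
    fn resilience(&self) -> &ResolvedResilience;

    /// HTTP client owned by this worker.
    fn http_client(&self) -> &HttpClient;

    /// Default method, so implementors only supply the two accessors.
    /// Taking an HTTP status code here is an assumption of this sketch.
    fn is_retryable(&self, status: u16) -> bool {
        self.resilience().retryable_status_codes.contains(&status)
    }
}

pub struct BasicWorker {
    pub http_client: HttpClient,
    pub resilience: ResolvedResilience,
}

impl Worker for BasicWorker {
    fn resilience(&self) -> &ResolvedResilience {
        &self.resilience
    }

    fn http_client(&self) -> &HttpClient {
        &self.http_client
    }
}

/// Builds a worker with illustrative values.
pub fn example_worker() -> BasicWorker {
    BasicWorker {
        http_client: HttpClient { timeout: Duration::from_secs(30) },
        resilience: ResolvedResilience {
            max_retries: 3,
            retryable_status_codes: vec![429, 502, 503],
        },
    }
}
```

Because `is_retryable` has a default body in this sketch, another implementor such as `GrpcWorker` would only need to add the two accessors.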

Test Plan

  • cargo test -p smg --lib — all 435 tests pass, 0 failures
  • cargo check -p smg-golang — golang bindings compile cleanly
  • Pre-commit hooks pass (rustfmt, clippy, codespell, DCO)
  • Workers without overrides use router defaults (backwards compatible)
  • Workers with resilience/http_pool in WorkerSpec get per-worker config
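The last two items describe a fallback rule: a worker without overrides inherits the router defaults, and a worker with overrides gets its own values. A minimal sketch of that rule, assuming field-by-field merging, is shown here. `RetryConfig`, `ResilienceOverride`, and `resolve_retry_sketch` are hypothetical names, and the real `resolve_resilience()` may take different inputs.

```rust
/// Router-level defaults (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub initial_backoff_ms: u64,
}

/// Optional per-worker overrides, as they might appear in a worker spec.
#[derive(Debug, Clone, Default)]
pub struct ResilienceOverride {
    pub max_retries: Option<u32>,
    pub initial_backoff_ms: Option<u64>,
}

/// Merges overrides onto the router defaults, field by field.
/// With no override at all, the defaults are returned unchanged.
pub fn resolve_retry_sketch(
    base: &RetryConfig,
    overrides: Option<&ResilienceOverride>,
) -> RetryConfig {
    match overrides {
        None => base.clone(),
        Some(o) => RetryConfig {
            max_retries: o.max_retries.unwrap_or(base.max_retries),
            initial_backoff_ms: o.initial_backoff_ms.unwrap_or(base.initial_backoff_ms),
        },
    }
}
```

Passing `None` models the backwards-compatible case, while a partially filled override changes only the fields it sets.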

Checklist

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features
    • Added per-worker resilience settings (retry/circuit-breaker) to improve fault tolerance.
    • Every worker now has a configurable HTTP client for external communications.
    • Worker builders and updates carry resilience and HTTP client configuration through re-registration.
    • New builder options allow supplying custom worker resilience and HTTP client instances.


Add http_client and resilience fields to BasicWorker and the Worker
trait. Workers now own their resolved resilience config and an isolated
HTTP connection pool, constructed at registration time.

- Add resilience(), http_client(), is_retryable() to Worker trait
- Add http_client/resilience builder methods to BasicWorkerBuilder
- Wire resolve_resilience() and build_worker_http_client() into local
  and external worker creation steps
- Preserve resilience/http_client during worker property updates
- Update GrpcWorker in golang bindings to implement new trait methods

Signed-off-by: Chang Su <chang.s.su@oracle.com>
github-actions bot added the dependencies (Dependency updates) and model-gateway (Model gateway crate changes) labels Mar 18, 2026
coderabbitai bot commented Mar 18, 2026

Walkthrough

Adds per-worker resilience and an HTTP client across worker types: new reqwest dependency in Golang bindings; Worker trait and BasicWorker gain resilience and http_client; builders and worker-creation steps resolve resilience and construct per-worker HTTP clients.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Golang bindings | bindings/golang/Cargo.toml, bindings/golang/src/policy.rs | Add reqwest = { version = "0.12", default-features = false }; extend GrpcWorker with http_client: reqwest::Client and resilience: ResolvedResilience, plus accessors. |
| Core worker API & impls | model_gateway/src/core/worker.rs, model_gateway/src/core/worker_builder.rs | Add resilience() and http_client() to Worker trait with a default is_retryable; add pub http_client and pub resilience to BasicWorker; add builder fields/methods and defaults for http_client and resilience. |
| Local worker creation & update | model_gateway/src/core/steps/worker/local/create_worker.rs, model_gateway/src/core/steps/worker/local/update_worker_properties.rs | Resolve per-worker resilience and build per-worker HTTP client during local worker creation; propagate http_client and resilience through worker updates and builder chain; map HTTP client creation errors to workflow errors. |
| External worker creation | model_gateway/src/core/steps/worker/external/create_workers.rs | Resolve resilience and build per-worker HTTP client before constructing external worker builders; pass http_client and resilience into wildcard and discovered worker builders. |

Sequence Diagram(s)

sequenceDiagram
    participant Step as CreateWorkerStep
    participant Resolver as resolve_resilience
    participant HTTPBuilder as build_worker_http_client
    participant Builder as BasicWorkerBuilder
    participant Worker as BasicWorker

    Step->>Resolver: compute resolved_resilience(base_retry, base_cb, flags)
    Resolver-->>Step: ResolvedResilience + circuit_breaker_config
    Step->>HTTPBuilder: build_worker_http_client(worker_id, settings)
    HTTPBuilder-->>Step: reqwest::Client (or error)
    Step->>Builder: builder.resilience(resolved_resilience).http_client(client)
    Builder->>Worker: build()
    Worker-->>Step: BasicWorker { http_client, resilience, ... }
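The flow in the diagram can be approximated in code as below. This is a std-only sketch under stated assumptions: `HttpClient` stands in for `reqwest::Client`, `build_client_sketch` stands in for `build_worker_http_client`, and all field names and the zero-pool failure condition are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedResilience {
    pub max_retries: u32,
}

/// Stand-in for `reqwest::Client`.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpClient {
    pub pool_max_idle_per_host: usize,
}

#[derive(Debug)]
pub struct BasicWorker {
    pub url: String,
    pub http_client: HttpClient,
    pub resilience: ResolvedResilience,
}

#[derive(Default)]
pub struct BasicWorkerBuilder {
    url: String,
    http_client: Option<HttpClient>,
    resilience: Option<ResolvedResilience>,
}

impl BasicWorkerBuilder {
    pub fn new(url: impl Into<String>) -> Self {
        Self { url: url.into(), ..Default::default() }
    }

    pub fn http_client(mut self, client: HttpClient) -> Self {
        self.http_client = Some(client);
        self
    }

    pub fn resilience(mut self, resilience: ResolvedResilience) -> Self {
        self.resilience = Some(resilience);
        self
    }

    /// Falls back to illustrative defaults when a field was not supplied.
    pub fn build(self) -> BasicWorker {
        BasicWorker {
            url: self.url,
            http_client: self
                .http_client
                .unwrap_or(HttpClient { pool_max_idle_per_host: 1 }),
            resilience: self.resilience.unwrap_or(ResolvedResilience { max_retries: 0 }),
        }
    }
}

/// Stand-in for `build_worker_http_client`; rejects an empty pool.
fn build_client_sketch(pool_max_idle_per_host: usize) -> Result<HttpClient, String> {
    if pool_max_idle_per_host == 0 {
        return Err("pool size must be non-zero".to_string());
    }
    Ok(HttpClient { pool_max_idle_per_host })
}

/// Resolve, build the client, then hand both to the builder.
pub fn create_worker_sketch(
    url: &str,
    max_retries: u32,
    pool: usize,
) -> Result<BasicWorker, String> {
    let resilience = ResolvedResilience { max_retries };
    let client = build_client_sketch(pool)?; // error surfaces before any worker exists
    Ok(BasicWorkerBuilder::new(url)
        .resilience(resilience)
        .http_client(client)
        .build())
}
```

Building the client before the builder chain means a construction failure aborts the step without producing a partially configured worker.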

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

protocols, grpc

Suggested reviewers

  • whybeyoung
  • key4ng
  • slin1237

Poem

🐰 A little hop for each worker I bring,
I carry a client and resilience string,
Requests retry kindly, circuits behave,
Builders assemble — each worker's brave,
Hooray for hops, and resilient spring!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.83% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and accurately reflects the main change: wiring per-worker resilience and HTTP client into BasicWorker across the codebase. |


@gemini-code-assist

Summary of Changes

This pull request integrates per-worker resilience configurations and isolated HTTP clients into the BasicWorker and its creation/update workflows. This ensures that each worker can have its own retry and circuit breaker settings, along with a dedicated HTTP connection pool, moving away from shared global configurations.

Highlights

  • Worker Trait Extension: The Worker trait now includes resilience(), http_client(), and is_retryable() methods, allowing workers to expose their specific resilience configurations and HTTP clients.
  • BasicWorker Enhancement: The BasicWorker struct has been updated to store its own http_client (a reqwest::Client) and resilience (a ResolvedResilience struct), enabling per-worker isolation.
  • Builder Integration: The BasicWorkerBuilder now provides http_client() and resilience() methods, allowing these configurations to be set during worker construction.
  • Workflow Integration: Worker creation steps (CreateLocalWorkerStep, CreateExternalWorkersStep) now utilize resolve_resilience() and build_worker_http_client() to provision workers with their unique resilience and HTTP client settings.
  • Property Preservation: The UpdateWorkerPropertiesStep has been modified to ensure that the http_client and resilience configurations are correctly preserved when worker properties are updated.
  • Golang Bindings Update: The GrpcWorker in the Golang bindings has been updated to implement the new Worker trait methods and now includes the reqwest dependency.
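The property-preservation point can be illustrated with a small sketch: when a worker is rebuilt with new properties, the existing client and resilience are carried over rather than recreated. The types here are hypothetical stand-ins. The explicit `Arc` models the fact that cloning a `reqwest::Client` shares its internal connection pool; whether the update step clones or moves the client is an assumption.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for the connection pool behind a `reqwest::Client`.
#[derive(Debug)]
pub struct HttpPool {
    pub max_idle_per_host: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedResilience {
    pub max_retries: u32,
}

#[derive(Debug, Clone)]
pub struct BasicWorker {
    pub url: String,
    pub labels: HashMap<String, String>,
    pub http_client: Arc<HttpPool>,
    pub resilience: ResolvedResilience,
}

/// Rebuilds a worker with new labels while carrying over the
/// existing client handle and resilience config.
pub fn with_updated_labels(old: &BasicWorker, labels: HashMap<String, String>) -> BasicWorker {
    BasicWorker {
        url: old.url.clone(),
        labels,
        // Same pool, new handle: no connections are dropped by the update.
        http_client: Arc::clone(&old.http_client),
        resilience: old.resilience.clone(),
    }
}

/// Builds a worker with illustrative values.
pub fn example_worker() -> BasicWorker {
    BasicWorker {
        url: "http://w1:8000".to_string(),
        labels: HashMap::new(),
        http_client: Arc::new(HttpPool { max_idle_per_host: 8 }),
        resilience: ResolvedResilience { max_retries: 3 },
    }
}
```

If the update path instead built a fresh client, the worker would silently lose its warmed connections and any per-worker pool settings.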

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request wires up per-worker resilience configurations and dedicated HTTP clients, which is a great step towards more granular control and isolation. The implementation is solid, covering creation and update paths for both local and external workers. I've found a couple of areas for improvement related to error handling to make the implementation more robust. Specifically, I've suggested using a more appropriate error type when HTTP client creation fails in one of the workflow steps, and making client creation failures in the builder explicit instead of silently falling back to defaults. Overall, these are great changes.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e9f981bb86


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@model_gateway/src/core/steps/worker/external/create_workers.rs`:
- Around line 68-69: The call that constructs the HTTP client using
build_worker_http_client currently maps errors to
WorkflowError::ContextValueNotFound; change this to map to
WorkflowError::StepFailed with a clear message indicating HTTP client
construction failed and include the underlying error (e.g., via format! or
chain) so the original error is preserved — update the map_err on
build_worker_http_client(...) to return WorkflowError::StepFailed("failed to
build worker HTTP client: ...", Box::new(err)) or equivalent to match the enum
variant and existing error-wrapping pattern used elsewhere (see usages around
create_worker.rs functions).
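The suggested error mapping might look like the sketch below. The enum shape is hypothetical: the real `WorkflowError::StepFailed` variant may carry different fields (the comment above hints at a boxed source error), and `build_client_sketch` is a stand-in for `build_worker_http_client`. The step name string is also illustrative.

```rust
/// Hypothetical error enum; the real `WorkflowError` variants may differ.
#[derive(Debug)]
pub enum WorkflowError {
    ContextValueNotFound(String),
    StepFailed { step: String, message: String },
}

/// Stand-in for `reqwest::Client`.
#[derive(Debug)]
pub struct HttpClient {
    pub pool_size: usize,
}

/// Stand-in for `build_worker_http_client`; fails on an invalid pool size.
fn build_client_sketch(pool_size: usize) -> Result<HttpClient, String> {
    if pool_size == 0 {
        return Err("pool size must be non-zero".to_string());
    }
    Ok(HttpClient { pool_size })
}

/// Maps a client construction failure to `StepFailed`, keeping the
/// underlying error text in the message instead of discarding it.
pub fn create_client_step(
    worker_id: &str,
    pool_size: usize,
) -> Result<HttpClient, WorkflowError> {
    build_client_sketch(pool_size).map_err(|err| WorkflowError::StepFailed {
        step: "create_external_workers".to_string(),
        message: format!("failed to build HTTP client for worker {worker_id}: {err}"),
    })
}
```

`ContextValueNotFound` would suggest a missing workflow input, so using `StepFailed` here keeps the failure category aligned with what actually went wrong.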

📥 Commits

Reviewing files that changed from the base of the PR and between ebdf26b and e9f981b.

📒 Files selected for processing (7)
  • bindings/golang/Cargo.toml
  • bindings/golang/src/policy.rs
  • model_gateway/src/core/steps/worker/external/create_workers.rs
  • model_gateway/src/core/steps/worker/local/create_worker.rs
  • model_gateway/src/core/steps/worker/local/update_worker_properties.rs
  • model_gateway/src/core/worker.rs
  • model_gateway/src/core/worker_builder.rs

Use WorkflowError::StepFailed instead of ContextValueNotFound when
build_worker_http_client fails in external worker creation.

Signed-off-by: Chang Su <chang.s.su@oracle.com>

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/core/steps/worker/external/create_workers.rs (1)

101-107: 🧹 Nitpick | 🔵 Trivial

Extract the shared builder wiring before it drifts.

The wildcard and discovered-model branches now duplicate the same new http_client/resilience plumbing. Pulling the common BasicWorkerBuilder setup into a helper or local closure would make future per-worker additions much harder to miss in one path.

Also applies to: 141-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/core/steps/worker/external/create_workers.rs` around lines
101 - 107, The two branches duplicate the same BasicWorkerBuilder wiring
(new(normalized_url.clone()) plus .worker_type(...), .connection_mode(...),
.runtime_type(...), .circuit_breaker_config(...),
.http_client(http_client.clone()), .resilience(resolved_resilience.clone()));
extract that common setup into a small helper function or local closure (e.g.,
make_worker_builder or build_base_worker) that accepts normalized_url (or &str)
and returns a configured BasicWorkerBuilder, then call that helper from both the
wildcard and discovered-model branches before applying branch-specific
modifications; update uses of BasicWorkerBuilder::new and the chained setters to
use the helper to avoid duplication.
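One way to read this suggestion is a local closure that owns the shared wiring, with each branch adding only what is specific to it. The sketch below uses hypothetical stand-in types and a simplified builder; `make_worker_builder` is the name proposed in the comment, while the `model` setter and the empty-slice wildcard convention are assumptions.

```rust
/// Stand-in for `reqwest::Client`.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpClient {
    pub pool_id: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedResilience {
    pub max_retries: u32,
}

/// Simplified builder carrying only the fields relevant to this sketch.
#[derive(Debug, Clone)]
pub struct BasicWorkerBuilder {
    pub url: String,
    pub model: Option<String>,
    pub http_client: Option<HttpClient>,
    pub resilience: Option<ResolvedResilience>,
}

impl BasicWorkerBuilder {
    pub fn new(url: impl Into<String>) -> Self {
        Self { url: url.into(), model: None, http_client: None, resilience: None }
    }

    pub fn model(mut self, model: impl Into<String>) -> Self {
        self.model = Some(model.into());
        self
    }

    pub fn http_client(mut self, client: HttpClient) -> Self {
        self.http_client = Some(client);
        self
    }

    pub fn resilience(mut self, resilience: ResolvedResilience) -> Self {
        self.resilience = Some(resilience);
        self
    }
}

/// Returns one builder for the wildcard case (no models discovered),
/// or one builder per discovered model.
pub fn build_both_branches(url: &str, models: &[&str]) -> Vec<BasicWorkerBuilder> {
    let http_client = HttpClient { pool_id: 1 };
    let resilience = ResolvedResilience { max_retries: 3 };

    // Shared wiring lives in one place; both branches start from it.
    let make_worker_builder = |url: &str| {
        BasicWorkerBuilder::new(url)
            .http_client(http_client.clone())
            .resilience(resilience.clone())
    };

    if models.is_empty() {
        vec![make_worker_builder(url)] // wildcard branch
    } else {
        models
            .iter()
            .map(|m| make_worker_builder(url).model(*m)) // discovered-model branch
            .collect()
    }
}
```

With this shape, a future per-worker field only needs to be added inside the closure to reach both branches.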

📥 Commits

Reviewing files that changed from the base of the PR and between e9f981b and f6e6b77.

📒 Files selected for processing (1)
  • model_gateway/src/core/steps/worker/external/create_workers.rs

@CatherineSue CatherineSue merged commit 924d10a into main Mar 18, 2026
31 of 35 checks passed
@CatherineSue CatherineSue deleted the feat/per-worker-resilience-wiring branch March 18, 2026 22:17
smfirmin pushed a commit to smfirmin/smg that referenced this pull request Mar 20, 2026