feat(grpc-servicer): support upstream vLLM KV event subscriptions #742
smfirmin wants to merge 2 commits into lightseekorg:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds documentation and dependencies; implements a new VllmKvEventBridge (ZeroMQ → gRPC) with per-rank replay, buffering, and sequence normalization; exposes SubscribeKvEvents RPC and servicer integration; adds async Python tests; and updates Rust kv_event_monitor stream-processing to detect restarts/gaps and handle specific gRPC statuses.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant vllm as Upstream vLLM
    participant zmq as ZeroMQ Publisher
    participant bridge as VllmKvEventBridge
    participant grpc as gRPC Servicer
    participant client as Client
    participant gateway as Model Gateway
    vllm->>zmq: Publish KV events (BlockStored / BlockRemoved / AllBlocksCleared)
    zmq->>bridge: SUB stream (per-rank), REQ/REP for replay
    bridge->>bridge: Translate → KvCacheEvent, assign global seq, buffer
    client->>grpc: SubscribeKvEvents(start_sequence)
    grpc->>bridge: subscribe(start_sequence)
    bridge-->>grpc: Replay buffered batches and stream new batches
    grpc-->>client: Stream KvEventBatch responses
    bridge->>gateway: Surface restart/gap signals
    gateway->>gateway: handle_batch → apply / skip / reset state
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Highlights
Code Review
This pull request introduces support for consuming vLLM KV cache events, a significant feature that enables more advanced cache-aware routing. The implementation includes a Python-based bridge to consume ZMQ events from vLLM and expose them over gRPC, as well as hardening of the Rust-based consumer to handle backend restarts and unsupported event layouts. The changes are well-structured, include comprehensive tests for the new functionality, and provide clear documentation for operators. My review found one opportunity for improvement to enhance robustness in the event translation logic.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/kv_events.py`:
- Line 1: The module-level pragma "# mypy: ignore-errors" in
grpc_servicer.smg_grpc_servicer.vllm.kv_events disables type checking for the
whole file; remove that line and instead add targeted "# type: ignore" comments
only on the specific imports, calls, or expressions that fail mypy (for example
problematic imports or functions inside kv_events such as any external-client
initializations or dynamic attribute uses), run mypy to see remaining errors,
and progressively add precise ignores or proper type annotations (or stubs) for
the identified symbols to restore type checks for the rest of the module.
- Line 82: The loop pairing event.block_hashes with token_chunks should use
strict validation to detect mismatches; update the zip call in the loop that
iterates over event.block_hashes and token_chunks (the one consuming the output
of _chunk_token_ids) to pass strict=True so any unexpected length mismatch
raises an error instead of silently truncating.
In `@grpc_servicer/smg_grpc_servicer/vllm/servicer.py`:
- Around line 349-350: The servicer.shutdown() method (which awaits
kv_event_bridge.shutdown()) is never invoked during server teardown, causing the
bridge task and ZMQ sockets to leak; update the gRPC server lifecycle (e.g., in
the function that starts/awaits the server such as your server serve/start
function or main) to call and await servicer.shutdown() in the shutdown
path—wrap the server run in try/finally or register a graceful shutdown callback
so that await servicer.shutdown() runs when stopping the server (ensure you
reference servicer.shutdown() and kv_event_bridge.shutdown() and await the call
to fully close resources).
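The suggested lifecycle fix can be sketched as follows. This is a hypothetical model, not the real smg_grpc_servicer API: `Servicer` and `serve` are stand-ins showing only the try/finally shape.

```python
import asyncio

# Hypothetical lifecycle sketch: the function that runs the gRPC server
# wraps its wait loop in try/finally so servicer.shutdown() (which would
# await kv_event_bridge.shutdown()) always runs on teardown.

class Servicer:
    def __init__(self) -> None:
        self.closed = False

    async def shutdown(self) -> None:
        # Real code would await self.kv_event_bridge.shutdown() here,
        # stopping the bridge task and closing ZMQ sockets.
        self.closed = True

async def serve() -> Servicer:
    servicer = Servicer()
    try:
        # Real code: start the grpc.aio server, then
        # await server.wait_for_termination()
        await asyncio.sleep(0)
    finally:
        # Runs on normal exit, cancellation, or error.
        await servicer.shutdown()
    return servicer

servicer = asyncio.run(serve())
assert servicer.closed  # bridge resources released on teardown
```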
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 67a9c6e8-37b6-46b4-8f6c-d8a31d7c3645
📒 Files selected for processing (9)
- docs/getting-started/grpc-workers.md
- docs/getting-started/index.md
- grpc_servicer/README.md
- grpc_servicer/pyproject.toml
- grpc_servicer/smg_grpc_servicer/vllm/kv_events.py
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- grpc_servicer/tests/test_kv_events.py
- model_gateway/src/core/kv_event_monitor.rs
- model_gateway/tests/routing/test_openai_routing.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 05997b645f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/kv_events.py`:
- Around line 258-263: The code silently drops batches when a publisher's
per-rank sequence resets because rank_seq <= self._last_rank_sequence.get(rank,
-1) simply continues; update the logic in the block around rank_seq,
_last_rank_sequence, _decoder.decode and _append_batch to detect sequence
regression (rank_seq < last_seq) and handle it explicitly: either reset the
stored state for that rank (e.g., delete or set last_seq to -1 and replay/apply
the incoming batch) or raise/propagate an error/notification so the bridge can
fail-closed or request replay. Ensure the chosen behavior updates
_last_rank_sequence appropriately after calling _append_batch (or before if
replaying) and does not silently drop valid post-restart batches.
- Around line 188-203: The wait predicate currently treats a caught-up
subscriber (next_seq == self._next_sequence_number) as needing replay, causing
_normalize_requested_sequence_locked to rewind and the loop to resend the full
buffer; change the predicate to only consider next_seq beyond the known next
sequence (use next_seq > self._next_sequence_number instead of >=) or otherwise
ensure equality does not make the predicate true, and also ensure
_normalize_requested_sequence_locked does not rewind when next_seq ==
self._next_sequence_number (only rewind when requested sequence is strictly less
than the oldest allowed). Update the expressions using self._buffer,
self._next_sequence_number, next_seq and _normalize_requested_sequence_locked
accordingly so caught-up subscribers simply wait for new data.
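A minimal sketch of the corrected predicate, with illustrative names (the real method operates on the bridge's internal state):

```python
# Illustrative: equality means the subscriber is caught up and should just
# wait for new data; only a sequence beyond the next-to-assign value needs
# normalization/rewind handling.
def needs_normalization(next_seq: int, next_sequence_number: int) -> bool:
    return next_seq > next_sequence_number  # strictly greater, not >=

caught_up = needs_normalization(10, 10)  # → False: wait for new batches
ahead = needs_normalization(12, 10)      # → True: requested future data
behind = needs_normalization(7, 10)      # → False: ordinary replay path
```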
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2b08c8b9-cf24-4955-8f2e-fe3574bf0ef0
📒 Files selected for processing (1)
grpc_servicer/smg_grpc_servicer/vllm/kv_events.py
♻️ Duplicate comments (1)
grpc_servicer/smg_grpc_servicer/vllm/kv_events.py (1)

Lines 318-333: ⚠️ Potential issue | 🔴 Critical: Fail closed on rank-sequence regression instead of appending over stale history.

When rank_seq < last_rank_seq, this path only logs and then appends the new batch into the existing buffer. Active subscribers therefore continue from a mixed pre-restart + post-restart history, which can reconstruct the wrong KV state unless the first post-restart batch fully clears everything. This needs a reset/fatal path, not ordinary ingestion.

🛡️ Minimal safe fix

```diff
 async def _ingest_rank_batch(self, rank: int, rank_seq: int, payload: bytes) -> None:
     last_rank_seq = self._last_rank_sequence.get(rank, -1)
     if rank_seq == last_rank_seq:
         return
     if rank_seq < last_rank_seq:
-        logger.warning(
-            "KV event publisher sequence regressed for rank %s: previous=%s received=%s. "
-            "Assuming publisher restart and accepting new batches from this rank",
-            rank,
-            last_rank_seq,
-            rank_seq,
-        )
+        raise RuntimeError(
+            "KV event publisher sequence regressed for rank "
+            f"{rank}: previous={last_rank_seq} received={rank_seq}"
+        )
     batch = self._decoder.decode(payload)
     await self._append_batch(batch)
     self._last_rank_sequence[rank] = rank_seq
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@grpc_servicer/smg_grpc_servicer/vllm/kv_events.py` around lines 318-333: the code currently treats a rank_seq regression in _ingest_rank_batch as non-fatal and continues appending, which can corrupt state; change the regression path to fail-closed by logging an error and raising an exception (e.g., RuntimeError) instead of decoding/appending. Specifically, in _ingest_rank_batch detect rank_seq < last_rank_seq, call logger.error with context (rank, last_rank_seq, rank_seq), then raise a fatal error so callers stop processing; do not call self._decoder.decode, self._append_batch, or update self._last_rank_sequence for that case. This uses the existing symbols _ingest_rank_batch, _last_rank_sequence, _decoder, and _append_batch to locate and change the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/kv_events.py`:
- Around line 318-333: The code currently treats a rank_seq regression in
_ingest_rank_batch as non-fatal and continues appending, which can corrupt
state; change the regression path to fail-closed by logging an error and raising
an exception (e.g., RuntimeError) instead of decoding/appending. Specifically,
in _ingest_rank_batch detect rank_seq < last_rank_seq and call logger.error with
context (rank, last_rank_seq, rank_seq) then raise a fatal error so callers stop
processing; do not call self._decoder.decode, self._append_batch, or update
self._last_rank_sequence for that case. This uses the existing symbols
_ingest_rank_batch, _last_rank_sequence, _decoder, and _append_batch to locate
and change the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a7a01521-c1b4-4eea-b2e2-4f863ea99ecb
📒 Files selected for processing (2)
- grpc_servicer/smg_grpc_servicer/vllm/kv_events.py
- grpc_servicer/tests/test_kv_events.py
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04affa6524
Hi @smfirmin, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

```shell
git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grpc_servicer/pyproject.toml`:
- Around line 14-15: The dependency "msgspec" in pyproject.toml has no version
constraint; update the dependency entry for msgspec to include a minimum version
(e.g., "msgspec>=X.Y.Z") to match the style used for "pyzmq>=25.0.0" and prevent
accidental incompatible upgrades—check the latest stable msgspec release and set
that as the minimum, then run your dependency install/check to verify
compatibility.
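A pinned entry might look like the following. The minimum version shown is an assumption for illustration; set it to the latest stable msgspec release after verifying compatibility:

```toml
[project]
dependencies = [
    "pyzmq>=25.0.0",
    "msgspec>=0.18.0",  # illustrative floor; match the latest stable release
]
```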
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 11d8a066-b72f-43a4-8126-f94632f9ebab
📒 Files selected for processing (2)
- grpc_servicer/README.md
- grpc_servicer/pyproject.toml
Hi @smfirmin, the DCO sign-off check has failed. All commits must include a `Signed-off-by` line. To fix existing commits:

```shell
# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease
```

To sign off future commits automatically:
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@grpc_servicer/pyproject.toml`:
- Around line 14-15: The TOML dependency list has a syntax error: the string for
the msgspec dependency is missing its closing double quote; locate the
dependency entry containing "msgspec>=0.20.0, and add the missing closing quote
so it becomes a valid TOML string (matching the format used for
"pyzmq>=25.0.0"), ensuring commas and quotes are correct for all dependency
lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 68321d8b-c2d7-443a-b2ae-3c41432e9b21
📒 Files selected for processing (1)
grpc_servicer/pyproject.toml
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16658c7646
Force-pushed: 16658c7 to d482fb9
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21f18509d3
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec068062b3
```python
        next_seq,
        oldest,
    )
    return oldest
```
Return terminal error for expired replay sequence requests
When a client asks for a sequence older than the replay buffer, this branch rewinds to oldest and silently drops the missing range instead of surfacing a reset/error. In the current monitor flow, that produces a permanent reconnect loop: handle_batch reports a gap (received > last_seq + 1), subscription_loop reconnects with the same last_seq, and the bridge rewinds again, so KV updates never recover after a long disconnect or backlog overrun.
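One way to make this branch fail closed instead of rewinding, sketched with hypothetical names (the real bridge would surface this as a terminal gRPC status such as OUT_OF_RANGE so the client performs a full reset rather than reconnecting with the same last_seq):

```python
class ReplayBufferExpired(Exception):
    """Requested start sequence has been evicted from the replay buffer."""

def normalize_requested_sequence(next_seq: int, oldest: int, next_to_assign: int) -> int:
    # Caught up (or ahead): nothing to replay, wait for new data.
    if next_seq >= next_to_assign:
        return next_to_assign
    # Expired: fail closed rather than rewinding and silently dropping
    # the missing range of events.
    if next_seq < oldest:
        raise ReplayBufferExpired(
            f"requested sequence {next_seq} is older than oldest buffered {oldest}"
        )
    return next_seq

expired = False
try:
    normalize_requested_sequence(1, oldest=3, next_to_assign=10)
except ReplayBufferExpired:
    expired = True
```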
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be03d6d1fd
Hi @smfirmin, the DCO sign-off check has failed. All commits must include a `Signed-off-by` line. To fix existing commits:

```shell
# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease
```

To sign off future commits automatically:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ebf18881a0
Force-pushed: 4af2c0b to a2d4535
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a2d4535039
Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
Force-pushed: a2d4535 to 50ffe81
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db41cd2208
Description
Summary
This PR adds SMG support for consuming vLLM KV cache events over SubscribeKvEvents through smg-grpc-servicer, with no source changes required in the vllm repo.
It also hardens the integration around two important edge cases:
Finally, it documents the remaining protocol limitation and stabilizes one unrelated flaky routing test fixture.
What changed
Added vLLM KV event bridge support in smg-grpc-servicer so SMG can subscribe to KV cache events from upstream vLLM when vLLM is started with --kv-events-config.
Added fail-closed handling for unsupported BlockStored layouts that cannot be translated losslessly.
Mapped those unsupported layouts to gRPC FAILED_PRECONDITION instead of allowing corrupted KV state.
Updated the Rust KV event monitor to:
Added tests for:
Added operator docs explaining:
Stabilized test_openai_router_circuit_breaker by replacing an environment-dependent invalid hostname with a deterministic closed localhost port.
Upstream vLLM compatibility
This works with upstream vLLM as-is. No vllm code changes are required.
To enable KV event streaming, run the stock vLLM gRPC server with --kv-events-config.
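For reference, enabling event publishing on the vLLM side looks roughly like this. The JSON keys follow vLLM's KVEventsConfig in recent releases but should be verified against the vLLM documentation for your version; the model placeholder and endpoint are illustrative:

```shell
vllm serve <model> \
  --kv-events-config '{"enable_kv_cache_events": true, "publisher": "zmq", "endpoint": "tcp://*:5557"}'
```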
Residual limitation
Restart recovery is currently sequence-based.
SMG can detect a restart when the first fresh batch after reconnect has a lower sequence number than the last applied batch. If a restarted backend catches back up before SMG reconnects, the current protocol has no producer epoch/generation identifier, so that restart is not distinguishable from normal continuation.
Unsupported null-block/sparse BlockStored layouts also remain fail-closed until the stream carries enough per-block token range metadata for lossless translation.
Testing
Ran:
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
- routing::test_openai_routing::test_openai_router_circuit_breaker -- --exact --nocapture
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests
Chores