
feat(cache): add in flight deduping#4459

Merged
NathanFlurry merged 1 commit into main from 03-19-feat_cache_add_in_flight_deduping
Apr 5, 2026

Conversation

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4459 March 19, 2026 20:50 Destroyed
@railway-app

railway-app bot commented Mar 19, 2026

🚅 Deployed to the rivet-pr-4459 environment in rivet-frontend

Service | Status | Web | Updated (UTC)
frontend-cloud | 😴 Sleeping (View Logs) | Web | Apr 4, 2026 at 12:38 am
frontend-inspector | 😴 Sleeping (View Logs) | Web | Mar 24, 2026 at 1:03 am
website | ❌ Build Failed (View Logs) | Web | Mar 19, 2026 at 8:53 pm
mcp-hub | ✅ Success (View Logs) | Web | Mar 19, 2026 at 8:52 pm
ladle | ❌ Build Failed (View Logs) | Web | Mar 19, 2026 at 8:51 pm

@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 175706a to 17d21f5 Compare March 19, 2026 20:53
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 75e1e38 to bc5d6d3 Compare March 19, 2026 20:53
@pkg-pr-new

pkg-pr-new bot commented Mar 19, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4459

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4459

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4459

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4459

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4459

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4459

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4459

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4459

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4459

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4459

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4459

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4459

commit: 6570bf2

@claude

claude bot commented Mar 19, 2026

Code Review: feat(cache): add in flight deduping

The core idea of in-flight request deduplication is sound and addresses a real stampede problem. Here are my findings:

Critical Issue 1: Lease not released on getter error. If the getter returns an Err, try_join short-circuits via ?, so the lease release loop never executes. This permanently stalls subsequent requests for those keys (5-second wait per attempt) until the process restarts. A guard/defer pattern would ensure cleanup regardless of the error path.
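
The guard pattern mentioned above could look roughly like the following. This is a minimal synchronous sketch, not the PR's code: `InFlight`, `LeaseGuard`, and `fetch_with_lease` are hypothetical names, and a `Mutex<HashSet<String>>` stands in for the crate's concurrent in-flight map.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the in-flight map (assumption: keys are plain strings).
type InFlight = Arc<Mutex<HashSet<String>>>;

/// Releases every leased key when dropped, on success and error paths alike.
struct LeaseGuard {
    in_flight: InFlight,
    keys: Vec<String>,
}

impl Drop for LeaseGuard {
    fn drop(&mut self) {
        let mut map = self.in_flight.lock().unwrap();
        for key in &self.keys {
            map.remove(key);
        }
    }
}

/// Hypothetical leaseholder path: take the leases, run the getter, return its result.
fn fetch_with_lease(
    in_flight: &InFlight,
    keys: Vec<String>,
    getter: impl FnOnce(&[String]) -> Result<Vec<String>, String>,
) -> Result<Vec<String>, String> {
    // Record the leases before calling the getter.
    in_flight.lock().unwrap().extend(keys.iter().cloned());
    let guard = LeaseGuard { in_flight: Arc::clone(in_flight), keys };

    // `?` may return early here; `guard` is still dropped, so the leases are released.
    let values = getter(&guard.keys)?;
    Ok(values)
}
```

One caveat with this approach in async code: `Drop` cannot await, so a real version would need a synchronous removal path or a spawned cleanup task.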

Critical Issue 2: Broadcast not sent on empty resolution or getter error. broadcast_tx.send is only called when entries_values is non-empty. If the getter resolves no values or errors, waiters are stuck until IN_FLIGHT_TIMEOUT (5 seconds). The broadcast should be sent unconditionally after the getter completes. Combined with issue 1, a getter error causes both a leaked lease and a 5-second stall for all waiters.

Moderate Issue 3: HashMap iteration order creates implicit coupling. In req_config.rs, keys and cache_keys are unzipped from ctx.entries() with non-deterministic HashMap iteration order, then keys is zipped with cached_values from the driver. This works because both were derived from the same iterator in the same pass, but it is fragile. A Vec pairing (Key, RawCacheKey) would make the relationship explicit and safe. The same issue applies in the waiting-keys path (succeeded_keys / succeeded_cache_keys).
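
To illustrate the explicit pairing suggested above, here is a small sketch. The `Key` and `RawCacheKey` aliases, the `process_key` format, and both helper functions are assumptions made for the example and do not reflect the crate's actual types.

```rust
use std::collections::HashMap;

// Hypothetical aliases; the real types live in the cache crate.
type Key = u64;
type RawCacheKey = String;

/// Stand-in for the driver's key processing (the format is an assumption).
fn process_key(base_key: &str, key: &Key) -> RawCacheKey {
    format!("{}:{}", base_key, key)
}

/// Build each (Key, RawCacheKey) pair once, so the relationship travels with the data
/// instead of depending on two parallel vectors staying in the same order.
fn paired_keys(
    base_key: &str,
    entries: &HashMap<Key, Option<String>>,
) -> Vec<(Key, RawCacheKey)> {
    entries
        .keys()
        .map(|key| (*key, process_key(base_key, key)))
        .collect()
}

/// Attach the driver's results (assumed to be returned in request order) to their keys.
fn attach_cached(
    pairs: Vec<(Key, RawCacheKey)>,
    cached_values: Vec<Option<String>>,
) -> Vec<(Key, Option<String>)> {
    debug_assert_eq!(pairs.len(), cached_values.len());
    pairs
        .into_iter()
        .zip(cached_values)
        .map(|((key, _cache_key), value)| (key, value))
        .collect()
}
```

The HashMap is iterated exactly once, and every later step works from the resulting `Vec`, so iteration order no longer matters.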

Moderate Issue 4: Rate limit tests silently removed. integration.rs contained test_rate_limit_basic and test_rate_limit_ip_isolation. These do not appear in any of the new test files (fetch.rs, in_flight.rs, ttl.rs). If rate limiting is still a feature of this crate, these tests should be preserved.

Minor Issue 5: timeout_falls_back_to_getter test adds 5 seconds to the test suite. The test necessarily waits for IN_FLIGHT_TIMEOUT. Making the timeout configurable via a cfg(test) override or a parameter on CacheInner would allow faster test runs.

Minor Issue 6: Inline await changes latency behavior on cache misses. The cache write was previously done in a background task; now it is awaited inline. This is likely intentional (to ensure broadcast happens after write), but it adds write latency to every cache miss response. A short comment documenting the trade-off would help future readers.

Positive Observations: Switching GetterCtx from Vec to HashMap removes O(n) deduplication and makes key lookups O(1). scc::HashMap is an appropriate choice for concurrent in-flight tracking. Test coverage for the deduplication logic is solid: single waiter, multiple waiters, independent keys, mixed cached/in-flight, and timeout fallback are all covered. Removing the anyhow glob import aligns with the project style guidelines. The Driver::get signature change to a slice of RawCacheKey is a correct ergonomics improvement.

@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from bc5d6d3 to b395fae Compare March 21, 2026 01:55
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 17d21f5 to de18421 Compare March 21, 2026 01:55
@MasterPtato MasterPtato mentioned this pull request Mar 21, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from de18421 to e65f84d Compare March 24, 2026 00:30
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from b395fae to 6570bf2 Compare March 24, 2026 00:30
@MasterPtato MasterPtato mentioned this pull request Mar 24, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 6570bf2 to b46226c Compare March 24, 2026 00:36
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch 2 times, most recently from 65200cf to ea32d90 Compare March 25, 2026 00:05
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from b46226c to 66ec30f Compare March 25, 2026 00:05
@claude

claude bot commented Mar 25, 2026

Code Review: feat(cache): add in flight deduping

This PR adds in-flight request deduplication to the cache layer using scc::HashMap and tokio::broadcast channels. The core idea is solid: concurrent requests for the same cache key share a lease, with waiters reading from cache after the leaseholder's getter completes rather than calling their own getter. The refactoring from index-based to key-based GetterCtx is a clean improvement. Good test coverage for the happy paths.

A few issues worth addressing:


Bug: Lease leak on getter error

In req_config.rs, if tokio::try_join! fails (e.g. the getter returns Err), the early return via ? skips the cleanup block at the bottom:

// Never reached if try_join! fails
for key in leased_keys {
    let cache_key = self.cache.driver.process_key(&base_key, &key);
    self.cache.in_flight.remove_async(&cache_key).await;
}

After a getter failure those keys remain in in_flight indefinitely. Future requests for those keys find the entry Occupied, subscribe to the stale sender, and block for the full IN_FLIGHT_TIMEOUT before falling back to their own getter. The sender clone in the map is live so subscribers do not get an early RecvError::Closed.

Fix: restructure so cleanup runs unconditionally — store the try_join! result, run cleanup, then propagate the error.
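
A minimal sketch of that ordering, assuming a plain `HashSet` in place of the in-flight map and a synchronous closure in place of the `try_join!` call (`run_getter_then_cleanup` is a hypothetical helper, not a function from the PR):

```rust
use std::collections::HashSet;

/// Runs the getter, always releases the leases, then propagates the getter's result.
fn run_getter_then_cleanup<T, E>(
    in_flight: &mut HashSet<String>,
    leased_keys: &[String],
    getter: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    // Hold on to the result instead of applying `?` right away.
    let result = getter();

    // Cleanup runs for both Ok and Err.
    for key in leased_keys {
        in_flight.remove(key);
    }

    // Only now hand the success value or the error back to the caller.
    result
}
```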


Bug: Waiters time out unnecessarily when getter resolves no values

The broadcast notification is only sent inside if !entries_values.is_empty(). If the getter succeeds but resolves no values (e.g. the data does not exist in the backing store), entries_values is empty, the broadcast is never sent, but the leases are cleaned up. Any waiter subscribed to those keys sits for the full 5 seconds before timing out and calling its own getter — which also returns nothing.

The signal should be sent unconditionally before lease cleanup, independent of whether there was data to write:

if !entries_values.is_empty() {
    cache.driver.set(...).await;
}
let _ = broadcast_tx.send(());  // always signal waiters
// then clean up leases

Slow test: timeout_falls_back_to_getter waits 5 seconds

This test blocks for the full IN_FLIGHT_TIMEOUT (a hardcoded 5-second constant in production code) on every CI run. Consider either making the timeout injectable per-RequestConfig so tests can pass a shorter value, or using tokio::time::pause() / tokio::time::advance() to skip the real wait.
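
The injectable-timeout option might be sketched as below. It uses a `std::sync::mpsc` channel as a stand-in for the tokio broadcast receiver, and the `RequestConfig` struct, `WaitOutcome` enum, and `wait_for_leaseholder` function are hypothetical shapes chosen for illustration only.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Production default, matching the 5-second constant described above.
const IN_FLIGHT_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical config carrying the timeout so tests can shorten it.
struct RequestConfig {
    in_flight_timeout: Duration,
}

impl Default for RequestConfig {
    fn default() -> Self {
        RequestConfig { in_flight_timeout: IN_FLIGHT_TIMEOUT }
    }
}

#[derive(Debug, PartialEq)]
enum WaitOutcome {
    /// The leaseholder signalled completion; re-read the cache.
    ReadFromCache,
    /// No signal arrived in time (or the sender went away); call the getter ourselves.
    FallBackToGetter,
}

/// Waits for the leaseholder's signal for at most the configured timeout.
fn wait_for_leaseholder(cfg: &RequestConfig, rx: &Receiver<()>) -> WaitOutcome {
    match rx.recv_timeout(cfg.in_flight_timeout) {
        Ok(()) => WaitOutcome::ReadFromCache,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
            WaitOutcome::FallBackToGetter
        }
    }
}
```

With this shape, a test can construct `RequestConfig { in_flight_timeout: Duration::from_millis(10) }` and exercise the fallback path in milliseconds while production keeps the default.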


Minor: GetterCtx::merge silently overwrites on key collision

HashMap::extend overwrites duplicate keys with values from other. Since ctx2 and ctx3 are partitioned from disjoint sets (leased vs. waiting), overlap cannot occur in practice. A debug_assert! would make the invariant explicit and catch future regressions.
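
Such an assertion could look like this. The `GetterCtx` here is a simplified, hypothetical version with string keys and values; the real struct's fields are not shown in this thread.

```rust
use std::collections::HashMap;

/// Simplified stand-in for GetterCtx (field layout is an assumption).
struct GetterCtx {
    entries: HashMap<String, Option<String>>,
}

impl GetterCtx {
    /// Merges `other` into `self`, asserting in debug builds that the key sets are disjoint.
    fn merge(&mut self, other: GetterCtx) {
        debug_assert!(
            other.entries.keys().all(|k| !self.entries.contains_key(k)),
            "GetterCtx::merge called with overlapping keys"
        );
        // `extend` would silently overwrite duplicates; the assert above flags them first.
        self.entries.extend(other.entries);
    }
}
```

The check compiles away in release builds, so it costs nothing in production.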


Style: use anyhow::Result change is correct

Good catch removing use anyhow::*; — this aligns with the CLAUDE.md guideline.


Note: Cache writes are now synchronous

The cache write was previously fire-and-forget (spawned background task). It is now awaited inline before the broadcast signal. This is intentional and correct — waiters must see the written value when they re-read the cache — but the leaseholder's caller now blocks on the write. For the in-memory driver this is negligible; worth keeping in mind if a Redis-backed driver is added later.

@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 66ec30f to 97b9cfd Compare March 26, 2026 01:18
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch 2 times, most recently from ddfa969 to bed6ca4 Compare March 26, 2026 20:50
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 97b9cfd to 3fc4f7f Compare March 26, 2026 20:50
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from bed6ca4 to 10a4ff1 Compare March 28, 2026 00:20
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 3fc4f7f to 662fee6 Compare March 28, 2026 00:20
This was referenced Mar 28, 2026
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 10a4ff1 to 860e71e Compare March 30, 2026 19:40
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch 2 times, most recently from 893ea25 to 715bec8 Compare March 31, 2026 01:40
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 860e71e to 22498e8 Compare March 31, 2026 01:40
@MasterPtato MasterPtato mentioned this pull request Mar 31, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 715bec8 to 31da82c Compare March 31, 2026 22:24
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch from 22498e8 to 0b1f7e6 Compare March 31, 2026 22:24
@MasterPtato MasterPtato mentioned this pull request Mar 31, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 31da82c to 85f7553 Compare April 1, 2026 02:11
@MasterPtato MasterPtato force-pushed the 03-18-fix_cache_clean_up_lib branch 2 times, most recently from 7e46b06 to 4080c12 Compare April 2, 2026 02:47
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 85f7553 to 626728f Compare April 2, 2026 02:47
@MasterPtato MasterPtato force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 626728f to 83ebe90 Compare April 3, 2026 01:24
@NathanFlurry NathanFlurry mentioned this pull request Apr 4, 2026
11 tasks
Member

NathanFlurry commented Apr 5, 2026

Merge activity

  • Apr 5, 11:11 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 5, 11:21 AM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 5, 11:21 AM UTC: @NathanFlurry merged this pull request with Graphite.

@NathanFlurry NathanFlurry changed the base branch from 03-18-fix_cache_clean_up_lib to graphite-base/4459 April 5, 2026 11:18
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4459 to main April 5, 2026 11:19
@NathanFlurry NathanFlurry force-pushed the 03-19-feat_cache_add_in_flight_deduping branch from 83ebe90 to e2840f4 Compare April 5, 2026 11:20
@NathanFlurry NathanFlurry merged commit dd3afae into main Apr 5, 2026
10 of 12 checks passed
@NathanFlurry NathanFlurry deleted the 03-19-feat_cache_add_in_flight_deduping branch April 5, 2026 11:21
