feat: Add workspace locking middleware using PostgreSQL advisory locks #817
Walkthrough
A new workspace locking middleware using PostgreSQL advisory locks is introduced to serialize write operations on a per-workspace basis. The middleware is exported as a public module and integrated across multiple API route scopes in the main application to enforce workspace-level concurrency control.
Sequence Diagram
sequenceDiagram
participant Client
participant Middleware as WorkspaceLock<br/>Middleware
participant DBPool as DB Pool
participant PG as PostgreSQL
participant Service as Inner Service
Client->>Middleware: Request (POST/PUT/DELETE/PATCH)
activate Middleware
Note over Middleware: Extract org_id &<br/>workspace_id
Middleware->>Middleware: compute_lock_keys()<br/>(org_key, workspace_key)
Middleware->>DBPool: Get PgConnection
activate DBPool
DBPool-->>Middleware: Connection
deactivate DBPool
rect rgb(200, 220, 255)
Note over Middleware,PG: Retry loop (up to 10 attempts)
Middleware->>PG: pg_try_advisory_lock<br/>(org_key, workspace_key)
alt Lock acquired
PG-->>Middleware: Success
Middleware->>Service: Forward request
activate Service
Service-->>Middleware: Response
deactivate Service
Middleware->>PG: pg_advisory_unlock<br/>(org_key, workspace_key)
PG-->>Middleware: Released
else Lock unavailable
Note over Middleware: Exponential backoff retry
PG-->>Middleware: Failed
end
end
Middleware-->>Client: ServiceResponse
deactivate Middleware
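The flow in the diagram can be sketched without a database. This is a minimal illustration only: `FakeLockTable` and `with_workspace_lock` are hypothetical names, the in-memory set stands in for PostgreSQL, and each call is treated as a separate session (real advisory locks are re-entrant within one session).

```rust
use std::collections::HashSet;

/// In-memory stand-in for PostgreSQL's advisory-lock table (hypothetical,
/// for illustration only; the real middleware talks to PostgreSQL).
#[derive(Default)]
struct FakeLockTable {
    held: HashSet<(i32, i32)>,
}

impl FakeLockTable {
    /// Mirrors the non-blocking shape of pg_try_advisory_lock: true on success.
    fn try_lock(&mut self, key: (i32, i32)) -> bool {
        self.held.insert(key)
    }

    /// Mirrors pg_advisory_unlock: true if the lock was held.
    fn unlock(&mut self, key: (i32, i32)) -> bool {
        self.held.remove(&key)
    }
}

/// Try to take the lock up to `max_attempts` times, run `handler`, then unlock.
/// Returns Err(max_attempts) if the lock never became available.
fn with_workspace_lock<T>(
    table: &mut FakeLockTable,
    key: (i32, i32),
    max_attempts: u32,
    handler: impl FnOnce() -> T,
) -> Result<T, u32> {
    for _attempt in 0..max_attempts {
        if table.try_lock(key) {
            let response = handler();
            table.unlock(key);
            return Ok(response);
        }
        // The real middleware would sleep with exponential backoff here.
    }
    Err(max_attempts)
}
```

Calling `with_workspace_lock(&mut table, (org_key, workspace_key), 10, || ...)` on a free key runs the handler and leaves the table empty afterwards; on a key that is already held it gives up after the configured number of attempts.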
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
♻️ Duplicate comments (2)
crates/superposition/src/main.rs (2)
173-175: Same middleware ordering issue applies here. All these route scopes have the same incorrect middleware order. After fixing /context, apply the same fix consistently across /dimension, /default-config, /config, /audit, /function, and /types.
Also applies to: 180-182, 187-189, 194-196, 201-203, 208-210
215-217: Same middleware ordering issue applies here. Apply the same middleware order fix to /experiments, /experiment-groups, /webhook, /variables, /resolve, and /auth.
Also applies to: 221-223, 237-239, 244-246, 251-253, 258-260
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
crates/service_utils/src/middlewares.rs
crates/service_utils/src/middlewares/workspace_lock.rs
crates/superposition/src/main.rs
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-03T13:27:14.072Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 816
File: crates/frontend/src/pages/type_template.rs:82-87
Timestamp: 2026-01-03T13:27:14.072Z
Learning: In the frontend crate, both `Workspace` and `OrganisationId` types implement `Deref` trait (via `#[derive(Deref)]`), allowing automatic coercion from `&Workspace` to `&str` and `&OrganisationId` to `&str`. When passing these types to functions expecting `&str`, use `&workspace` or `&org` directly instead of `&workspace.0` or `&org.0`.
Applied to files:
crates/superposition/src/main.rs
📚 Learning: 2026-01-03T13:25:40.584Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 816
File: crates/frontend/src/pages/webhook.rs:136-137
Timestamp: 2026-01-03T13:25:40.584Z
Learning: In the superposition codebase (Rust frontend), the `Workspace` and `OrganisationId` newtype wrappers implement `Deref`, which allows `&Workspace` and `&OrganisationId` to be automatically coerced to `&str` when passed to functions expecting `&str` parameters. Manual `.0` dereferencing is not needed.
Applied to files:
crates/superposition/src/main.rs
📚 Learning: 2026-01-02T20:59:01.233Z
Learnt from: ayushjain17
Repo: juspay/superposition PR: 543
File: crates/service_utils/src/middlewares/auth_z.rs:141-152
Timestamp: 2026-01-02T20:59:01.233Z
Learning: In crates/service_utils/src/middlewares/auth_z.rs, the AuthZHandler::init function is intentionally designed to panic on startup if AUTH_Z_PROVIDER environment variable is missing or set to an unknown value. This fail-fast behavior is expected and preferred for this critical authorization configuration.
Applied to files:
crates/superposition/src/main.rs
🧬 Code graph analysis (1)
crates/superposition/src/main.rs (2)
crates/service_utils/src/middlewares/workspace_lock.rs (1)
new (26-28)
crates/service_utils/src/middlewares/workspace_context.rs (1)
new (28-33)
🔇 Additional comments (8)
crates/service_utils/src/middlewares.rs (1)
5-5: LGTM! The new module export follows the existing pattern in this file.
crates/superposition/src/main.rs (1)
33-33: LGTM! Import follows the existing import pattern for middleware factories.
crates/service_utils/src/middlewares/workspace_lock.rs (6)
1-54: LGTM! The factory and `Transform` implementation follow standard actix-web middleware patterns correctly.
72-86: LGTM! Write operation detection and early return for reads is well structured.
161-173: Hash stability note: `DefaultHasher` is not guaranteed stable across Rust versions. For advisory locks within a single running process, this is fine. However, be aware that `DefaultHasher` can change between Rust versions. If you ever need cross-process or persistent lock key consistency (e.g., multiple service instances coordinating), consider a stable hash like `xxhash` or `fnv`.
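As a rough illustration of the stable-hash suggestion, the sketch below hand-rolls 32-bit FNV-1a using only the standard library. `stable_lock_keys` is a hypothetical name and not the PR's `compute_lock_keys`. A real change might instead pull in an `fnv` or `xxhash` crate.

```rust
/// 32-bit FNV-1a: a small, fixed algorithm whose output does not depend on
/// the Rust toolchain version.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    const OFFSET_BASIS: u32 = 0x811c_9dc5;
    const PRIME: u32 = 0x0100_0193;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical stable variant of the lock-key computation: each ID is hashed
/// independently and reinterpreted as the i32 that PostgreSQL's two-argument
/// advisory-lock functions expect.
fn stable_lock_keys(org_id: &str, workspace_id: &str) -> (i32, i32) {
    (
        fnv1a_32(org_id.as_bytes()) as i32,
        fnv1a_32(workspace_id.as_bytes()) as i32,
    )
}
```

Because the algorithm is fixed, two service instances built with different Rust versions would derive the same key pair for the same org and workspace.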
230-235: LGTM! The helper struct correctly deserializes the `pg_try_advisory_lock` result.
237-248: LGTM! The release function is straightforward. Ignoring the return value of `pg_advisory_unlock` is acceptable since we're just ensuring cleanup.
250-292: LGTM! Good unit test coverage for `compute_lock_keys`. The uniqueness assertions have a theoretical (but negligible) chance of hash collision. Consider adding integration tests for the actual advisory lock acquire/release behavior if a test database is available.
.wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
.wrap(WorkspaceLockMiddlewareFactory::new())
.service(context::endpoints()),
Critical: Middleware order causes locking to be bypassed.
In actix-web, the last .wrap() runs first on incoming requests. Currently WorkspaceLockMiddlewareFactory is wrapped after OrgWorkspaceMiddlewareFactory, meaning the lock middleware runs before the org/workspace IDs are extracted and set in request extensions. The lock middleware finds no IDs and proceeds without acquiring a lock.
Swap the middleware order so OrgWorkspaceMiddlewareFactory runs first:
🔎 Proposed fix
.service(
scope("/context")
.app_data(Resource::Context)
- .wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
- .wrap(WorkspaceLockMiddlewareFactory::new())
+ .wrap(WorkspaceLockMiddlewareFactory::new())
+ .wrap(OrgWorkspaceMiddlewareFactory::new(true, true))
.service(context::endpoints()),
)
Apply the same fix to all other route scopes using both middlewares.
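The ordering rule behind this fix can be shown with a toy model. This is not actix-web code: `wrap` and `trace_request` are hypothetical helpers that only mimic the "each new wrap becomes the outermost layer" behaviour described above.

```rust
/// A handler that appends to a trace so the execution order is visible.
type Handler = Box<dyn Fn(&mut Vec<String>)>;

/// Toy analogue of `.wrap()`: the new layer becomes the outermost one.
fn wrap(name: &'static str, inner: Handler) -> Handler {
    Box::new(move |trace: &mut Vec<String>| {
        trace.push(format!("{} (request)", name));
        inner(&mut *trace);
        trace.push(format!("{} (response)", name));
    })
}

/// Apply wraps in registration order around a base service and run one request.
fn trace_request(registration_order: &[&'static str]) -> Vec<String> {
    let mut service: Handler =
        Box::new(|trace: &mut Vec<String>| trace.push("endpoint".to_string()));
    for name in registration_order.iter().copied() {
        service = wrap(name, service);
    }
    let mut trace = Vec::new();
    service(&mut trace);
    trace
}
```

With the registration order `["OrgWorkspace", "WorkspaceLock"]` the trace starts with the lock layer, which matches the bug reported here. Registering `["WorkspaceLock", "OrgWorkspace"]` lets the org/workspace layer see the request first.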
🤖 Prompt for AI Agents
In crates/superposition/src/main.rs around lines 166 to 168, the middleware
order is wrong: WorkspaceLockMiddlewareFactory is wrapped after
OrgWorkspaceMiddlewareFactory so it runs before IDs are extracted, bypassing
locking; swap the wraps so WorkspaceLockMiddlewareFactory is wrapped first and
OrgWorkspaceMiddlewareFactory is wrapped last (i.e., ensure
OrgWorkspaceMiddlewareFactory executes before the lock middleware on incoming
requests) and apply the same swap to all other route scopes that use both
middlewares so locks run after IDs have been set in request extensions.
0c41930 to 46764cd
7f03c92 to d80a71f
d80a71f to 1de2e62
1de2e62 to ae3ed17
let is_write_operation = matches!(
req.method(),
&Method::POST | &Method::PUT | &Method::DELETE | &Method::PATCH
);
Don't remember completely, but we have read operations which are POST.
@ayushjain17 @Datron
"acquired advisory lock for workspace (org_key: {}, workspace_key: {})",
org_key, workspace_key
);
Some(AdvisoryLockGuard::new(&mut db_conn, org_key, workspace_key))
The guard can be returned by the acquire fn
"lock contention detected, retrying in {}ms (attempt {}/{}, org_key: {}, workspace_key: {})",
backoff_ms, attempt + 1, MAX_RETRIES, org_key, workspace_key
);
actix_web::rt::time::sleep(std::time::Duration::from_millis(backoff_ms)).await;
We can use condition variables to wake the sleeping tasks. The wake-up call can go in the drop of LockGuard.
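A minimal sketch of that idea, using the standard library's blocking `Condvar`. The names `WorkspaceGate` and `GateGuard` are hypothetical. Two assumptions are worth stating: this only wakes waiters inside the same process (the PostgreSQL advisory lock would still be needed across instances), and an async middleware would want an async notification primitive such as `tokio::sync::Notify` rather than a blocking wait.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// In-process gate for one workspace (hypothetical name).
struct WorkspaceGate {
    held: Mutex<bool>,
    released: Condvar,
}

/// Releases the gate and wakes one waiter when dropped.
struct GateGuard {
    gate: Arc<WorkspaceGate>,
}

impl WorkspaceGate {
    fn new() -> Arc<Self> {
        Arc::new(WorkspaceGate {
            held: Mutex::new(false),
            released: Condvar::new(),
        })
    }

    /// Block until the gate is free, then take it.
    fn acquire(gate: &Arc<Self>) -> GateGuard {
        let mut held = gate.held.lock().unwrap();
        while *held {
            // Sleeps until a guard's Drop calls notify_one; no polling.
            held = gate.released.wait(held).unwrap();
        }
        *held = true;
        GateGuard { gate: Arc::clone(gate) }
    }
}

impl Drop for GateGuard {
    fn drop(&mut self) {
        *self.gate.held.lock().unwrap() = false;
        self.gate.released.notify_one();
    }
}
```

Compared with the backoff loop, a waiter here resumes as soon as the previous holder drops its guard instead of sleeping out the rest of its backoff interval.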
e21bc07 to ae2bb08
ae2bb08 to 099e5fa
Pull request overview
This PR introduces a new Actix middleware intended to serialize workspace-scoped write operations using PostgreSQL advisory locks, and also includes broad provider API refactors/FFI binding updates plus automation and generated “AI skills” documentation artifacts.
Changes:
- Added `WorkspaceLockMiddlewareFactory` (Postgres advisory-lock based) middleware module in `service_utils`.
- Refactored Rust + Python provider data source interfaces to make `fetch_config` a default wrapper over `fetch_filtered_config`, and updated experimentation filtering to support a new `partial_apply` flag across UniFFI bindings.
- Added skills generation tooling (Make target + GitHub Action) and committed generated `.agents/skills/**` content.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| superposition.skill-seekers.json | Adds skill-seekers configuration for generating Superposition “skills” from docs. |
| makefile | Adds skills-update target to generate/package skills locally. |
| crates/superposition_provider/src/local_provider.rs | Refactors to implement fetch_filtered_config directly and filter cached config in-place. |
| crates/superposition_provider/src/data_source/http.rs | Moves config fetching into fetch_filtered_config implementation for the data source. |
| crates/superposition_provider/src/data_source/file.rs | Refactors file data source to implement fetch_filtered_config directly and apply filters. |
| crates/superposition_provider/src/data_source.rs | Makes fetch_config a default method delegating to fetch_filtered_config. |
| crates/superposition_core/src/ffi.rs | Adds partial_apply parameter to experiment filtering selection logic. |
| crates/service_utils/src/middlewares/workspace_lock.rs | Introduces new workspace locking middleware using PG advisory locks. |
| crates/service_utils/src/middlewares.rs | Exposes the new workspace_lock middleware module. |
| clients/python/provider/superposition_provider/local_provider.py | Makes LocalResolutionProvider implement SuperpositionDataSource and adds async fetch methods. |
| clients/python/provider/superposition_provider/http_data_source.py | Refactors to implement fetch_filtered_config directly (removes helper wrapper methods). |
| clients/python/provider/superposition_provider/file_data_source.py | Refactors to implement fetch_filtered_config directly (removes helper wrapper methods). |
| clients/python/provider/superposition_provider/data_source.py | Makes fetch_config a concrete default delegating to fetch_filtered_config. |
| clients/python/bindings/superposition_bindings/superposition_client.py | Updates UniFFI Python bindings for new partial_apply arg + checksum changes. |
| clients/java/bindings/src/main/kotlin/uniffi/superposition_client/superposition_client.kt | Updates UniFFI Kotlin bindings for new partialApply arg + checksum changes. |
| .gitignore | Ignores scripts/skill_templates/ outputs generated by skills tooling. |
| .github/workflows/update-skills.yml | Adds workflow to generate/package/upload skills and commit .agents/skills/** updates. |
| .agents/skills/superposition/reference/documentation/overview/setup.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/overview/intro.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/lsp-support.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/k8s-staggered-releaser.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/intro.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/format-specification.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/experimentation.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/dimensions.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/deterministic-resolution.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/creating_client.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/context-expressions.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/config-file-compatibility.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/client_experimentation.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/client_context_aware_configuration.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/cascading-model.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/cac-toml.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/other/cac-redis-module.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/features/python.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/features/overview.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/features/javascript.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/features/java.md | Adds generated skill reference documentation. |
| .agents/skills/superposition/reference/documentation/extraction_summary.json | Adds generated extraction summary metadata. |
| .agents/skills/superposition/reference/dependencies/statistics.json | Adds generated dependency stats metadata. |
| .agents/skills/superposition/reference/dependencies/dependency_graph.mmd | Adds generated dependency graph metadata (Mermaid). |
| .agents/skills/superposition/reference/dependencies/dependency_graph.json | Adds generated dependency graph metadata (JSON). |
| .agents/skills/superposition/reference/dependencies/dependency_graph.dot | Adds generated dependency graph metadata (DOT). |
| .agents/skills/superposition/reference/config_patterns/config_patterns.md | Adds generated config-pattern extraction report. |
| .agents/skills/superposition/SKILL.md | Adds generated top-level skill manifest and index. |
pub mod auth_z;
pub mod request_response_logging;
pub mod workspace_context;
pub mod workspace_lock;
The PR description says the workspace lock middleware is registered on workspace-scoped endpoints, but there are currently no references/usages of WorkspaceLockMiddlewareFactory anywhere outside its own module (search across crates/**.rs). As-is, exporting the module won’t activate locking. The middleware needs to be wired into the Actix scopes (and ordered appropriately relative to OrgWorkspaceMiddlewareFactory).
// Acquire advisory lock if we have a lock key and create guard
let _lock_guard = if let Some(lock_key) = lock_key {
match acquire_advisory_lock(&mut db_conn, lock_key).await {
Ok(guard) => {
_lock_guard holds a mutable reference to db_conn (via AdvisoryLockGuard<'a>), and the request is awaited afterwards (srv.call(req).await). Holding a &mut borrow across an .await will fail to compile in async Rust. Consider redesigning the guard to own the pooled connection (or otherwise avoid borrowing db_conn) so it can be kept alive for the request duration without a self-referential borrow.
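One possible shape for a guard that owns its connection is sketched below. `FakeConn` is a hypothetical stand-in that only records SQL text; a real version would hold the pooled Diesel connection and check the boolean returned by `pg_try_advisory_lock`, which this sketch does not model.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for a pooled connection (hypothetical); records the SQL it is
/// asked to run instead of talking to PostgreSQL.
struct FakeConn {
    log: Arc<Mutex<Vec<String>>>,
}

impl FakeConn {
    fn execute(&mut self, sql: String) {
        self.log.lock().unwrap().push(sql);
    }
}

/// Guard that owns its connection, so it carries no lifetime parameter and
/// can be moved freely (for example into a spawned task or thread).
struct OwnedLockGuard {
    conn: FakeConn,
    org_key: i32,
    workspace_key: i32,
}

impl OwnedLockGuard {
    fn acquire(mut conn: FakeConn, org_key: i32, workspace_key: i32) -> Self {
        conn.execute(format!(
            "SELECT pg_try_advisory_lock({}, {})",
            org_key, workspace_key
        ));
        OwnedLockGuard { conn, org_key, workspace_key }
    }
}

impl Drop for OwnedLockGuard {
    fn drop(&mut self) {
        // Unlock on the same connection (session) that took the lock.
        let sql = format!(
            "SELECT pg_advisory_unlock({}, {})",
            self.org_key, self.workspace_key
        );
        self.conn.execute(sql);
    }
}
```

Owning the connection also keeps the unlock on the same database session as the lock, which matters because session-level advisory locks can only be released by the session that holds them.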
Err(diesel::result::Error::DatabaseError(
diesel::result::DatabaseErrorKind::Unknown,
Box::new(format!(
"Failed to acquire workspace lock after {} attempts (high contention)",
MAX_RETRIES
)),
))
This constructs diesel::result::Error::DatabaseError using Box::new(format!(...)), but Diesel expects a Box<dyn DatabaseErrorInformation> rather than a String. This won’t compile. Suggest returning a middleware-specific error type (or actix_web::Error) from acquire_advisory_lock and mapping it to an HTTP error response instead of trying to synthesize a Diesel DatabaseError.
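A middleware-specific error type along these lines might look like the following sketch. `WorkspaceLockError` and its `status_code` mapping are hypothetical; the 503/500 choice is an assumption for illustration, not something the PR specifies.

```rust
use std::fmt;

/// Hypothetical error type for lock acquisition, kept separate from Diesel's.
#[derive(Debug, PartialEq)]
enum WorkspaceLockError {
    /// The lock stayed busy for every attempt.
    Contention { attempts: u32 },
    /// The database call itself failed; carries a description of the cause.
    Database(String),
}

impl fmt::Display for WorkspaceLockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WorkspaceLockError::Contention { attempts } => write!(
                f,
                "Failed to acquire workspace lock after {} attempts (high contention)",
                attempts
            ),
            WorkspaceLockError::Database(cause) => {
                write!(f, "Workspace lock query failed: {}", cause)
            }
        }
    }
}

impl std::error::Error for WorkspaceLockError {}

impl WorkspaceLockError {
    /// Assumed mapping to an HTTP status code; adjust to the service's conventions.
    fn status_code(&self) -> u16 {
        match self {
            WorkspaceLockError::Contention { .. } => 503,
            WorkspaceLockError::Database(_) => 500,
        }
    }
}
```

The middleware could then convert this into whatever HTTP error response the service already uses, without constructing a Diesel `DatabaseError` by hand.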
099e5fa to add6f5d
This commit introduces a new middleware that serializes all write operations
(POST, PUT, DELETE, PATCH) per workspace using PostgreSQL advisory locks.
Changes:
- Created WorkspaceLockMiddleware that:
- Extracts org_id and workspace_id from requests
- Computes a unique lock key using hash of org_id:workspace_id
- Acquires PostgreSQL advisory lock before processing write operations
- Ensures lock is released after request completion
- Skips locking for read operations (GET, etc.)
- Registered the middleware on all workspace-scoped endpoints:
/context, /dimension, /default-config, /config, /audit, /function,
/types, /experiments, /experiment-groups, /webhook, /variables,
/resolve, /auth
This ensures write operations to the same workspace are serialized,
preventing race conditions and maintaining data consistency.
refactor: Use two-argument pg_advisory_lock for better lock space utilization
Changed from single-argument pg_advisory_lock(bigint) to two-argument
pg_advisory_lock(int, int) form for workspace locking.
Benefits:
- More natural mapping: org_id and workspace_id get separate hash spaces
- Better lock space utilization: each ID gets full 32-bit space
- Lower collision probability: separate hashing reduces conflicts
- Easier debugging: both components visible in pg_locks table
Implementation:
- compute_lock_keys() now returns (i32, i32) tuple
- org_id and workspace_id are hashed independently
- Updated acquire/release functions to use two-argument SQL
- Enhanced tests to verify component separation
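A minimal sketch of what independent hashing into an `(i32, i32)` pair could look like, assuming `DefaultHasher` and truncation to the low 32 bits. The PR's actual `compute_lock_keys` body is not shown on this page, so the details here are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash one ID and keep the low 32 bits, reinterpreted as i32
/// (the argument type of the two-argument advisory-lock functions).
fn hash_to_i32(value: &str) -> i32 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish() as i32
}

/// Each component gets its own key, so changing the workspace never changes
/// the org key and vice versa.
fn compute_lock_keys(org_id: &str, workspace_id: &str) -> (i32, i32) {
    (hash_to_i32(org_id), hash_to_i32(workspace_id))
}
```

The pair would then be bound to a statement of the form `SELECT pg_try_advisory_lock($1, $2)`.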
feat: Add retry logic with exponential backoff for workspace locks
Changed from blocking pg_advisory_lock() to non-blocking pg_try_advisory_lock()
with intelligent retry logic to prevent indefinite request blocking.
**Previous Behavior:**
- pg_advisory_lock() blocks indefinitely until lock is available
- Requests could hang for extended periods during high contention
- No visibility into lock acquisition delays
- Risk of cascading timeouts
**New Behavior:**
- pg_try_advisory_lock() returns immediately with success/failure
- Exponential backoff retry: 10ms, 20ms, 40ms, 80ms... up to 500ms max
- Maximum 10 attempts (total ~5 seconds max wait)
- Clear error message after exhausting retries
- Logs retry attempts for observability
**Retry Configuration:**
- MAX_RETRIES: 10 attempts
- INITIAL_BACKOFF_MS: 10ms
- MAX_BACKOFF_MS: 500ms (cap to prevent excessive delays)
**Benefits:**
- Predictable maximum wait time (~5 seconds)
- Better user experience with faster failures
- Reduced risk of cascading timeouts
- Visibility into lock contention via logs
- Graceful degradation under high load
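The schedule implied by this configuration can be worked out with a short sketch. It assumes the delay doubles from INITIAL_BACKOFF_MS and is capped at MAX_BACKOFF_MS. `backoff_ms` and `backoff_schedule` are hypothetical helpers, and whether a sleep follows the final attempt is not stated here.

```rust
const MAX_RETRIES: u32 = 10;
const INITIAL_BACKOFF_MS: u64 = 10;
const MAX_BACKOFF_MS: u64 = 500;

/// Delay before the retry that follows zero-based `attempt`:
/// 10ms doubled `attempt` times, capped at 500ms.
fn backoff_ms(attempt: u32) -> u64 {
    INITIAL_BACKOFF_MS
        .saturating_mul(1u64 << attempt.min(32))
        .min(MAX_BACKOFF_MS)
}

/// One delay per attempt, in order.
fn backoff_schedule() -> Vec<u64> {
    (0..MAX_RETRIES).map(backoff_ms).collect()
}
```

Under these assumptions the delays are 10, 20, 40, 80, 160, 320 and then 500 ms for the remaining attempts, summing to about 2.6 seconds (about 2.1 seconds if no sleep follows the last attempt). That sits inside the ~5 second bound quoted above, which corresponds to ten sleeps at the 500 ms cap.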
fix: Replace blocking sleep with async sleep in lock retry logic
Fixed critical async/blocking issues flagged by code review:
**Issue 1: Blocking sleep in async context**
- Changed std::thread::sleep() to actix_web::rt::time::sleep().await
- Using blocking sleep in async middleware would block the entire worker thread
- This prevented other requests from being processed on that thread
- Now properly yields control back to the async executor during backoff
**Issue 2: Made acquire_advisory_lock async**
- Function signature changed from sync to async
- Properly propagates async behavior through the call chain
- Maintains non-blocking execution throughout retry attempts
**Impact:**
- Before: Worker threads would be blocked during lock retry delays
- After: Worker threads can process other requests while waiting
- Much better concurrency and throughput under lock contention
fix: Add RAII guard to ensure lock release even on panic
Implemented AdvisoryLockGuard using RAII pattern to guarantee lock
release in all code paths, including when handlers panic.
**Problem:**
Previous implementation would skip lock release if the handler panicked:
```rust
acquire_lock()
handler() // <-- If this panics...
release_lock() // <-- ...this never runs!
```
This would leave locks held until DB connection closes, potentially
causing deadlocks or severe contention.
**Solution:**
Created AdvisoryLockGuard struct that implements Drop:
```rust
struct AdvisoryLockGuard<'a> {
    conn: &'a mut PgConnection,
    org_key: i32,
    workspace_key: i32,
}
impl Drop for AdvisoryLockGuard<'_> {
    fn drop(&mut self) {
        // Always releases lock, even on panic
        release_advisory_lock(...)
    }
}
```
**How it works:**
1. Acquire lock
2. Create guard (holds mutable reference to connection)
3. Call handler
4. Guard is automatically dropped when scope ends
   - On normal return: guard drops, lock released
   - On panic: guard drops during unwinding, lock released
   - On early return: guard drops, lock released
**Benefits:**
- Guaranteed lock cleanup in all code paths
- Panic-safe resource management
- Prevents lock leaks that could cause deadlocks
- Follows Rust RAII best practices
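The panic case can be checked in isolation with a small standard-library sketch. `ReleaseOnDrop` and `handler_that_panics` are hypothetical, and an `AtomicBool` stands in for the database unlock. Note that this relies on unwinding: in a build configured with `panic = "abort"`, destructors do not run.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in guard: flips a flag in Drop where the real guard would unlock.
struct ReleaseOnDrop<'a> {
    released: &'a AtomicBool,
}

impl Drop for ReleaseOnDrop<'_> {
    fn drop(&mut self) {
        self.released.store(true, Ordering::SeqCst);
    }
}

/// Runs a handler that panics while the guard is alive.
/// Returns true if the panic was caught.
fn handler_that_panics(released: &AtomicBool) -> bool {
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        let _guard = ReleaseOnDrop { released };
        panic!("handler failed");
    }));
    outcome.is_err()
}
```

After `handler_that_panics(&flag)` returns, the flag is set, showing that the guard's `Drop` ran during unwinding.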
fix: moved initialization of lock guard to acquire call
add6f5d to a95f125
Problem
Currently, changes within a workspace are not serialized, which can lead to race conditions in version_state generation. We need to serialize them.
Solution
This commit introduces a new middleware that serializes all write operations
(POST, PUT, DELETE, PATCH) per workspace using PostgreSQL advisory locks.
Changes:
Created WorkspaceLockMiddleware that:
Registered the middleware on all versioning-related workspace-scoped endpoints:
/context, /dimension, /default-config, /config
This ensures write operations to the same workspace are serialized,
preventing race conditions and maintaining data consistency.
Environment variable changes
NA
Pre-deployment activity
NA
Post-deployment activity
NA
API changes
All workspace write endpoints
Possible Issues in the future
Higher latency for write APIs