
refactor: replace lock unwrap panics with poison recovery helpers#541

Draft
PastaPastaPasta wants to merge 1 commit into v1.0-dev from pr/lock-poisoning-recovery

@PastaPastaPasta
Member

@PastaPastaPasta PastaPastaPasta commented Feb 10, 2026

Summary

  • Introduces src/lock_helper.rs with MutexExt and RwLockExt traits that provide lock_or_recover(), read_or_recover(), and write_or_recover() methods
  • Replaces .lock().unwrap(), .read().unwrap(), and .write().unwrap() calls across 67 files with the poison-recovering alternatives
  • When a lock is poisoned (because another thread panicked while holding a Mutex guard or an RwLock write guard), these helpers log a tracing::warn! and still return a guard to the inner data instead of crashing the application
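For readers who haven't opened the diff, here is a minimal sketch of what extension traits with these method names can look like. It is a reconstruction, not the contents of src/lock_helper.rs: it assumes the helpers build on std's PoisonError::into_inner, and it uses eprintln! where the PR's helpers log through tracing::warn!, so the block has no dependencies.

```rust
use std::sync::{Mutex, MutexGuard, RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Poison-recovering access for `std::sync::Mutex` (sketch of `MutexExt`).
pub trait MutexExt<T> {
    fn lock_or_recover(&self) -> MutexGuard<'_, T>;
}

/// Poison-recovering access for `std::sync::RwLock` (sketch of `RwLockExt`).
pub trait RwLockExt<T> {
    fn read_or_recover(&self) -> RwLockReadGuard<'_, T>;
    fn write_or_recover(&self) -> RwLockWriteGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    fn lock_or_recover(&self) -> MutexGuard<'_, T> {
        self.lock().unwrap_or_else(|poisoned| {
            // Assumption: stands in for the PR's tracing::warn! call.
            eprintln!("warning: recovering from poisoned Mutex");
            // into_inner() yields the guard even though a previous holder panicked.
            poisoned.into_inner()
        })
    }
}

impl<T> RwLockExt<T> for RwLock<T> {
    fn read_or_recover(&self) -> RwLockReadGuard<'_, T> {
        self.read().unwrap_or_else(|poisoned| {
            eprintln!("warning: recovering from poisoned RwLock (read)");
            poisoned.into_inner()
        })
    }

    fn write_or_recover(&self) -> RwLockWriteGuard<'_, T> {
        self.write().unwrap_or_else(|poisoned| {
            eprintln!("warning: recovering from poisoned RwLock (write)");
            poisoned.into_inner()
        })
    }
}
```

Note that into_inner() does not clear the poison flag, so every later access through helpers shaped like this takes the recovery branch and logs again. Mutex::clear_poison and RwLock::clear_poison (Rust 1.77+) are the std way to reset the flag once the data has been checked.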

Motivation

Lock poisoning panics are a common crash vector in multi-threaded Rust applications that call .unwrap() on lock results. In a GUI app like Dash Evo Tool, a panic on any background thread (e.g., network timeout, malformed data) poisons every Mutex it held and every RwLock it held for writing. Because the access sites unwrap, the next thread to access that lock panics too, cascading into a full application crash.

The poisoned data is almost always still valid: the panicking thread rarely corrupts the protected state. By recovering with a warning log instead of panicking, the app stays usable and the user doesn't lose their session.
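The cascade described above can be reproduced with std alone. The function names below (poison_mutex, second_access_panics, recover_value, panic_under_read_guard) are illustrative and do not come from the codebase. The last function shows the RwLock caveat: a panic while holding only a read guard does not poison a std RwLock.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// A worker panics while holding the mutex; unwinding drops the guard and poisons the lock.
pub fn poison_mutex(m: &Arc<Mutex<u32>>) {
    let m2 = Arc::clone(m);
    let _ = thread::spawn(move || {
        let _guard = m2.lock().unwrap();
        panic!("simulated background failure");
    })
    .join();
}

/// The next `.lock().unwrap()` call site panics on the PoisonError: the cascade.
/// Returns true if that second thread panicked.
pub fn second_access_panics(m: &Arc<Mutex<u32>>) -> bool {
    let m2 = Arc::clone(m);
    thread::spawn(move || {
        let _value = *m2.lock().unwrap();
    })
    .join()
    .is_err()
}

/// Manual recovery: the data is still reachable behind the PoisonError.
pub fn recover_value(m: &Mutex<u32>) -> u32 {
    *m.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Panics under a *read* guard; returns whether the RwLock ended up poisoned.
pub fn panic_under_read_guard(rw: &Arc<RwLock<u32>>) -> bool {
    let rw2 = Arc::clone(rw);
    let _ = thread::spawn(move || {
        let _guard = rw2.read().unwrap();
        panic!("simulated failure under a read guard");
    })
    .join();
    rw.is_poisoned()
}
```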

Changes

  • New file: src/lock_helper.rs, containing the MutexExt<T> and RwLockExt<T> extension traits
  • 67 modified files across backend_task/, context.rs, database/, ui/, spv/, app.rs: mechanical replacement of .lock().unwrap() → .lock_or_recover(), .read().unwrap() → .read_or_recover(), .write().unwrap() → .write_or_recover()

Test plan

  • cargo build passes
  • cargo clippy --all-features --all-targets -- -D warnings passes clean
  • Existing tests pass (cargo test --all-features --workspace)
  • Manual smoke test: app starts, wallet operations work normally

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced application resilience by replacing panic-prone lock handling with graceful recovery mechanisms throughout the codebase, preventing crashes in error conditions and improving stability during concurrent operations.

Add src/lock_helper.rs with MutexExt and RwLockExt traits providing
lock_or_recover(), read_or_recover(), and write_or_recover() methods
that recover from poisoned locks instead of panicking. Migrates all
~80 production lock access sites. Test code retains unwrap().

Task: 2.5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Walkthrough

The PR introduces resilient lock acquisition patterns across the entire codebase by adding extension traits that gracefully recover from poisoned mutex and RwLock states, replacing unwrap-based lock patterns with recoverable variants throughout backend tasks, database operations, and UI components.

Changes

Cohort / File(s) Summary
Lock Recovery Helper Module
src/lib.rs, src/lock_helper.rs
New public module introducing MutexExt and RwLockExt traits that provide lock_or_recover(), read_or_recover(), and write_or_recover() methods, enabling graceful recovery from poisoned locks by logging warnings and extracting inner values instead of panicking.
Backend Task Core Operations
src/app.rs, src/backend_task/core/*, src/backend_task/mnlist.rs, src/backend_task/mod.rs, src/backend_task/platform_info.rs
Replaced lock acquisition patterns across core RPC operations, asset lock creation, wallet refresh, and blockchain state synchronization with recovery-enabled variants for core_client, core_zmq_endpoint, and sdk locks.
Backend Task Identity Operations
src/backend_task/identity/*
Applied recovery-enabled lock acquisition across all identity-related workflows including loading, registration, top-up, transfer, and withdrawal operations for wallets, sdk, and transaction tracking structures.
Backend Task Wallet Operations
src/backend_task/wallet/*
Updated wallet funding, platform address handling, and credential management to use read_or_recover() and lock_or_recover() for wallet access and finality proof retrieval.
Database Connection Management
src/database/*
Systematically replaced .lock().unwrap() with .lock_or_recover() across all 14 database modules (identities, wallet, contracts, tokens, dashpay, etc.) to recover from poisoned connection locks instead of panicking.
Model and Context Locking
src/context.rs, src/model/wallet/mod.rs, src/model/qualified_identity/*
Updated application context and wallet model lock acquisitions to use recovery-enabled patterns for wallets, sdk, and core_client access.
UI Components — Wallet Management
src/ui/components/wallet_unlock*, src/ui/wallets/*
Applied recovery-enabled lock patterns across wallet unlock flows, asset lock details, and import operations to safely handle poisoned wallet and state locks.
UI Components — Identity Management
src/ui/identities/*
Replaced unwrap-based RwLock reads/writes with recovery-enabled variants across all identity screens and workflows (adding, funding, top-up, key info) for wallet, step state, and funding method access.
UI Components — Other Features
src/ui/dpns/*, src/ui/tokens/*, src/ui/tools/*
Updated DPNS name management, token search, and cryptographic tool screens to use lock_or_recover() and recovery-enabled read operations for contested names, search results, and wallet access.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 With poisoned locks we need not fear,
A helper trait brings cheer so clear,
No unwrap panics in our way,
Recovery paths save the day,
Through fifty files our rabbit hops,
Safe lock handling never stops! 🔐

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: replace lock unwrap panics with poison recovery helpers' clearly and specifically describes the primary change in this PR.
Docstring Coverage ✅ Passed Docstring coverage is 92.41% which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/backend_task/identity/register_identity.rs (1)

728-728: ⚠️ Potential issue | 🟡 Minor

Inconsistent lock pattern — read().map_err() not converted to read_or_recover().

Line 728 uses wallet.read().map_err(|e| e.to_string())?.clone() while the rest of this file (and the PR) uses read_or_recover(). On a poisoned lock, this path returns an error to the caller whereas all other wallet reads recover silently. This creates inconsistent behavior for the same failure mode.

Suggested fix
-        let wallet_clone = { wallet.read().map_err(|e| e.to_string())?.clone() };
+        let wallet_clone = { wallet.read_or_recover().clone() };
src/ui/identities/top_up_identity_screen/by_wallet_qr_code.rs (1)

26-44: ⚠️ Potential issue | 🟡 Minor

core_client.read() not converted to read_or_recover() — inconsistent with the rest of the PR.

Lines 29–30 still use .read().map_err(...) while the analogous code in add_new_identity_screen/by_wallet_qr_code.rs (Lines 35, 48, 55) uses read_or_recover(). On a poisoned lock this path surfaces an error to the user instead of recovering.

Suggested diff
                     // Import address to Core if needed for monitoring
-                    let core_client = self
-                        .app_context
-                        .core_client
-                        .read()
-                        .map_err(|_| "Core client lock was poisoned".to_string())?;
+                    let core_client = self.app_context.core_client.read_or_recover();
🤖 Fix all issues with AI agents
In `@src/context.rs`:
- Around line 1477-1479: In received_transaction_finality replace the silent
poison-recover calls (wallets.read_or_recover() and
wallet_arc.write_or_recover()) with fallible lock acquisition that returns an
error to the caller (e.g. use wallets.read().map_err(|e| /* wrap to your error
type */)? and wallet_arc.write().map_err(|e| /* wrap */)?), so a poisoned mutex
surfaces as a Result error instead of silently continuing; apply the same change
pattern in received_asset_lock_finality for the analogous calls to ensure
poisoned locks are propagated rather than recovered.

In `@src/database/top_ups.rs`:
- Line 23: The code calls self.conn.lock_or_recover() which can resume a
poisoned rusqlite::Connection mutex and leave the DB in an inconsistent state;
change this to either use a fail-fast lock (self.conn.lock().unwrap()) or, if
you must keep lock_or_recover(), immediately reset the connection state before
running queries: after obtaining the guard from lock_or_recover() run a ROLLBACK
(or otherwise ensure no open transaction) and check its result, and if rollback
fails recreate or reopen the rusqlite::Connection instead of proceeding; update
the use sites in top_ups.rs that reference conn.lock_or_recover() to follow this
pattern and handle errors accordingly.
🧹 Nitpick comments (9)
src/backend_task/wallet/fetch_platform_address_balances.rs (1)

69-69: Inconsistent poison-handling strategies within the same function.

Line 30 silently recovers from a poisoned self.wallets lock via read_or_recover(), but wallet_arc.read() (lines 69, 197, 369) and wallet_arc.write() (line 262) still propagate poison as Err. Similarly self.sdk.read() at line 81.

This means a panic in another thread holding wallet_arc will cause this function to return an error, while a panic holding self.wallets will be silently swallowed. If the intent is uniform resilience, consider applying read_or_recover() / write_or_recover() here too (or vice-versa, keeping .map_err everywhere for explicit error propagation).

Not blocking — just noting the divergence for the author's consideration.

Also applies to: 81-81, 197-197, 262-262, 369-372

src/lock_helper.rs (1)

1-69: Consider adding inline #[test] coverage for the recovery paths.

A small test that poisons a lock (via a panicking thread) and then calls lock_or_recover() / read_or_recover() / write_or_recover() would validate the recovery behavior and satisfy the project's inline-test convention.

As per coding guidelines, **/*.rs: "Write unit tests inline in source files using #[test] attribute".

Example test skeleton
#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::{Arc, Mutex, RwLock};

    #[test]
    fn mutex_recovers_from_poison() {
        let m = Arc::new(Mutex::new(42));
        let m2 = m.clone();
        let _ = std::thread::spawn(move || {
            let _g = m2.lock().unwrap();
            panic!("intentional");
        })
        .join();
        assert!(m.lock().is_err(), "lock should be poisoned");
        let guard = m.lock_or_recover();
        assert_eq!(*guard, 42);
    }

    #[test]
    fn rwlock_read_recovers_from_poison() {
        let rw = Arc::new(RwLock::new(7));
        let rw2 = rw.clone();
        let _ = std::thread::spawn(move || {
            let _g = rw2.write().unwrap();
            panic!("intentional");
        })
        .join();
        assert!(rw.read().is_err());
        let guard = rw.read_or_recover();
        assert_eq!(*guard, 7);
    }

    #[test]
    fn rwlock_write_recovers_from_poison() {
        let rw = Arc::new(RwLock::new(0));
        let rw2 = rw.clone();
        let _ = std::thread::spawn(move || {
            let _g = rw2.write().unwrap();
            panic!("intentional");
        })
        .join();
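        assert!(rw.is_poisoned(), "lock should be poisoned");
        let mut guard = rw.write_or_recover();
        assert_eq!(*guard, 0, "recovered guard should expose the pre-panic value");
        *guard = 99;
        drop(guard);
        assert_eq!(*rw.read_or_recover(), 99);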
    }
}
src/database/asset_lock_transaction.rs (1)

26-26: Acknowledge the trade-off for database connection poison recovery.

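Recovering from a poisoned mutex on the SQLite connection means a previous holder panicked mid-operation. SQLite itself does not roll anything back because a Rust thread panicked; an open transaction is only rolled back if something ends it, such as a rusqlite Transaction guard, which rolls back on drop by default (including during unwinding). Where writes go through such a guard, this is generally safe. However, if any higher-level invariant was partially applied (e.g., two correlated writes where only one completed), the recovered connection will proceed against inconsistent application state.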

For a GUI app where the alternative is a full crash, this trade-off is reasonable — just worth documenting in lock_helper.rs that callers should be aware of potential partial-update scenarios after recovery.

src/backend_task/core/refresh_single_key_wallet_info.rs (1)

18-21: Inconsistent lock-handling strategies within the same function.

Lines 19 and 57 use .read()/.write().map_err(|e| e.to_string())? (propagating poison as an Err), while lines 25 and 34 use read_or_recover() (silently recovering from poison with a warning log). Both approaches avoid panics, but the different recovery semantics in one function could surprise future maintainers.

This isn't a bug — the .map_err()? sites were already non-panicking and outside the PR's scope — but consider aligning them for consistency in a follow-up.

Also applies to: 25-25, 34-34

src/backend_task/core/create_asset_lock.rs (1)

38-39: Inconsistent error handling between lock sites in the same function.

Lines 23 and 50 use wallet.write().map_err(|e| e.to_string())? which propagates poison errors to the caller, while lines 38 and 44 use the silent-recovery helpers. This means a poisoned transactions_waiting_for_finality or core_client will silently proceed (potentially broadcasting a transaction with corrupted state), but a poisoned wallet will return an error.

This is likely fine given the PR's intent (only replacing .unwrap() calls), but worth noting: read_or_recover() on core_client (line 44) means a poisoned RPC client will attempt to broadcast a financial transaction without the caller knowing the client may be in a bad state.

Also applies to: 44-46

src/ui/identities/identities_screen.rs (1)

1169-1175: Avoid double‑locking identities in ui().
Take a single snapshot and reuse it for the refresh button and the main render to keep the view consistent and reduce lock churn.

♻️ Suggested refactor
-        right_buttons.push((
-            "Load Identity",
-            DesiredAppAction::AddScreenType(Box::new(ScreenType::AddExistingIdentity)),
-        ));
-        if !self.identities.lock_or_recover().is_empty() {
-            // Create a vec of RefreshIdentity(identity) DesiredAppAction for each identity
-            let backend_tasks: Vec<BackendTask> = self
-                .identities
-                .lock_or_recover()
-                .values()
-                .map(|qi| BackendTask::IdentityTask(IdentityTask::RefreshIdentity(qi.clone())))
-                .collect();
-            right_buttons.push((
-                "Refresh",
-                DesiredAppAction::BackendTasks(
-                    backend_tasks,
-                    BackendTasksExecutionMode::Concurrent,
-                ),
-            ));
-        }
+        let identities_vec = {
+            let guard = self.identities.lock_or_recover();
+            guard.values().cloned().collect::<Vec<_>>()
+        };
+
+        right_buttons.push((
+            "Load Identity",
+            DesiredAppAction::AddScreenType(Box::new(ScreenType::AddExistingIdentity)),
+        ));
+        if !identities_vec.is_empty() {
+            let backend_tasks: Vec<BackendTask> = identities_vec
+                .iter()
+                .cloned()
+                .map(|qi| BackendTask::IdentityTask(IdentityTask::RefreshIdentity(qi)))
+                .collect();
+            right_buttons.push((
+                "Refresh",
+                DesiredAppAction::BackendTasks(
+                    backend_tasks,
+                    BackendTasksExecutionMode::Concurrent,
+                ),
+            ));
+        }
@@
-        let identities_vec = {
-            let guard = self.identities.lock_or_recover();
-            guard.values().cloned().collect::<Vec<_>>()
-        };
+        // identities_vec already computed above

Also applies to: 1196-1197

src/ui/identities/top_up_identity_screen/mod.rs (1)

99-111: Inconsistent locking pattern at Line 110.

Lines 99 and 104 use read_or_recover(), but line 110 still uses .read().ok()?. On a poisoned lock this silently falls back to "Select", which is benign but inconsistent with the PR's stated goal of converting all lock sites. Consider aligning for consistency:

Suggested diff
                 let selected_wallet_alias = self
                     .wallet
                     .as_ref()
-                    .and_then(|wallet| wallet.read().ok()?.alias.clone())
+                    .and_then(|wallet| wallet.read_or_recover().alias.clone())
                     .unwrap_or_else(|| "Select".to_string());
src/backend_task/identity/register_identity.rs (2)

393-406: Two separate read_or_recover() calls to get seed_hash — consider caching.

wallet.read_or_recover().seed_hash() is called at Line 393 and again at Line 404 with no mutation in between. The seed hash could be fetched once.

Suggested diff
         let wallet_seed_hash = { wallet.read_or_recover().seed_hash() };
         let mut qualified_identity = QualifiedIdentity {
             identity: identity.clone(),
             ...
             private_keys: keys.to_key_storage(wallet_seed_hash),
             ...
             associated_wallets: BTreeMap::from([(
-                wallet.read_or_recover().seed_hash(),
+                wallet_seed_hash,
                 wallet.clone(),
             )]),

850-876: get_platform_address_best_info not converted to recoverable helpers.

Lines 852–856 use .read().inspect_err(...).ok()? and line 860 uses .read().ok()?. Falling back to None is safe, but it is inconsistent with the PR's goal of standardizing on *_or_recover(): on a poisoned lock these sites bail out with None (lines 852–856 after logging an error, line 860 without any log) rather than recovering with a warning.

Suggested diff
         let wallets = self
             .wallets
-            .read()
-            .inspect_err(|e| tracing::error!(err=%e, "wallet lock poisoned"))
-            .ok()?;
+            .read_or_recover();
 
         let mut recent_info: Option<AddressInfo> = None;
         for wallet in wallets.values() {
-            let wallet_guard = wallet.read().ok()?;
+            let wallet_guard = wallet.read_or_recover();

Comment on lines +1477 to +1479
let wallets = self.wallets.read_or_recover();
for wallet_arc in wallets.values() {
let mut wallet = wallet_arc.write().unwrap();
let mut wallet = wallet_arc.write_or_recover();
Contributor


⚠️ Potential issue | 🟡 Minor

Recovering a write lock on wallet data in a financial operation deserves caution.

In received_transaction_finality, both wallets.read_or_recover() (Line 1477) and wallet_arc.write_or_recover() (Line 1479) recover from poisoned locks with nothing more than a log warning. If a previous thread panicked mid-mutation of wallet data (UTXOs, balances), the recovered RwLockWriteGuard may expose partially-updated financial state. This method then proceeds to insert UTXOs and update balances, potentially compounding the inconsistency.

Consider using .write().map_err(|e| ...)? here to surface the error to the caller, since this function already returns Result. Same applies to received_asset_lock_finality at Lines 1603-1605.
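The fallible alternative suggested here might look like the sketch below. Wallet, its fields, the [u8; 32] map key, and apply_finality are hypothetical stand-ins for the real context.rs types; only the .read()/.write().map_err(...)? shape is the point.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the wallet model, reduced to the fields touched here.
#[derive(Default)]
pub struct Wallet {
    pub utxos: Vec<u64>,
    pub balance: u64,
}

/// Hypothetical shape of a finality handler that surfaces poison as `Err`
/// instead of recovering, leaving the decision to the caller.
pub fn apply_finality(
    wallets: &RwLock<BTreeMap<[u8; 32], Arc<RwLock<Wallet>>>>,
    amount: u64,
) -> Result<(), String> {
    let wallets = wallets
        .read()
        .map_err(|e| format!("wallets lock poisoned: {e}"))?;
    for wallet_arc in wallets.values() {
        // A poisoned wallet stops the update here and is reported to the caller.
        let mut wallet = wallet_arc
            .write()
            .map_err(|e| format!("wallet lock poisoned: {e}"))?;
        wallet.utxos.push(amount);
        wallet.balance += amount;
    }
    Ok(())
}
```

With this shape, the update stops when it reaches a poisoned wallet and the caller gets an error, rather than new UTXOs being written on top of possibly half-applied state.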


#[allow(dead_code)] // May be used for generating sequential top-up indices
pub fn get_next_top_up_index(&self, identity_id: &[u8]) -> rusqlite::Result<u64> {
let conn = self.conn.lock().unwrap();
let conn = self.conn.lock_or_recover();
Contributor


⚠️ Potential issue | 🟡 Minor

Consider the implications of recovering a poisoned database connection mutex.

If the mutex was poisoned because a thread panicked mid-query or mid-transaction on the same rusqlite::Connection, the connection could be in an inconsistent state (e.g., an open transaction that was never committed or rolled back). Recovering the inner guard and issuing new queries on it may produce unexpected results or violate data integrity.

For non-DB locks (UI state, SDK handles), recovering is generally safe. For database connections, consider whether it's safer to keep the .lock().unwrap() (fail-fast) or at least reset the connection state (e.g., ROLLBACK on recovery) before proceeding.
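The reset-on-recovery option could be sketched as follows. To stay std-only, FakeConnection is a hypothetical stand-in for rusqlite::Connection that only tracks whether a transaction is open, and lock_conn_and_reset is a hypothetical helper name. With rusqlite, the check and reset would presumably map to Connection::is_autocommit() and execute_batch("ROLLBACK"), reopening the connection if the rollback fails. Mutex::clear_poison requires Rust 1.77 or later.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for `rusqlite::Connection`: it only tracks whether a
/// transaction was left open, which is the state this comment is worried about.
#[derive(Default)]
pub struct FakeConnection {
    pub in_transaction: bool,
    pub rollbacks: u32,
}

impl FakeConnection {
    /// Stands in for running `ROLLBACK` on a real connection.
    pub fn rollback(&mut self) -> Result<(), String> {
        self.in_transaction = false;
        self.rollbacks += 1;
        Ok(())
    }
}

/// Hypothetical helper: recover a poisoned connection mutex, reset the
/// connection before handing it out, then clear the poison flag.
pub fn lock_conn_and_reset(
    conn: &Mutex<FakeConnection>,
) -> Result<MutexGuard<'_, FakeConnection>, String> {
    match conn.lock() {
        // Normal path: nobody panicked while holding the connection.
        Ok(guard) => Ok(guard),
        Err(poisoned) => {
            let mut guard = poisoned.into_inner();
            if guard.in_transaction {
                // Discard whatever the panicking holder left half-done.
                // On failure the caller should reopen the connection instead.
                guard.rollback()?;
            }
            // State has been reset, so later lockers need not recover again.
            conn.clear_poison();
            Ok(guard)
        }
    }
}
```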


@PastaPastaPasta
Member Author

We may actually not want this :)

@PastaPastaPasta PastaPastaPasta marked this pull request as draft February 10, 2026 01:42