refactor: replace lock unwrap panics with poison recovery helpers#541
refactor: replace lock unwrap panics with poison recovery helpers#541PastaPastaPasta wants to merge 1 commit intov1.0-devfrom
Conversation
Add src/lock_helper.rs with MutexExt and RwLockExt traits providing lock_or_recover(), read_or_recover(), and write_or_recover() methods that recover from poisoned locks instead of panicking. Migrates all ~80 production lock access sites. Test code retains unwrap(). Task: 2.5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR introduces resilient lock acquisition patterns across the entire codebase by adding extension traits that gracefully recover from poisoned mutex and RwLock states, replacing unwrap-based lock patterns with recoverable variants throughout backend tasks, database operations, and UI components. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend_task/identity/register_identity.rs (1)
728-728:⚠️ Potential issue | 🟡 MinorInconsistent lock pattern —
read().map_err()not converted toread_or_recover().Line 728 uses
wallet.read().map_err(|e| e.to_string())?.clone()while the rest of this file (and the PR) usesread_or_recover(). On a poisoned lock, this path returns an error to the caller whereas all other wallet reads recover silently. This creates inconsistent behavior for the same failure mode.Suggested fix
- let wallet_clone = { wallet.read().map_err(|e| e.to_string())?.clone() }; + let wallet_clone = { wallet.read_or_recover().clone() };src/ui/identities/top_up_identity_screen/by_wallet_qr_code.rs (1)
26-44:⚠️ Potential issue | 🟡 Minor
core_client.read()not converted toread_or_recover()— inconsistent with the rest of the PR.Lines 29–30 still use
.read().map_err(...)while the analogous code inadd_new_identity_screen/by_wallet_qr_code.rs(Lines 35, 48, 55) usesread_or_recover(). On a poisoned lock this path surfaces an error to the user instead of recovering.Suggested diff
// Import address to Core if needed for monitoring - let core_client = self - .app_context - .core_client - .read() - .map_err(|_| "Core client lock was poisoned".to_string())?; + let core_client = self.app_context.core_client.read_or_recover();
🤖 Fix all issues with AI agents
In `@src/context.rs`:
- Around line 1477-1479: In received_transaction_finality replace the silent
poison-recover calls (wallets.read_or_recover() and
wallet_arc.write_or_recover()) with fallible lock acquisition that returns an
error to the caller (e.g. use wallets.read().map_err(|e| /* wrap to your error
type */)? and wallet_arc.write().map_err(|e| /* wrap */)?), so a poisoned mutex
surfaces as a Result error instead of silently continuing; apply the same change
pattern in received_asset_lock_finality for the analogous calls to ensure
poisoned locks are propagated rather than recovered.
In `@src/database/top_ups.rs`:
- Line 23: The code calls self.conn.lock_or_recover() which can resume a
poisoned rusqlite::Connection mutex and leave the DB in an inconsistent state;
change this to either use a fail-fast lock (self.conn.lock().unwrap()) or, if
you must keep lock_or_recover(), immediately reset the connection state before
running queries: after obtaining the guard from lock_or_recover() run a ROLLBACK
(or otherwise ensure no open transaction) and check its result, and if rollback
fails recreate or reopen the rusqlite::Connection instead of proceeding; update
the use sites in top_ups.rs that reference conn.lock_or_recover() to follow this
pattern and handle errors accordingly.
🧹 Nitpick comments (9)
src/backend_task/wallet/fetch_platform_address_balances.rs (1)
69-69: Inconsistent poison-handling strategies within the same function.Line 30 silently recovers from a poisoned
self.walletslock viaread_or_recover(), butwallet_arc.read()(lines 69, 197, 369) andwallet_arc.write()(line 262) still propagate poison asErr. Similarlyself.sdk.read()at line 81.This means a panic in another thread holding
wallet_arcwill cause this function to return an error, while a panic holdingself.walletswill be silently swallowed. If the intent is uniform resilience, consider applyingread_or_recover()/write_or_recover()here too (or vice-versa, keeping.map_erreverywhere for explicit error propagation).Not blocking — just noting the divergence for the author's consideration.
Also applies to: 81-81, 197-197, 262-262, 369-372
src/lock_helper.rs (1)
1-69: Consider adding inline#[test]coverage for the recovery paths.A small test that poisons a lock (via a panicking thread) and then calls
lock_or_recover()/read_or_recover()/write_or_recover()would validate the recovery behavior and satisfy the project's inline-test convention.As per coding guidelines,
**/*.rs: "Write unit tests inline in source files using#[test]attribute".Example test skeleton
#[cfg(test)] mod tests { use super::*; use std::sync::{Arc, Mutex, RwLock}; #[test] fn mutex_recovers_from_poison() { let m = Arc::new(Mutex::new(42)); let m2 = m.clone(); let _ = std::thread::spawn(move || { let _g = m2.lock().unwrap(); panic!("intentional"); }) .join(); assert!(m.lock().is_err(), "lock should be poisoned"); let guard = m.lock_or_recover(); assert_eq!(*guard, 42); } #[test] fn rwlock_read_recovers_from_poison() { let rw = Arc::new(RwLock::new(7)); let rw2 = rw.clone(); let _ = std::thread::spawn(move || { let _g = rw2.write().unwrap(); panic!("intentional"); }) .join(); assert!(rw.read().is_err()); let guard = rw.read_or_recover(); assert_eq!(*guard, 7); } #[test] fn rwlock_write_recovers_from_poison() { let rw = Arc::new(RwLock::new(0)); let rw2 = rw.clone(); let _ = std::thread::spawn(move || { let _g = rw2.write().unwrap(); panic!("intentional"); }) .join(); let mut guard = rw.write_or_recover(); *guard = 99; assert_eq!(*guard, 99); } }src/database/asset_lock_transaction.rs (1)
26-26: Acknowledge the trade-off for database connection poison recovery.Recovering from a poisoned mutex on the SQLite connection means a previous holder panicked mid-operation. SQLite auto-rolls-back uncommitted transactions on the connection, so this is generally safe. However, if any higher-level invariant was partially applied (e.g., two correlated writes where only one completed), the recovered connection will proceed against inconsistent application state.
For a GUI app where the alternative is a full crash, this trade-off is reasonable — just worth documenting in
lock_helper.rsthat callers should be aware of potential partial-update scenarios after recovery.src/backend_task/core/refresh_single_key_wallet_info.rs (1)
18-21: Inconsistent lock-handling strategies within the same function.Lines 19 and 57 use
.read()/.write().map_err(|e| e.to_string())?(propagating poison as anErr), while lines 25 and 34 useread_or_recover()(silently recovering from poison with a warning log). Both approaches avoid panics, but the different recovery semantics in one function could surprise future maintainers.This isn't a bug — the
.map_err()?sites were already non-panicking and outside the PR's scope — but consider aligning them for consistency in a follow-up.Also applies to: 25-25, 34-34
src/backend_task/core/create_asset_lock.rs (1)
38-39: Inconsistent error handling between lock sites in the same function.Lines 23 and 50 use
wallet.write().map_err(|e| e.to_string())?which propagates poison errors to the caller, while lines 38 and 44 use the silent-recovery helpers. This means a poisonedtransactions_waiting_for_finalityorcore_clientwill silently proceed (potentially broadcasting a transaction with corrupted state), but a poisonedwalletwill return an error.This is likely fine given the PR's intent (only replacing
.unwrap()calls), but worth noting:read_or_recover()oncore_client(line 44) means a poisoned RPC client will attempt to broadcast a financial transaction without the caller knowing the client may be in a bad state.Also applies to: 44-46
src/ui/identities/identities_screen.rs (1)
1169-1175: Avoid double‑lockingidentitiesinui().
Take a single snapshot and reuse it for the refresh button and the main render to keep the view consistent and reduce lock churn.♻️ Suggested refactor
- right_buttons.push(( - "Load Identity", - DesiredAppAction::AddScreenType(Box::new(ScreenType::AddExistingIdentity)), - )); - if !self.identities.lock_or_recover().is_empty() { - // Create a vec of RefreshIdentity(identity) DesiredAppAction for each identity - let backend_tasks: Vec<BackendTask> = self - .identities - .lock_or_recover() - .values() - .map(|qi| BackendTask::IdentityTask(IdentityTask::RefreshIdentity(qi.clone()))) - .collect(); - right_buttons.push(( - "Refresh", - DesiredAppAction::BackendTasks( - backend_tasks, - BackendTasksExecutionMode::Concurrent, - ), - )); - } + let identities_vec = { + let guard = self.identities.lock_or_recover(); + guard.values().cloned().collect::<Vec<_>>() + }; + + right_buttons.push(( + "Load Identity", + DesiredAppAction::AddScreenType(Box::new(ScreenType::AddExistingIdentity)), + )); + if !identities_vec.is_empty() { + let backend_tasks: Vec<BackendTask> = identities_vec + .iter() + .cloned() + .map(|qi| BackendTask::IdentityTask(IdentityTask::RefreshIdentity(qi))) + .collect(); + right_buttons.push(( + "Refresh", + DesiredAppAction::BackendTasks( + backend_tasks, + BackendTasksExecutionMode::Concurrent, + ), + )); + } @@ - let identities_vec = { - let guard = self.identities.lock_or_recover(); - guard.values().cloned().collect::<Vec<_>>() - }; + // identities_vec already computed aboveAlso applies to: 1196-1197
src/ui/identities/top_up_identity_screen/mod.rs (1)
99-111: Inconsistent locking pattern at Line 110.Line 99 and 104 use
read_or_recover(), but Line 110 still uses.read().ok()?. On a poisoned lock this silently falls back to"Select", which is benign, but inconsistent with the PR's stated goal of converting all lock sites. Consider aligning for consistency:Suggested diff
let selected_wallet_alias = self .wallet .as_ref() - .and_then(|wallet| wallet.read().ok()?.alias.clone()) + .and_then(|wallet| wallet.read_or_recover().alias.clone()) .unwrap_or_else(|| "Select".to_string());src/backend_task/identity/register_identity.rs (2)
393-406: Two separateread_or_recover()calls to getseed_hash— consider caching.
wallet.read_or_recover().seed_hash()is called at Line 393 and again at Line 404 with no mutation in between. The seed hash could be fetched once.Suggested diff
- let wallet_seed_hash = { wallet.read_or_recover().seed_hash() }; + let wallet_seed_hash = { + let r = wallet.read_or_recover(); + let hash = r.seed_hash(); + hash + }; let mut qualified_identity = QualifiedIdentity { identity: identity.clone(), ... private_keys: keys.to_key_storage(wallet_seed_hash), ... associated_wallets: BTreeMap::from([( - wallet.read_or_recover().seed_hash(), + wallet_seed_hash, wallet.clone(), )]),
850-876:get_platform_address_best_infonot converted to recoverable helpers.Lines 852–856 use
.read().inspect_err(...).ok()?and Line 860 uses.read().ok()?. While the fallback toNoneis safe, this is inconsistent with the PR's goal of standardizing on*_or_recover(). These sites will still fail silently on poisoned locks rather than recovering with a warning.Suggested diff
let wallets = self .wallets - .read() - .inspect_err(|e| tracing::error!(err=%e, "wallet lock poisoned")) - .ok()?; + .read_or_recover(); let mut recent_info: Option<AddressInfo> = None; for wallet in wallets.values() { - let wallet_guard = wallet.read().ok()?; + let wallet_guard = wallet.read_or_recover();
| let wallets = self.wallets.read_or_recover(); | ||
| for wallet_arc in wallets.values() { | ||
| let mut wallet = wallet_arc.write().unwrap(); | ||
| let mut wallet = wallet_arc.write_or_recover(); |
There was a problem hiding this comment.
Recovering a write lock on wallet data in a financial operation deserves caution.
In received_transaction_finality, both wallets.read_or_recover() (Line 1477) and wallet_arc.write_or_recover() (Line 1479) silently recover from poisoned locks. If a previous thread panicked mid-mutation of wallet data (UTXOs, balances), the recovered MutexGuard may expose partially-updated financial state. This method then proceeds to insert UTXOs and update balances, potentially compounding the inconsistency.
Consider using .write().map_err(|e| ...)? here to surface the error to the caller, since this function already returns Result. Same applies to received_asset_lock_finality at Lines 1603-1605.
🤖 Prompt for AI Agents
In `@src/context.rs` around lines 1477 - 1479, In received_transaction_finality
replace the silent poison-recover calls (wallets.read_or_recover() and
wallet_arc.write_or_recover()) with fallible lock acquisition that returns an
error to the caller (e.g. use wallets.read().map_err(|e| /* wrap to your error
type */)? and wallet_arc.write().map_err(|e| /* wrap */)?), so a poisoned mutex
surfaces as a Result error instead of silently continuing; apply the same change
pattern in received_asset_lock_finality for the analogous calls to ensure
poisoned locks are propagated rather than recovered.
| #[allow(dead_code)] // May be used for generating sequential top-up indices | ||
| pub fn get_next_top_up_index(&self, identity_id: &[u8]) -> rusqlite::Result<u64> { | ||
| let conn = self.conn.lock().unwrap(); | ||
| let conn = self.conn.lock_or_recover(); |
There was a problem hiding this comment.
Consider the implications of recovering a poisoned database connection mutex.
If the mutex was poisoned because a thread panicked mid-query or mid-transaction on the same rusqlite::Connection, the connection could be in an inconsistent state (e.g., an open transaction that was never committed or rolled back). Recovering the inner guard and issuing new queries on it may produce unexpected results or violate data integrity.
For non-DB locks (UI state, SDK handles), recovering is generally safe. For database connections, consider whether it's safer to keep the .lock().unwrap() (fail-fast) or at least reset the connection state (e.g., ROLLBACK on recovery) before proceeding.
🤖 Prompt for AI Agents
In `@src/database/top_ups.rs` at line 23, The code calls
self.conn.lock_or_recover() which can resume a poisoned rusqlite::Connection
mutex and leave the DB in an inconsistent state; change this to either use a
fail-fast lock (self.conn.lock().unwrap()) or, if you must keep
lock_or_recover(), immediately reset the connection state before running
queries: after obtaining the guard from lock_or_recover() run a ROLLBACK (or
otherwise ensure no open transaction) and check its result, and if rollback
fails recreate or reopen the rusqlite::Connection instead of proceeding; update
the use sites in top_ups.rs that reference conn.lock_or_recover() to follow this
pattern and handle errors accordingly.
|
We may actually not want this :) |
Summary
src/lock_helper.rswithMutexExtandRwLockExttraits that providelock_or_recover(),read_or_recover(), andwrite_or_recover()methods.lock().unwrap(),.read().unwrap(), and.write().unwrap()calls across 67 files with the poison-recovering alternativestracing::warn!and return the inner data instead of crashing the applicationMotivation
Lock poisoning panics are one of the most common crash vectors in multi-threaded Rust applications. In a GUI app like Dash Evo Tool, a panic on any background thread (e.g., network timeout, malformed data) poisons every shared lock it held. Without recovery, the next thread to access that lock crashes too, cascading into a full application crash.
The poisoned data is almost always still valid — the panicking thread rarely corrupts the protected state. By recovering with a warning log instead of panicking, the app stays usable and the user doesn't lose their session.
Changes
src/lock_helper.rs—MutexExt<T>andRwLockExt<T>extension traitsbackend_task/,context.rs,database/,ui/,spv/,app.rs— mechanical replacement of.lock().unwrap()→.lock_or_recover(),.read().unwrap()→.read_or_recover(),.write().unwrap()→.write_or_recover()Test plan
cargo buildpassescargo clippy --all-features --all-targets -- -D warningspasses cleancargo test --all-features --workspace)🤖 Generated with Claude Code
Summary by CodeRabbit