Skip to content

Commit 0aa2069

Browse files
committed
fix(sandbox): strip " (deleted)" suffix from unlinked /proc/<pid>/exe paths
When a running binary is unlinked from its filesystem path — the common case is a `docker cp` hot-swap of `/opt/openshell/bin/openshell-sandbox` during the `cluster-deploy-fast` dev upgrade workflow — the Linux kernel appends the literal string ` (deleted)` to the `/proc/<pid>/exe` readlink target. The tainted `PathBuf` then flows into `collect_ancestor_binaries` and on into `BinaryIdentityCache::verify_or_cache`, which tries to `std::fs::metadata` the path. `stat()` fails with `ENOENT` because the literal suffix isn't a real filesystem path, and the CONNECT proxy denies every outbound request with: ancestor integrity check failed for \ /opt/openshell/bin/openshell-sandbox (deleted): \ Failed to stat ...: No such file or directory (os error 2) Reproduced in production 2026-04-15: a cluster-deploy-fast-style hot-swap of the supervisor binary caused every pod whose PID 1 held the now-deleted inode to deny ALL outbound CONNECTs (slack.com, registry.npmjs.org, 169.254.169.254, etc.), breaking Slack REST delivery, npm installs, and IMDS probes simultaneously. Existing pre-hot-swap TCP tunnels (e.g. Slack Socket Mode WSS) kept working because they never re-evaluate the proxy. Strip the suffix in `binary_path()` so downstream callers see the clean, grep-friendly path. This aligns the cache key and log messages with the original on-disk location. Note: stripping the suffix does NOT by itself make the identity cache tolerant of a legitimate binary replacement — `verify_or_cache` will now `stat` and hash whatever currently lives at the stripped path, which is the NEW binary, and surface a clearer `Binary integrity violation` error. Fully unblocking the cluster-deploy-fast hot-swap workflow needs a follow-up that either (a) reads running-binary content from `/proc/<pid>/exe` directly via `File::open` (procfs resolves this to the live in-memory executable even when the original inode has been unlinked), or (b) keys the identity cache by exec dev+inode instead of path. Happy to send that as a separate PR once the approach is decided — filing this narrow fix first because it stands on its own: it fixes a concrete misleading error and unblocks the obvious next step. Added `binary_path_strips_deleted_suffix` test that copies `/bin/sleep` to a temp path, spawns a child from it, unlinks the temp binary, verifies the raw readlink contains the ` (deleted)` suffix, then asserts the public API returns the stripped path. Signed-off-by: mjamiv <michael.commack@gmail.com>
1 parent 355d845 commit 0aa2069

File tree

1 file changed

+83
-4
lines changed

1 file changed

+83
-4
lines changed

crates/openshell-sandbox/src/procfs.rs

Lines changed: 83 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -19,17 +19,43 @@ use tracing::debug;
1919
/// `/proc/{pid}/cmdline` because `argv[0]` is trivially spoofable by any
2020
/// process and must not be used as a trusted identity source.
2121
///
22-
/// If this fails, ensure the proxy process has permission to read
23-
/// `/proc/<pid>/exe` (e.g. same user, or `CAP_SYS_PTRACE`).
22+
/// ### Unlinked binaries (`(deleted)` suffix)
23+
///
24+
/// When a running binary is unlinked from its filesystem path — the common
25+
/// case is a `docker cp` hot-swap of `/opt/openshell/bin/openshell-sandbox`
26+
/// during a `cluster-deploy-fast` dev upgrade — the kernel appends the
27+
/// literal string `" (deleted)"` to the `/proc/<pid>/exe` readlink target.
28+
/// The raw tainted path (e.g. `"/opt/openshell/bin/openshell-sandbox (deleted)"`)
29+
/// is not a real filesystem path: any downstream `stat()` fails with `ENOENT`.
30+
///
31+
/// We strip the suffix so callers see a clean, grep-friendly path suitable
32+
/// for cache keys and log messages. This does NOT claim the file at the
33+
/// stripped path is the same binary that the process is executing — the
34+
/// on-disk inode may now be arbitrary. Callers that need to verify the
35+
/// running binary's *contents* (for integrity checking) should read the
36+
/// magic `/proc/<pid>/exe` symlink directly via `File::open`, which procfs
37+
/// resolves to the live in-memory executable even when the original inode
38+
/// has been unlinked.
39+
///
40+
/// If the readlink itself fails, ensure the proxy process has permission
41+
/// to read `/proc/<pid>/exe` (e.g. same user, or `CAP_SYS_PTRACE`).
2442
#[cfg(target_os = "linux")]
2543
pub fn binary_path(pid: i32) -> Result<PathBuf> {
26-
std::fs::read_link(format!("/proc/{pid}/exe")).map_err(|e| {
44+
let link = format!("/proc/{pid}/exe");
45+
let target = std::fs::read_link(&link).map_err(|e| {
2746
miette::miette!(
2847
"Failed to read /proc/{pid}/exe: {e}. \
2948
Cannot determine binary identity — denying request. \
3049
Hint: the proxy may need CAP_SYS_PTRACE or to run as the same user."
3150
)
32-
})
51+
})?;
52+
53+
// Strip the " (deleted)" suffix the kernel appends for unlinked binaries.
54+
// See the doc comment above for the full rationale.
55+
if let Some(stripped) = target.to_str().and_then(|s| s.strip_suffix(" (deleted)")) {
56+
return Ok(PathBuf::from(stripped));
57+
}
58+
Ok(target)
3359
}
3460

3561
/// Resolve the binary path of the TCP peer inside a sandbox network namespace.
@@ -391,6 +417,59 @@ mod tests {
391417
assert!(path.exists());
392418
}
393419

420+
/// Verify that an unlinked binary's path is returned without the
421+
/// kernel's " (deleted)" suffix. This is the common case during a
422+
/// `docker cp` hot-swap of the supervisor binary — before this strip,
423+
/// callers that `stat()` the returned path get `ENOENT` and the
424+
/// ancestor integrity check in the CONNECT proxy denies every request.
425+
#[cfg(target_os = "linux")]
426+
#[test]
427+
fn binary_path_strips_deleted_suffix() {
428+
use std::os::unix::fs::PermissionsExt;
429+
430+
// Copy /bin/sleep to a temp path we control so we can unlink it.
431+
let tmp = tempfile::TempDir::new().unwrap();
432+
let exe_path = tmp.path().join("deleted-sleep");
433+
std::fs::copy("/bin/sleep", &exe_path).unwrap();
434+
std::fs::set_permissions(&exe_path, std::fs::Permissions::from_mode(0o755)).unwrap();
435+
436+
// Spawn a child from the temp binary, then unlink it while the
437+
// child is still running. The child keeps the exec mapping via
438+
// `/proc/<pid>/exe`, but readlink will now return the tainted
439+
// "<path> (deleted)" string.
440+
let mut child = std::process::Command::new(&exe_path)
441+
.arg("5")
442+
.spawn()
443+
.unwrap();
444+
let pid: i32 = child.id().cast_signed();
445+
std::fs::remove_file(&exe_path).unwrap();
446+
447+
// Sanity check: the raw readlink should contain " (deleted)".
448+
let raw = std::fs::read_link(format!("/proc/{pid}/exe"))
449+
.unwrap()
450+
.to_string_lossy()
451+
.into_owned();
452+
assert!(
453+
raw.ends_with(" (deleted)"),
454+
"kernel should append ' (deleted)' to unlinked exe readlink; got {raw:?}"
455+
);
456+
457+
// The public API should return the stripped path, not the tainted one.
458+
let resolved = binary_path(pid).expect("binary_path should succeed for deleted binary");
459+
assert_eq!(
460+
resolved, exe_path,
461+
"binary_path should strip the ' (deleted)' suffix"
462+
);
463+
let resolved_str = resolved.to_string_lossy();
464+
assert!(
465+
!resolved_str.contains("(deleted)"),
466+
"stripped path must not contain '(deleted)'; got {resolved_str:?}"
467+
);
468+
469+
let _ = child.kill();
470+
let _ = child.wait();
471+
}
472+
394473
#[cfg(target_os = "linux")]
395474
#[test]
396475
fn collect_descendants_includes_self() {

0 commit comments

Comments
 (0)