**Additionally blocked in `Block` mode** (no proxy):
- `AF_INET`, `AF_INET6`

Seccomp provides three layers of syscall restriction: socket domain blocks, unconditional syscall blocks, and conditional syscall blocks. The filter uses a default-allow policy (`SeccompAction::Allow`) with targeted rules that return `Errno(EPERM)`.

**Skipped entirely** in `Allow` mode.

Setup:

1. `prctl(PR_SET_NO_NEW_PRIVS, 1)` -- required before seccomp
2. `seccompiler::apply_filter()` with default action `Allow` and per-rule action `Errno(EPERM)`

In `Proxy` mode, `AF_INET`/`AF_INET6` are allowed because the sandboxed process needs to connect to the proxy over the veth pair. The network namespace ensures it can only reach the proxy's IP (`10.200.0.1`).
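
The first setup step can be sketched without external crates as shown below. This is a minimal sketch, assuming a Linux target: the hand-written `extern "C"` declaration and the helper names `set_no_new_privs` and `no_new_privs_is_set` are illustrative and are not taken from the sandbox source. The ordering matters because the kernel refuses to install a seccomp filter from an unprivileged process unless `no_new_privs` is already set.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};

// prctl(2) option values from <linux/prctl.h>.
const PR_SET_NO_NEW_PRIVS: c_int = 38;
const PR_GET_NO_NEW_PRIVS: c_int = 39;

extern "C" {
    // Declared by hand so the sketch needs no external crate (assumption:
    // a Linux libc is linked, which is the case for std on Linux targets).
    fn prctl(option: c_int, ...) -> c_int;
}

/// Step 1: set `no_new_privs` on the calling thread.
/// Hypothetical helper name, used here for illustration only.
pub fn set_no_new_privs() -> io::Result<()> {
    let one: c_ulong = 1;
    let zero: c_ulong = 0;
    // SAFETY: this prctl option takes plain integer arguments only.
    let rc = unsafe { prctl(PR_SET_NO_NEW_PRIVS, one, zero, zero, zero) };
    if rc == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Reads the flag back so a caller can verify step 1 before installing a filter.
pub fn no_new_privs_is_set() -> io::Result<bool> {
    let zero: c_ulong = 0;
    // SAFETY: this prctl option takes plain integer arguments only.
    let rc = unsafe { prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero) };
    match rc {
        0 => Ok(false),
        1 => Ok(true),
        _ => Err(io::Error::last_os_error()),
    }
}

// Step 2 is not shown: after step 1 succeeds, the caller would pass its
// compiled filter to `seccompiler::apply_filter()`.
```

The flag is one-way: once set, it cannot be cleared, and it is inherited across `fork` and `execve`.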

#### Unconditional syscall blocks

These syscalls are blocked entirely (EPERM for any invocation):

| Syscall | Reason |
|---------|--------|
| `ptrace` | Cross-process memory inspection and code injection |
| `bpf` | Kernel BPF program loading |
| `process_vm_readv` | Cross-process memory read |
| `io_uring_setup` | Async I/O subsystem with extensive CVE history |
| `mount` | Filesystem mount could subvert Landlock or overlay writable paths |

#### Conditional syscall blocks

These syscalls are only blocked when specific flag patterns are present:

| Syscall | Condition | Reason |
|---------|-----------|--------|
| `execveat` | `AT_EMPTY_PATH` flag set (arg4) | Fileless execution from an anonymous fd |
| `unshare` | `CLONE_NEWUSER` flag set (arg0) | User namespace creation enables privilege escalation |
| `seccomp` | operation == `SECCOMP_SET_MODE_FILTER` (arg0) | Prevents sandboxed code from replacing the active filter |

Conditional blocks use `MaskedEq` for flag checks (bit-test) and `Eq` for exact-value matches. This allows normal use of these syscalls while blocking the dangerous flag combinations.
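
The difference between the two comparison kinds can be illustrated with a small stand-alone model. This is a sketch only: the `CmpOp` enum and the `flag_set` helper are hypothetical names that mimic the `MaskedEq` and `Eq` operators mentioned above, and the flag values are the standard Linux UAPI constants rather than anything read from the sandbox source.

```rust
/// Stand-in for the two comparison kinds used by the conditional rules.
#[derive(Debug, Clone, Copy)]
pub enum CmpOp {
    /// Matches when `(arg & mask) == value`.
    MaskedEq { mask: u64, value: u64 },
    /// Matches when `arg == value`.
    Eq(u64),
}

impl CmpOp {
    /// Applies the comparison to one raw syscall argument.
    pub fn matches(self, arg: u64) -> bool {
        match self {
            CmpOp::MaskedEq { mask, value } => (arg & mask) == value,
            CmpOp::Eq(value) => arg == value,
        }
    }
}

/// Bit-test for a single flag: the mask and the expected value are the same bit.
pub fn flag_set(flag: u64) -> CmpOp {
    CmpOp::MaskedEq { mask: flag, value: flag }
}

// Linux UAPI constants used by the conditions in the table.
pub const AT_EMPTY_PATH: u64 = 0x1000;
pub const CLONE_NEWUSER: u64 = 0x1000_0000;
pub const SECCOMP_SET_MODE_FILTER: u64 = 1;

// Unrelated values, included only to show that other bits and other
// operations do not trigger a match.
pub const AT_SYMLINK_NOFOLLOW: u64 = 0x100;
pub const CLONE_NEWNS: u64 = 0x0002_0000;
pub const SECCOMP_SET_MODE_STRICT: u64 = 0;
```

With this model, `flag_set(CLONE_NEWUSER)` matches `CLONE_NEWUSER | CLONE_NEWNS` but not `CLONE_NEWNS` alone, while `CmpOp::Eq(SECCOMP_SET_MODE_FILTER)` matches only the exact operation value. A masked comparison is the natural fit for flag words, where unrelated bits may be set alongside the dangerous one.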

architecture/security-policy.md
Lines changed: 28 additions & 1 deletion
@@ -850,6 +850,10 @@ The response includes an `X-OpenShell-Policy` header and `Connection: close`. Se
## Seccomp Filter Details

The seccomp filter uses a default-allow policy (`SeccompAction::Allow`) with targeted rules that return `EPERM`. It provides three layers of protection: socket domain blocks, unconditional syscall blocks, and conditional syscall blocks. See `crates/openshell-sandbox/src/sandbox/linux/seccomp.rs`.

### Blocked socket domains

Regardless of network mode, certain socket domains are always blocked:

| Domain | Constant | Reason |
@@ -861,7 +865,30 @@ Regardless of network mode, certain socket domains are always blocked:
In proxy mode (which is always active), `AF_INET` (2) and `AF_INET6` (10) are allowed so the sandbox process can reach the proxy.

The socket-domain rules match the `socket()` syscall and return `EPERM` when the first argument (domain) equals a blocked value.
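
The domain check amounts to a membership test on the first `socket()` argument. The sketch below is a stand-alone model, not the sandbox code: `Verdict` and `socket_verdict` are hypothetical names, and `AF_PACKET` is used purely as an illustrative blocked value, since the rows of the domain table are not reproduced in this section.

```rust
/// Errno returned for blocked calls (EPERM is 1 on Linux).
pub const EPERM: i32 = 1;

// Domain values stated in the text above.
pub const AF_INET: u64 = 2;
pub const AF_INET6: u64 = 10;

/// Illustrative only: a placeholder for one always-blocked domain.
/// The real list lives in the table and is not assumed here.
pub const AF_PACKET: u64 = 17;

/// Outcome of evaluating one call against the modeled rules.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Errno(i32),
}

/// Decides a `socket(domain, type, protocol)` call from its first argument.
/// Anything not on the blocked list falls through to the default-allow action.
pub fn socket_verdict(domain: u64, blocked_domains: &[u64]) -> Verdict {
    if blocked_domains.contains(&domain) {
        Verdict::Errno(EPERM)
    } else {
        Verdict::Allow
    }
}
```

Called with a blocked list of `[AF_PACKET]`, the function returns `Verdict::Errno(EPERM)` for `AF_PACKET` and `Verdict::Allow` for `AF_INET` and `AF_INET6`, which mirrors the proxy-mode behavior described above.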

### Blocked syscalls

These syscalls are blocked unconditionally (EPERM for any invocation):

| Syscall | NR (x86-64) | Reason |
|---------|-------------|--------|
| `io_uring_setup` | 425 | Async I/O subsystem with extensive CVE history |
| `mount` | 165 | Filesystem mount could subvert Landlock or overlay writable paths |
880
+
881
+
### Conditionally blocked syscalls
882
+
883
+
These syscalls are blocked only when specific flag patterns are present in their arguments:
884
+
885
+
| Syscall | NR (x86-64) | Condition | Reason |
886
+
|---------|-------------|-----------|--------|
887
+
|`execveat`| 322 |`AT_EMPTY_PATH` (0x1000) set in flags (arg4) | Fileless execution from an anonymous fd |
888
+
|`unshare`| 272 |`CLONE_NEWUSER` (0x10000000) set in flags (arg0) | User namespace creation enables privilege escalation |
889
+
|`seccomp`| 317 | operation == `SECCOMP_SET_MODE_FILTER` (1) in arg0 | Prevents sandboxed code from replacing the active filter |
890
+
891
+
Flag checks use `MaskedEq` (`(arg & mask) == mask`) to detect the flag bit regardless of other bits. The `seccomp` syscall check uses `Eq` for exact value comparison on the operation argument.
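
Putting the syscall numbers and conditions from the tables above together, the decision for a single call can be modeled as one `match`. This is a minimal sketch under stated assumptions: `is_blocked` is a hypothetical helper, it covers only the rows visible in this section, and it models the decision logic rather than the BPF program that the real filter compiles to.

```rust
// x86-64 syscall numbers, as listed in the tables.
pub const SYS_MOUNT: i64 = 165;
pub const SYS_UNSHARE: i64 = 272;
pub const SYS_SECCOMP: i64 = 317;
pub const SYS_EXECVEAT: i64 = 322;
pub const SYS_IO_URING_SETUP: i64 = 425;

// Flag and operation values, as listed in the tables.
pub const AT_EMPTY_PATH: u64 = 0x1000;
pub const CLONE_NEWUSER: u64 = 0x1000_0000;
pub const SECCOMP_SET_MODE_FILTER: u64 = 1;

/// Returns true when the modeled filter would answer EPERM for this call.
/// `args` holds the six raw syscall arguments, indexed from arg0.
pub fn is_blocked(nr: i64, args: &[u64; 6]) -> bool {
    match nr {
        // Unconditional blocks: any invocation is rejected.
        SYS_IO_URING_SETUP | SYS_MOUNT => true,
        // Flag checks: (arg & mask) == mask, other bits are ignored.
        SYS_EXECVEAT => (args[4] & AT_EMPTY_PATH) == AT_EMPTY_PATH,
        SYS_UNSHARE => (args[0] & CLONE_NEWUSER) == CLONE_NEWUSER,
        // Exact-value check on the operation argument.
        SYS_SECCOMP => args[0] == SECCOMP_SET_MODE_FILTER,
        // Default-allow: every other syscall passes.
        _ => false,
    }
}
```

For example, `is_blocked(SYS_UNSHARE, &[CLONE_NEWUSER, 0, 0, 0, 0, 0])` is true, while an `unshare` call without the `CLONE_NEWUSER` bit, or any syscall number not listed, falls through to the default-allow arm.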