The supervisor blindly CONTINUEd any syscall on an FD not in the virtual
table, letting a guest read/write host FDs obtained via openat TOCTOU.
Return EBADF for every untracked FD while tracking all legitimate
host-passthrough FDs.
FD table: eliminate the [1024, 32768) gap with mid_fds[31744]. Move
fd_table to static storage. Convert all inline low_fds[]/entries[]
lookups to fd_table_entry().
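A minimal sketch of what a unified lookup over the three ranges might look like. Only the names low_fds, mid_fds, entries, fd_table_entry() and the boundaries 1024 and 32768 come from this message; the struct layout, the capacity of the high range, and the fields of each entry are assumptions made for illustration.

```c
#include <stddef.h>

#define FD_LOW_MAX    1024   /* low range covers FDs [0, 1024) */
#define FD_MID_MAX    32768  /* mid range covers FDs [1024, 32768): 31744 slots */
#define FD_HIGH_SLOTS 4096   /* ASSUMED capacity of the range starting at 32768 */

/* Hypothetical entry layout; the real fields are not shown in this PR text. */
struct fd_entry {
    int in_use;   /* nonzero once the supervisor tracks this FD */
    int host_fd;  /* assumed supervisor-side descriptor, -1 if none */
};

struct fd_table {
    struct fd_entry low_fds[FD_LOW_MAX];
    struct fd_entry mid_fds[FD_MID_MAX - FD_LOW_MAX];
    struct fd_entry entries[FD_HIGH_SLOTS];
};

/* Static storage: a table of this size does not belong on the stack. */
static struct fd_table g_fd_table;

/* Return the slot for fd, or NULL if fd falls outside every range. */
static struct fd_entry *fd_table_entry(struct fd_table *t, long fd)
{
    if (fd < 0)
        return NULL;
    if (fd < FD_LOW_MAX)
        return &t->low_fds[fd];
    if (fd < FD_MID_MAX)
        return &t->mid_fds[fd - FD_LOW_MAX];
    if (fd < FD_MID_MAX + FD_HIGH_SLOTS)
        return &t->entries[fd - FD_MID_MAX];
    return NULL;
}
```

Routing every lookup through one function keeps the bounds checks in a single place, so no handler can index low_fds[] or entries[] with an FD from the wrong range.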
Tracking: pre-register stdio; track pipe FDs via ADDFD; intercept
eventfd/timerfd_create/epoll_create1 (create in supervisor, inject via
ADDFD); scan /proc/{pid}/fd on first notification to capture inherited
FDs; emulate opens under /proc, /sys, /dev, and on TTYs (supervisor
opens the file, translating /proc/self, and injects via ADDFD); emulate
dup/dup2/dup3/fcntl(F_DUPFD) via pidfd_getfd; allow execve on /dev/fd/N
paths.
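The /proc/{pid}/fd scan for inherited FDs could be sketched roughly as below. The function names, the callback signature, and the choice to skip unparseable entries are assumptions for illustration, not the code in this commit.

```c
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Parse a /proc/<pid>/fd entry name into an FD number.
 * Returns 0 on success, -1 for ".", "..", or any non-numeric name. */
static int parse_fd_name(const char *name, int *out)
{
    char *end;
    long v;

    if (name[0] < '0' || name[0] > '9')
        return -1;
    errno = 0;
    v = strtol(name, &end, 10);
    if (errno != 0 || *end != '\0' || v > INT_MAX)
        return -1;
    *out = (int) v;
    return 0;
}

/* Call cb(fd, ctx) once for every FD currently open in pid.
 * Returns the number of FDs reported, or -1 if the directory
 * cannot be opened. Hypothetical helper, not the PR's function. */
int scan_inherited_fds(pid_t pid, void (*cb)(int fd, void *ctx), void *ctx)
{
    char path[64];
    struct dirent *de;
    DIR *d;
    int fd, n = 0;

    snprintf(path, sizeof(path), "/proc/%ld/fd", (long) pid);
    d = opendir(path);
    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (parse_fd_name(de->d_name, &fd) < 0)
            continue;  /* skips "." and ".." */
        cb(fd, ctx);
        n++;
    }
    closedir(d);
    return n;
}
```

In this sketch the callback would mark each reported FD as a tracked host-passthrough entry, so that later I/O on an inherited descriptor is not rejected as untracked.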
Enforcement: fd_should_deny_io() in every I/O handler. Move
epoll_ctl/wait, timerfd_settime/gettime, fstatfs from CONTINUE table to
gated handlers. Preserve SHADOW_ONLY entries on close (shared table
after fork); skip them in close_cloexec_entry.
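As a rough illustration of the enforcement rule, the sketch below fills a seccomp user-notification response with EBADF for an untracked FD and sets the CONTINUE flag only for a tracked one. struct seccomp_notif_resp and SECCOMP_USER_NOTIF_FLAG_CONTINUE are the kernel's; the g_tracked array, the respond_io() helper, and the body of fd_should_deny_io() are assumptions, and the sketch ignores tracked virtual FDs that the real supervisor would emulate rather than CONTINUE.

```c
#include <errno.h>
#include <linux/seccomp.h>
#include <string.h>

#define TRACKED_SLOTS 32768  /* ASSUMED table size for this sketch */

/* Hypothetical stand-in for the real FD table: nonzero means tracked. */
static unsigned char g_tracked[TRACKED_SLOTS];

/* Deny I/O on any FD the supervisor does not track. */
static int fd_should_deny_io(long fd)
{
    return fd < 0 || fd >= TRACKED_SLOTS || !g_tracked[fd];
}

/* Fill the response for an I/O syscall on fd.
 * Untracked FDs get EBADF; tracked passthrough FDs are CONTINUEd.
 * The kernel requires error and val to be 0 when CONTINUE is set,
 * which the memset guarantees. */
static void respond_io(struct seccomp_notif_resp *resp, __u64 id, long fd)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = id;
    if (fd_should_deny_io(fd))
        resp->error = -EBADF;
    else
        resp->flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
}
```

The point of the gate is that CONTINUE becomes an explicit decision for known descriptors instead of the default for anything missing from the table.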
Change-Id: I34eaba6cfcb181250ecd95998d2e530af949915e
CONTINUE re-reads pointer targets from guest memory, so a sibling thread
can swap the path between process_vm_readv and host kernel's re-read.
Replace every path-dependent CONTINUE with supervisor-side emulation
using local path copies and pidfd_getfd dirfd copies.
Emulated operations (supervisor calls host syscall): fstatat, statx,
stat, faccessat, readlinkat, symlinkat, linkat, utimensat, mkdirat,
unlinkat, fchmodat, fchownat, renameat2, and legacy wrappers (access,
mkdir, unlink, rmdir, chmod, chown).
Extract translate_proc_self() helper shared across all paths.
forward_execve: EACCES for virtual paths (not executable).
forward_clone3: write validated flags back before CONTINUE.
After this commit no should_continue_virtual_path or
should_continue_for_dirfd call results in CONTINUE.
Close #40
Change-Id: I1653e63ebd63f3e0a4428b24de051dfb05c769b1
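A sketch of how a translate_proc_self() helper might work on the supervisor's local copy of a guest path. Only the helper's name comes from the commit message; the signature, the return convention, and the decision to leave other paths untouched are assumptions.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Rewrite a leading "/proc/self" component to "/proc/<pid>" so that a
 * path opened by the supervisor resolves against the guest process,
 * not the supervisor itself. Paths that do not start with that
 * component are copied unchanged. Returns 0 on success, -1 if the
 * result does not fit in out. */
static int translate_proc_self(const char *path, pid_t pid,
                               char *out, size_t outsz)
{
    static const char prefix[] = "/proc/self";
    const size_t plen = sizeof(prefix) - 1;
    int n;

    /* Match only a whole component: "/proc/self" or "/proc/self/...",
     * never "/proc/selfish". */
    if (strncmp(path, prefix, plen) == 0 &&
        (path[plen] == '\0' || path[plen] == '/'))
        n = snprintf(out, outsz, "/proc/%ld%s", (long) pid, path + plen);
    else
        n = snprintf(out, outsz, "%s", path);

    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

Because the translation runs on a buffer the supervisor owns, the string passed to the host syscall is the one that was validated, and a sibling guest thread has nothing left to swap.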
Close the CONTINUE escape, where a guest could exploit the supervisor's blind passthrough of untracked FDs and pointer-dependent syscalls.
Close #40
Summary by cubic
Hardened the seccomp supervisor by denying I/O on untracked FDs, replacing all path/pointer-dependent CONTINUEs with supervisor-side host syscalls, and fully tracking host-passthrough FDs (stdio, pipes, eventfd/timerfd/epoll, inherited). Fixes bind/connect EINVAL from stale socket entries and closes the CONTINUE escape.
Bug Fixes
- Deny I/O on untracked FDs via fd_should_deny_io (incl. mmap).
- Emulate readlinkat, statx, faccessat*, etc. in the supervisor.
- Handle epoll_ctl/wait, timerfd_*, and fstatfs without CONTINUE; validate args.
- Write clone3 flags back before continue; allow execve on /dev/fd/N; reject virtual exec paths with EACCES.
- SHADOW_ONLY entries on ADDFD socket injection (fixes bind/connect); preserve SHADOW_ONLY on close.
Refactors
- KBOX_LKL_FD_SHADOW_ONLY; store table statically; unify lookups via fd_table_entry().
- /proc/<pid>/fd scan for inherited FDs; track pipes via ADDFD; create/inject eventfd, timerfd, epoll; emulate opens under /proc, /sys, /dev, and TTY; emulate dup/dup2/dup3 and fcntl(F_DUPFD*) via pidfd_getfd.
- pipe2, F_DUPFD*, and mid-range cases.
Written for commit 0000182. Summary will update on new commits.