iovec structures: Fix TOCTOU vulnerabilities#2148

Open
francescolavra wants to merge 4 commits into master from fix/iovec

@francescolavra
Member

This change set fixes some TOCTOU vulnerabilities present in code that handles syscalls involving struct iovec arrays.
User memory validation via validate_iovec() and validate_msghdr() is being replaced with validation at the time of use.
In addition, this fixes a double read of the SQE opcode from user memory when handling the io_uring_enter syscall.
To minimize runtime overhead, validate_user_memory() is being replaced with a more lightweight memory_is_user() function
whose only purpose is to ensure that a memory range will fault when accessed if it is not a userspace address range.

Security issues reported by Niklas Femerstrand (@niklasfemerstrand).

Francesco Lavra added 4 commits March 29, 2026 11:19

iour_submit() reads sqe->opcode from shared user memory twice, in two
different switch statements; a concurrent thread can change the opcode from
NOP (no fd required) at the first read to READV (fd required) at the second,
causing a kernel NULL pointer dereference crash.

Copy sqe->opcode to a local variable and only use the copied value.
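
The single-fetch pattern described above can be sketched as follows. This is a minimal illustration, not the Nanos code: `struct sqe_sketch`, `OP_NOP`, `OP_READV` and `submit_sketch()` are hypothetical stand-ins, and the real iour_submit() handles many more opcodes and fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the Nanos definitions. */
enum { OP_NOP = 0, OP_READV = 1 };

struct sqe_sketch {
    volatile uint8_t opcode; /* lives in memory shared with user space */
    int32_t fd;
};

/* Returns 0 on success, -1 if the opcode is unknown or a required fd is
 * missing. */
int submit_sketch(const struct sqe_sketch *sqe)
{
    /* Single fetch: every later decision uses this snapshot, so a
     * concurrent writer cannot make the two switches disagree. */
    const uint8_t opcode = sqe->opcode;
    int needs_fd;

    /* First switch: decide whether a file descriptor is required. */
    switch (opcode) {
    case OP_NOP:
        needs_fd = 0;
        break;
    case OP_READV:
        needs_fd = 1;
        break;
    default:
        return -1;
    }

    if (needs_fd && sqe->fd < 0)
        return -1;

    /* Second switch: dispatch on the same local copy, never on
     * sqe->opcode again. */
    switch (opcode) {
    case OP_NOP:
        return 0;
    case OP_READV:
        return 0; /* a real handler would issue the read here */
    }
    return -1;
}
```

If the second switch re-read `sqe->opcode` instead, a NOP that skipped the fd check could be dispatched as a READV, which is the race this commit closes.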

A subsequent commit will need the context error frame to be set in order to
access other data (besides the destination address) from user memory.

This function returns -EINVAL because Linux returns -EINVAL when the
pointer is NULL. However, Linux returns -EFAULT for any other invalid pointer
value, and the -EINVAL return code for a NULL pointer is likely vestigial
behavior without technical justification (it has been there since at least
2005, in the pre-git era).
Instead of mimicking Linux behavior, just return the standard -EFAULT code
(as documented in the Linux man page), and use a different invalid pointer
value in the relevant runtime test so that the test keeps passing on both
Nanos and Linux.

The io_uring IORING_REGISTER_BUFFERS operation validates the user-supplied
iovec array via validate_iovec(), then copies from the same user memory via
runtime_memcpy() after a heap allocation and lock acquisition. A concurrent
thread can modify iov_base between validation and copy, replacing a valid
user address with a kernel address; subsequent READ_FIXED/WRITE_FIXED
operations then read from or write to arbitrary kernel memory.
In addition, the iov_op() function stores the user-supplied iovec pointer
after initial validation, and the iov_op_each() function later re-reads
iov[curr].iov_base from user memory, allowing a concurrent thread to
substitute a kernel address after validation. This affects both io_uring
and regular readv/writev paths, and enables arbitrary kernel read/write.

To prevent these TOCTOU vulnerabilities, replace initial user memory
validation via validate_iovec() and validate_msghdr() with validation at
the time of use. To minimize runtime overhead, replace
validate_user_memory() with a more lightweight memory_is_user() function
whose only purpose is to ensure that a memory range will fault when
accessed if it is not a userspace address range.
This last change has a side effect (which prompted a change in the Unix
socket runtime test): NULL pointers passed to syscalls may be recognized as
invalid addresses only at the time of use, instead of during initial
validation of syscall arguments. This is expected, and is justified by the
fact that a NULL pointer may be a valid user address if the user program
sets the 'mmap_min_addr' manifest option to 0.
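
As a rough illustration of validation at the time of use, the sketch below copies one iovec entry out of shared memory and range-checks the copy, so that the checked value and the used value cannot differ. It is written under stated assumptions and is not the implementation in this change set: `USER_LIMIT_SKETCH`, `struct iovec_sketch`, `range_is_user_sketch()` and `fetch_iov_sketch()` are hypothetical names, and the actual memory_is_user() signature and address-space bound are not shown in this conversation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound of the user address range; the real bound is
 * platform-specific and not taken from the Nanos sources. */
#define USER_LIMIT_SKETCH UINT64_C(0x0000800000000000)

/* Hypothetical stand-in for struct iovec, with the address as an integer. */
struct iovec_sketch {
    uint64_t base;
    uint64_t len;
};

/* Range-only check: true if [addr, addr + len) lies entirely below
 * USER_LIMIT_SKETCH without wrapping around. It does not check whether the
 * range is mapped; an unmapped user address is expected to fault on access. */
bool range_is_user_sketch(uint64_t addr, uint64_t len)
{
    return addr < USER_LIMIT_SKETCH && len <= USER_LIMIT_SKETCH - addr;
}

/* Copy one entry out of the shared array, then validate the copy. The caller
 * uses only *out afterwards, so a concurrent writer to shared[idx] cannot
 * change the address that was checked. */
bool fetch_iov_sketch(const volatile struct iovec_sketch *shared, size_t idx,
                      struct iovec_sketch *out)
{
    struct iovec_sketch snap;

    snap.base = shared[idx].base;
    snap.len = shared[idx].len;
    if (!range_is_user_sketch(snap.base, snap.len))
        return false;
    *out = snap;
    return true;
}
```

Note that address 0 passes the range check in this sketch, which mirrors the point above: a NULL pointer is rejected only when the access itself faults, since it may be a valid user address when 'mmap_min_addr' is 0.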