iovec structures: Fix TOCTOU vulnerabilities #2148
Open
francescolavra wants to merge 4 commits into master
added 4 commits
March 29, 2026 11:19
iour_submit() reads sqe->opcode from shared user memory twice, in two different switch statements. A concurrent thread can change the opcode between the two reads, from NOP (no fd required) at the first read to READV (fd required) at the second, causing a kernel NULL pointer dereference crash. Copy sqe->opcode to a local variable and use only the copied value.
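The single-fetch pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the Nanos code: `sqe_sketch`, `opcode_needs_fd()`, `submit_sketch()`, the opcode values, and the return codes are all hypothetical names and values chosen for the example.

```c
#include <stdint.h>

/* Hypothetical opcode values and SQE layout, for illustration only. */
enum { OP_NOP = 0, OP_READV = 1 };

struct sqe_sketch {
    volatile uint8_t opcode;   /* lives in memory shared with user threads */
    int32_t fd;
};

/* Returns 1 if the opcode needs a file descriptor, 0 if not, -1 if unknown. */
static int opcode_needs_fd(uint8_t op)
{
    switch (op) {
    case OP_NOP:   return 0;
    case OP_READV: return 1;
    default:       return -1;
    }
}

/* Fetch the opcode once; the fd check and the dispatch both use the copy. */
static int submit_sketch(const struct sqe_sketch *sqe, uint8_t *dispatched_op)
{
    uint8_t op = sqe->opcode;          /* the only read of shared memory */
    int needs_fd = opcode_needs_fd(op);

    if (needs_fd < 0)
        return -1;                     /* unknown opcode */
    if (needs_fd && sqe->fd < 0)
        return -2;                     /* fd required but not supplied */
    *dispatched_op = op;               /* dispatch on the copy, not sqe->opcode */
    return 0;
}
```

Because the fd check and the dispatch both see the same local `op`, a concurrent write to `sqe->opcode` after the fetch cannot make the two decisions disagree.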
A subsequent commit will need the context error frame to be set in order to access other data (besides the destination address) from user memory.
This function returns -EINVAL because Linux returns -EINVAL when the pointer is NULL. But Linux returns -EFAULT for any other invalid pointer value, and the -EINVAL return code for a NULL pointer is likely vestigial behavior without technical justification (it has been there since at least 2005, in the pre-git era). Instead of mimicking Linux behavior, just return the standard -EFAULT code (as documented in the Linux man page), and use a different invalid pointer value in the relevant runtime test so that the test keeps passing on both Nanos and Linux.
The io_uring IORING_REGISTER_BUFFERS operation validates the user-supplied iovec array via validate_iovec(), then copies from the same user memory via runtime_memcpy() after a heap allocation and lock acquisition. A concurrent thread can modify iov_base between validation and copy, replacing a valid user address with a kernel address; subsequent READ_FIXED/WRITE_FIXED operations then read from or write to arbitrary kernel memory.

In addition, the iov_op() function stores the user-supplied iovec pointer after initial validation, and the iov_op_each() function later re-reads iov[curr].iov_base from user memory, allowing a concurrent thread to substitute a kernel address after validation. This affects both io_uring and regular readv/writev paths, and enables arbitrary kernel read/write.

To prevent these TOCTOU vulnerabilities, replace initial user memory validation via validate_iovec() and validate_msghdr() with validation at the time of use. To minimize runtime overhead, replace validate_user_memory() with a more lightweight memory_is_user() function whose only purpose is to ensure that a memory range will fault when accessed if it is not a userspace address range.

This last change has a side effect, which prompted a change in the Unix socket runtime test: NULL pointers passed to syscalls may be recognized as invalid addresses only at the time of use, instead of during initial validation of syscall arguments. This is expected, and is justified by the fact that a NULL pointer may be a valid user address if the user program sets the 'mmap_min_addr' manifest option to 0.
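A rough sketch of "validation at the time of use" for a single iovec entry is shown below. It is illustrative only and does not reproduce the Nanos implementation: `iovec_sketch`, `range_is_user_sketch()`, `iov_fetch_checked()`, and the `USER_LIMIT_SKETCH` boundary are assumptions made for the example, and the fault handling a kernel needs when reading user memory is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed exclusive upper bound of user addresses (hypothetical value). */
#define USER_LIMIT_SKETCH 0x0000800000000000ull

struct iovec_sketch {
    uint64_t iov_base;   /* user-supplied address, kept as an integer here */
    uint64_t iov_len;
};

/* Range check that never computes base + len, so it cannot wrap around. */
static int range_is_user_sketch(uint64_t base, uint64_t len)
{
    return base <= USER_LIMIT_SKETCH && len <= USER_LIMIT_SKETCH - base;
}

/*
 * Copy entry `idx` from the shared array into a private snapshot, then
 * validate the snapshot. The caller must use only `*out` afterwards, so a
 * concurrent writer to `shared` cannot change what was validated.
 */
static int iov_fetch_checked(const volatile struct iovec_sketch *shared,
                             size_t idx, struct iovec_sketch *out)
{
    out->iov_base = shared[idx].iov_base;   /* single read of each field */
    out->iov_len = shared[idx].iov_len;
    if (!range_is_user_sketch(out->iov_base, out->iov_len))
        return -EFAULT;
    return 0;
}
```

The point of the design is that the value being checked and the value being used are the same private copy; re-reading `shared[idx]` after the check would reopen the race.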
This change set fixes some TOCTOU vulnerabilities present in code that handles syscalls involving struct iovec arrays.
User memory validation via validate_iovec() and validate_msghdr() is being replaced with validation at the time of use.
In addition, this fixes a double read of the SQE opcode from user memory when handling the io_uring_enter syscall.
To minimize runtime overhead, validate_user_memory() is being replaced with a more lightweight memory_is_user() function
whose only purpose is to ensure that a memory range will fault when accessed if it is not a userspace address range.
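As a hedged illustration of what such a lightweight check might look like, the sketch below tests only whether a range lies inside an assumed user address range. The name `memory_is_user_sketch()` and the `USER_LIMIT_SKETCH` boundary are hypothetical; the actual memory_is_user() signature and address layout in Nanos may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed exclusive upper bound of user addresses (hypothetical value). */
#define USER_LIMIT_SKETCH 0x0000800000000000ull

/*
 * True if [base, base + len) lies entirely below USER_LIMIT_SKETCH.
 * base + len is never computed, which avoids integer wraparound.
 * The check says nothing about whether the range is mapped: an unmapped
 * user address passes here and is expected to fault when accessed.
 */
static bool memory_is_user_sketch(uint64_t base, uint64_t len)
{
    return base <= USER_LIMIT_SKETCH && len <= USER_LIMIT_SKETCH - base;
}
```

Note that in this sketch a NULL base passes the check, since address 0 is inside the assumed user range; an invalid NULL access would then be caught by the page fault at the time of use.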
Security issues reported by Niklas Femerstrand (@niklasfemerstrand).