[1.4] Better errors from runc init #5040
Merged
lifubang merged 5 commits into opencontainers:release-1.4 from cyphar:1.4-better-init-errors-4928 on Nov 27, 2025
+161 −118
Member
Could you please also include this one: #4951?
Member, Author
Will do, I thought they were already backported.
100f783 to 32a7907
lifubang approved these changes Nov 26, 2025
kolyshkin approved these changes Nov 27, 2025
Since sane_kill is called after a failed read or write, but before the error from that read or write is reported, it may change the errno value if kill(2) fails.

Save and restore errno around the call to kill.

While at it,
- change the code to return early;
- don't return the kill return value, as no one is using it and the errno value no longer correlates with it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 9c8f476)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
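The errno handling described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the name `sane_kill_sketch` is hypothetical, and the `pid <= 0` guard is an assumption about what "return early" means here.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical sketch of a "safe kill" helper: it refuses to signal
 * an invalid pid and leaves errno exactly as the caller had it.
 */
static void sane_kill_sketch(pid_t pid, int signum)
{
	int saved_errno;

	/* Return early: pid <= 0 would address a process group or every process. */
	if (pid <= 0)
		return;

	saved_errno = errno; /* errno left behind by the failed read or write */
	kill(pid, signum);   /* may fail and overwrite errno, e.g. with ESRCH */
	errno = saved_errno; /* restore it for the error report that follows */
}
```

With a helper shaped like this, a caller can kill the child first and still report the original read or write error afterwards, because errno is unchanged whether or not kill(2) succeeded.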
We use bail to report fatal errors, and bail always appends %m (aka strerror(errno)). If an error condition did not set errno, the log message ends up with ": Success" or with an error from a stale errno value. Either case is confusing for users.

Introduce bailx, which is the same as bail except that it does not append %m, and use it where appropriate. The naming follows libc's err(3) and errx(3).

PS we still use bail in a few cases after read or write, even if that read/write did not return an error, because the code does not distinguish between a short read/write and an error (-1). This will be addressed by the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 067b833)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
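The difference between the two reporting styles can be illustrated with a small sketch. The real bail and bailx also log the message and terminate the process; the helpers below only format the message so the difference is easy to see. The names `fmt_with_errno` and `fmt_plain` are hypothetical, and strerror(errno) is used in place of the glibc-specific %m.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Like bail: format the message, then append ": " and strerror(errno). */
static void fmt_with_errno(char *buf, size_t len, const char *fmt, ...)
{
	int saved_errno = errno; /* capture before any libc call can change it */
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);

	/* Append the errno text only if the message itself fit in the buffer. */
	if (n >= 0 && (size_t)n < len)
		snprintf(buf + n, len - (size_t)n, ": %s", strerror(saved_errno));
}

/* Like bailx: format the message only, with no errno text appended. */
static void fmt_plain(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(buf, len, fmt, ap);
	va_end(ap);
}
```

The first form is appropriate right after a failed system call; the second fits conditions such as an unexpected sync message, where errno carries no useful information.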
Add a few missing sane_kill calls where they make sense.

Remove one useless sane_kill of stage2_pid, as during SYNC_USERMAP stage2 is not yet started. It is harmless, yet it makes the code slightly harder to read.

Set the child pid to -1 upon receiving SYNC_CHILD_FINISH to minimize the chances of killing an unrelated process. When a child sends SYNC_CHILD_FINISH it is about to exit (although theoretically it could be stuck during debug logging).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit aea52d0)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
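The idea of invalidating the pid once the child has announced it is finished can be sketched as below. The struct and function names are hypothetical and not taken from the patch. The point is only that a pid of -1 turns any later kill attempt into a no-op, so a recycled pid cannot be hit by mistake.

```c
#include <signal.h>
#include <sys/types.h>

/* Hypothetical bookkeeping for one child of the bootstrap process. */
struct child_sketch {
	pid_t pid; /* -1 means "nothing to kill" */
};

/* Called when the child reports that it is done and about to exit. */
static void child_finished(struct child_sketch *c)
{
	c->pid = -1;
}

/*
 * Signal the child only while it is still tracked.
 * Returns 1 if kill(2) was attempted, 0 if the call was skipped.
 */
static int kill_tracked(const struct child_sketch *c, int signum)
{
	if (c->pid <= 0)
		return 0;
	kill(c->pid, signum);
	return 1;
}
```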
Introduce and use iobail, xread, and xwrite wrappers so that we can properly check the read/write return value and call either bail or bailx on error, with proper diagnostics (distinguishing a failed read/write from a short read/write).

This prevents the "Success" suffix in errors like:

    failed to sync with stage-1: next state: Success

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 6c18b25)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
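A rough sketch of how a wrapper can tell a failed read from a short one is shown below. It is not the xread from the patch: the names `io_status`, `check_io`, and `read_exact_sketch` are hypothetical, and instead of bailing out the sketch writes the diagnostic into a caller-supplied buffer.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum io_status { IO_OK, IO_SHORT, IO_FAILED };

/* Classify the return value of read(2) or write(2) against the expected size. */
static enum io_status check_io(ssize_t got, size_t want)
{
	if (got < 0)
		return IO_FAILED; /* real error: errno is meaningful */
	if ((size_t)got != want)
		return IO_SHORT;  /* no error: errno must not be reported */
	return IO_OK;
}

/* Read exactly count bytes, describing any problem in err. */
static enum io_status read_exact_sketch(int fd, void *buf, size_t count,
					char *err, size_t errlen)
{
	ssize_t n = read(fd, buf, count);
	enum io_status st = check_io(n, count);

	if (st == IO_FAILED)
		snprintf(err, errlen, "read failed: %s", strerror(errno));
	else if (st == IO_SHORT)
		snprintf(err, errlen, "short read: got %zd of %zu bytes", n, count);
	else
		err[0] = '\0';
	return st;
}
```

Only the failed case mentions errno, which is what keeps a stale or zero errno out of the message for a short read.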
If an early stage of runc init (nsenter) fails for some reason, it logs error(s) with FATAL log level, via bail().

The runc init log is read by a parent (runc create/run/exec) and is logged via the normal logrus mechanism, which is all fine and dandy, except that when `runc init` fails, we return the error from the parent (which is usually not too helpful, for example):

    runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

Now, the actual underlying error is from runc init and it was logged earlier; here is what the full runc output looks like:

    FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
    FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state
    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

The problem is, upper level runtimes tend to ignore everything except the last line from runc, and thus the error reported by e.g. docker is not very helpful.

This patch tries to improve the situation by collecting FATAL errors from runc init and appending those to the error returned (instead of logging). With it, the above error will look like this:

    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state

Yes, it is long and ugly, but at least the upper level runtime will report it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit f944cce)
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
32a7907 to f1d0dd8
Backport of #4951 and #4928.
In case early stage of runc init (nsenter) fails for some reason, it
logs error(s) with FATAL log level, via bail().
The runc init log is read by a parent (runc create/run/exec) and is
logged via normal logrus mechanism, which is all fine and dandy, except
when `runc init` fails, we return the error from the parent (which is
usually not too helpful, for example):

    runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

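Now, the actual underlying error is from runc init and it was logged
earlier; here is what the full runc output looks like:

    FATA[0000] nsexec-1[3247792]: failed to unshare remaining namespaces: No space left on device
    FATA[0000] nsexec-0[3247790]: failed to sync with stage-1: next state
    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF
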
The problem is, upper level runtimes tend to ignore everything except
the last line from runc, and thus error reported by e.g. docker is not
very helpful.
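This patch tries to improve the situation by collecting FATAL errors
from runc init and appending those to the error returned (instead of
logging). With it, the above error will look like this:

    ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF; runc init error(s): nsexec-1[141549]: failed to unshare remaining namespaces: No space left on device; nsexec-0[141547]: failed to sync with stage-1: next state
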
Yes, it is long and ugly, but at least the upper level runtime will report it.
Fixes: #4905