boot/nxboot: add flush barriers and CRC-validate primary before boot #3428
Merged
michallenc merged 3 commits into apache:master on Mar 18, 2026
cederom
previously approved these changes
Mar 17, 2026
Contributor
cederom
left a comment
Thank you @neilberkman good catch! :-)
xiaoxiang781216
previously approved these changes
Mar 18, 2026
Two hardening fixes for nxboot power-loss resilience:
1. Add flash_partition_flush() calls between critical partition
operations in perform_update(). Without explicit flush barriers,
writes may remain buffered in RAM (e.g. via FTL rwbuffer) when
nxboot proceeds to the next phase. A power loss between phases
can leave the recovery image uncommitted while the staging
partition has already been consumed.
Flush points added:
- After copy_partition(primary, recovery) completes
- After copy_partition(update, primary) completes, before
erasing the staging first sector
2. Replace validate_image_header() with validate_image() in the
final primary validation path of nxboot_perform_update(). The
header-only check validates magic and platform identifier but
does not CRC-check the image body. After an interrupted update,
a corrupt primary with an intact header would pass this check
and be booted, resulting in a persistent boot failure.
Signed-off-by: Neil Berkman <neil@xuku.com>
8dbc177
Force-pushed: 6e81e34 to 8dbc177
Contributor
Author
Force-pushed to fix a build error on arm-13. Update: the new failures seem to be CI flakiness.
xiaoxiang781216
previously approved these changes
Mar 18, 2026
michallenc
requested changes
Mar 18, 2026
The comment previously stated CRC was not calculated before boot. This is no longer accurate after adding full image CRC validation in validate_image().
Signed-off-by: Neil Berkman <neil@xuku.com>
Force-pushed: 225cc54 to 66c9805
michallenc
reviewed
Mar 18, 2026
The header variable in nxboot_perform_update() is no longer used after validate_image() was changed to take only the fd.
Signed-off-by: Neil Berkman <neil@xuku.com>
michallenc
approved these changes
Mar 18, 2026
Contributor
michallenc
left a comment
Also tested both kernel and apps patches on SAMv7, everything works fine, thanks!
cederom
approved these changes
Mar 18, 2026
Contributor
@michallenc when all is set please do the honors (merge) :-) :-)
acassis
approved these changes
Mar 18, 2026
Summary
Two hardening fixes for nxboot power-loss resilience:
1. Flush barriers between critical partition operations: add flash_partition_flush() calls after each copy_partition() completes in perform_update(). Without explicit barriers, writes may remain buffered in RAM when nxboot proceeds to the next phase. A power loss between phases can leave the recovery image uncommitted while the staging partition has already been consumed.
2. Full CRC validation before booting primary: replace validate_image_header() with validate_image() in the final primary validation path of nxboot_perform_update(). The header-only check does not CRC-check the image body. After an interrupted update, a corrupt primary with an intact header would pass this check and be booted.
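The gap between a header-only check and a full-image check can be illustrated with a small host-side sketch in C. It does not use the real nxboot structures or functions: the `sketch_header` layout, the magic value, and every `sketch_*` name are hypothetical, and the standard reflected CRC-32 stands in for whatever checksum nxboot actually computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image layout for illustration; not the real nxboot header. */
#define SKETCH_MAGIC 0x534b4348u

struct sketch_header
{
  uint32_t magic; /* identifies the image format */
  uint32_t size;  /* body length in bytes */
  uint32_t crc;   /* CRC-32 of the body */
};

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
uint32_t sketch_crc32(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xffffffffu;

  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        {
          /* Shift right; XOR in the polynomial if the low bit was set. */
          crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
        }
    }

  return ~crc;
}

/* Header-only check: looks at the magic and never touches the body. */
int sketch_validate_header(const struct sketch_header *hdr)
{
  return hdr->magic == SKETCH_MAGIC;
}

/* Full check: header first, then the CRC over the whole body. */
int sketch_validate_image(const struct sketch_header *hdr,
                          const uint8_t *body)
{
  return sketch_validate_header(hdr) &&
         sketch_crc32(body, hdr->size) == hdr->crc;
}

/* Model an interrupted copy: the first 'written' bytes landed and the
 * rest of the body still reads as erased flash (0xff).
 */
void sketch_truncate_body(uint8_t *body, size_t size, size_t written)
{
  assert(written <= size);
  memset(body + written, 0xff, size - written);
}
```

Calling `sketch_truncate_body()` on a valid image and then running both validators shows the difference: the header check still accepts the image, while the full check rejects it.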
Impact
Testing
Tested with fault injection under Renode emulation on nucleo-h743zi nxboot. Together, the flush barriers and CRC validation eliminate the persistent boot failure observed when FTL write buffering is enabled: the failure rate dropped from 92/94 to 0/94.
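The buffered-write failure mode can also be reasoned about with a toy model in plain C, independent of Renode or real hardware. Everything here is an assumption made for illustration: the `sketch_*` names, the four-sector partitions, a write buffer that holds exactly one pending sector, and an erase that takes effect immediately. The real FTL rwbuffer and nxboot partition code behave differently in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECTORS     4
#define SKETCH_SECTOR_SIZE 8
#define SKETCH_PART_SIZE   (SKETCH_SECTORS * SKETCH_SECTOR_SIZE)

/* Hypothetical partition: 'flash' survives power loss, 'pending' does not. */
struct sketch_part
{
  uint8_t flash[SKETCH_PART_SIZE];
  uint8_t pending[SKETCH_SECTOR_SIZE];
  int pending_sector; /* -1 when the write buffer is empty */
};

void sketch_init(struct sketch_part *p, uint8_t fill)
{
  memset(p->flash, fill, sizeof(p->flash));
  p->pending_sector = -1;
}

/* Flush barrier: commit the buffered sector, if any, to flash. */
void sketch_flush(struct sketch_part *p)
{
  if (p->pending_sector >= 0)
    {
      memcpy(p->flash + p->pending_sector * SKETCH_SECTOR_SIZE,
             p->pending, SKETCH_SECTOR_SIZE);
      p->pending_sector = -1;
    }
}

/* Copy a whole image. Each sector write evicts the previous one to flash,
 * so the final sector stays in RAM until someone flushes. Sources are read
 * straight from flash; no source has pending data in this model.
 */
void sketch_copy(struct sketch_part *dst, const struct sketch_part *src)
{
  assert(dst != src);
  for (int s = 0; s < SKETCH_SECTORS; s++)
    {
      sketch_flush(dst);
      memcpy(dst->pending, src->flash + s * SKETCH_SECTOR_SIZE,
             SKETCH_SECTOR_SIZE);
      dst->pending_sector = s;
    }
}

/* Power loss: whatever was still buffered in RAM is gone. */
void sketch_power_loss(struct sketch_part *p)
{
  p->pending_sector = -1;
}

/* True if every byte committed to flash equals 'fill'. */
bool sketch_holds(const struct sketch_part *p, uint8_t fill)
{
  for (int i = 0; i < SKETCH_PART_SIZE; i++)
    {
      if (p->flash[i] != fill)
        {
          return false;
        }
    }

  return true;
}

/* Run the update, cut power right after staging is consumed, and report
 * whether a complete bootable image is left anywhere.
 */
bool sketch_update_survives(bool use_barriers)
{
  struct sketch_part primary;
  struct sketch_part recovery;
  struct sketch_part update;

  sketch_init(&primary, 0x11);  /* old firmware */
  sketch_init(&recovery, 0xff); /* erased */
  sketch_init(&update, 0x22);   /* new firmware in staging */

  sketch_copy(&recovery, &primary); /* back up the old image */
  if (use_barriers) sketch_flush(&recovery);

  sketch_copy(&primary, &update);   /* install the new image */
  if (use_barriers) sketch_flush(&primary);

  sketch_init(&update, 0xff);       /* staging consumed (immediate erase) */

  sketch_power_loss(&primary);
  sketch_power_loss(&recovery);

  return sketch_holds(&primary, 0x22) || sketch_holds(&recovery, 0x11);
}
```

In this model `sketch_update_survives(false)` returns false, because the last sector of both the recovery copy and the new primary was still buffered when power was cut, while `sketch_update_survives(true)` returns true. The model only shows why the ordering matters; it says nothing about the actual failure rates measured above.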