Add draft: OTA Firmware Updates for Matter Devices via nRF Cloud#629
Deploying interrupt with
Latest commit: 47b66f3
Status: ✅ Deploy successful!
Preview URL: https://b6d2e55f.interrupt.pages.dev
Branch Preview URL: https://matter-ota-nrfcloud.interrupt.pages.dev
Pull request overview
Adds a new draft blog post describing an approach for performing OTA firmware updates on Matter-over-Thread devices using CoAP blockwise transfers over DTLS with nRF Cloud as the backend, and MCUboot for safe image swapping.
Changes:
- Introduces a new draft post detailing an end-to-end OTA architecture (version check, blockwise download, flash streaming, MCUboot swap/confirm).
- Includes configuration snippets, illustrative CoAP request/Block2 examples, and a “Trying it out” walkthrough with sample logs.
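The first stage in that list, the version check, comes down to deciding whether the version the backend reports is newer than the running one. A minimal sketch in C, assuming versions are plain dotted numeric strings such as `1.2.3` (the thread does not say what format nRF Cloud reports). `version_compare` and `next_component` are hypothetical helpers, and this version simply ignores pre-release suffixes like `-rc1`:

```c
#include <stdlib.h>
#include <string.h>

/* Advance past one numeric component. A '.' moves to the next component;
 * anything else (end of string, or a suffix such as "-rc1") ends the version. */
static const char *next_component(const char *end)
{
    if (*end == '.') {
        return end + 1;
    }
    return end + strlen(end);
}

/* Compare dotted numeric versions such as "1.10.0" and "1.9.2".
 * Returns <0 if a < b, 0 if equal, >0 if a > b.
 * Missing components count as 0, so "1.2" equals "1.2.0". */
static int version_compare(const char *a, const char *b)
{
    while (*a != '\0' || *b != '\0') {
        char *end_a;
        char *end_b;
        unsigned long na = strtoul(a, &end_a, 10);
        unsigned long nb = strtoul(b, &end_b, 10);

        if (na != nb) {
            return (na < nb) ? -1 : 1;
        }
        a = next_component(end_a);
        b = next_component(end_b);
    }
    return 0;
}
```

Comparing numerically per component matters: a plain `strcmp` would rank `1.9.2` above `1.10.0`.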
@copilot apply changes based on the comments in this thread
Applied both changes in commit 5baa440:
noahp left a comment
We don't currently require it, but a nice rewrap with `npx prettier --write --prose-wrap=always _drafts/matter-ota-nrfcloud.md` would be great, to make it easier to review and read.
great post! ship it 🚢
_drafts/matter-ota-nrfcloud.md
with nRF Cloud as the OTA backend.
on the nRF54LM20 DK.
_drafts/matter-ota-nrfcloud.md
`CONFIG_STREAM_FLASH` provides a buffered flash write API that handles page alignment for us. `CONFIG_IMG_MANAGER` and `CONFIG_MCUBOOT_IMG_MANAGER` give us the MCUboot APIs to request an image swap and confirm the new image after boot.
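To show what "handles page alignment" means in practice, here is a simplified, self-contained sketch of the idea. It is not Zephyr's stream flash implementation: incoming chunks of any size (for example one CoAP block at a time) are accumulated in a page-sized RAM buffer, only whole pages are programmed, and the final partial page is padded with the erased value on flush. `buffered_write`, `fake_flash`, and the 16-byte page size are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  16u    /* tiny page so the behaviour is easy to follow */
#define FLASH_SIZE 256u
#define ERASED     0xFFu  /* NOR flash reads back 0xFF after an erase */

static uint8_t fake_flash[FLASH_SIZE];   /* stands in for the secondary slot */

struct buffered_writer {
    uint8_t buf[PAGE_SIZE];
    size_t  buf_used;   /* bytes waiting in buf */
    size_t  offset;     /* next page-aligned flash offset to program */
};

/* Program one full page; a real flash driver call would go here. */
static int write_page(struct buffered_writer *w)
{
    if (w->offset + PAGE_SIZE > FLASH_SIZE) {
        return -1;                      /* image larger than the slot */
    }
    memcpy(&fake_flash[w->offset], w->buf, PAGE_SIZE);
    w->offset += PAGE_SIZE;
    w->buf_used = 0;
    return 0;
}

/* Accept arbitrarily sized chunks but only ever program whole pages. */
static int buffered_write(struct buffered_writer *w, const uint8_t *data,
                          size_t len, int flush)
{
    while (len > 0) {
        size_t room = PAGE_SIZE - w->buf_used;
        size_t n = (len < room) ? len : room;

        memcpy(&w->buf[w->buf_used], data, n);
        w->buf_used += n;
        data += n;
        len -= n;

        if (w->buf_used == PAGE_SIZE && write_page(w) != 0) {
            return -1;
        }
    }
    /* On the last chunk, pad the partial page with the erased value */
    if (flush && w->buf_used > 0) {
        memset(&w->buf[w->buf_used], ERASED, PAGE_SIZE - w->buf_used);
        return write_page(w);
    }
    return 0;
}
```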
The nRF54LM20's partition layout places the MCUboot primary slot in internal RRAM (the nRF54LM20 uses RRAM, not traditional flash and the secondary slot on external SPI NOR flash (a MX25R6435F). This means the secondary slot erase is slow (~60 seconds for 2 MB), but the primary slot benefits from RRAM's fast write speeds.
The nRF54LM20's partition layout places the MCUboot primary slot in internal RRAM (the nRF54LM20 uses RRAM, not traditional flash and the secondary slot on external SPI NOR flash (a MX25R6435F). This means the secondary slot erase is slow (~60 seconds for 2 MB), but the primary slot benefits from RRAM's fast write speeds.
The nRF54LM20's partition layout places the MCUboot primary slot in internal RRAM (the nRF54LM20 uses RRAM, not traditional flash and the secondary slot on external SPI NOR flash (a MX25R6435F)). This means the secondary slot erase is slow (~60 seconds for 2 MB), but the primary slot benefits from RRAM's fast write speeds.
there's a parenthesis bug here, but not the one you caught ;-). Fixed.
_drafts/matter-ota-nrfcloud.md
int pos = coap_start_request(coap_buf, sizeof(coap_buf),
                             COAP_CODE_GET, 0);
coap_buf[0] = 0x48; /* Ver=1, Type=CON, TKL=8 */
memcpy(coap_buf + 4, token, 8);
pos = 12;
nit- pos return value is overwritten here?
Fixed it - this code isn't the best, I might take one more swing at it.
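One way to avoid hand-patching bytes and overwriting `pos` is to build the fixed CoAP header (RFC 7252 §3) in a single helper whose return value is the only source of the options offset. A minimal sketch; `coap_write_header` and the macro names are hypothetical and not part of the post's code or any nRF Connect SDK API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COAP_VER        1u
#define COAP_TYPE_CON   0u
#define COAP_METHOD_GET 0x01u   /* code 0.01 */

/* Write the fixed header (Ver|T|TKL, Code, Message ID) and the token.
 * Returns the offset where options start, or -1 if the inputs are invalid. */
static int coap_write_header(uint8_t *buf, size_t buf_len, uint8_t type,
                             uint8_t code, uint16_t msg_id,
                             const uint8_t *token, uint8_t tkl)
{
    if (tkl > 8 || buf_len < 4u + tkl) {
        return -1;   /* token lengths 9-15 are reserved by RFC 7252 */
    }
    buf[0] = (uint8_t)((COAP_VER << 6) | ((type & 0x3u) << 4) | tkl);
    buf[1] = code;
    buf[2] = (uint8_t)(msg_id >> 8);     /* message ID in network byte order */
    buf[3] = (uint8_t)(msg_id & 0xFFu);
    memcpy(&buf[4], token, tkl);
    return 4 + tkl;
}
```

With an 8-byte token on a confirmable GET, the first byte comes out as `0x48` and the function returns 12, so the caller can keep appending options at the returned offset without hard-coding it.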
nrfcloud_send(s, coap_buf, pos);

/* Receive response, filtering by token to skip stale packets */
/* ... (token matching loop omitted for brevity) ... */
maybe also note that error handling is omitted for brevity too, this loop will silently exit if all fail, and i think stream_flash_buffered_write will end up being called with junk
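To illustrate the reviewer's concern, here is a sketch of the checks that would keep junk out of the flash writer: validate the header, match the token, reject anything that is not a 2.xx success, and walk the options to find the payload. `coap_check_response` and the `RESP_ERR_*` codes are hypothetical; message-ID matching and retransmission handling are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    RESP_ERR_MALFORMED = -1,
    RESP_ERR_TOKEN     = -2,  /* stale or unrelated packet: keep waiting */
    RESP_ERR_CODE      = -3,  /* server answered, but not with 2.xx */
};

/* Decode an option delta/length nibble plus its extended bytes (RFC 7252 §3.1). */
static long read_ext(uint8_t nibble, const uint8_t *p, size_t len, size_t *pos)
{
    if (nibble < 13) {
        return nibble;
    }
    if (nibble == 13) {
        if (*pos + 1 > len) return -1;
        return 13 + p[(*pos)++];
    }
    if (nibble == 14) {
        if (*pos + 2 > len) return -1;
        long v = ((long)p[*pos] << 8) | p[*pos + 1];
        *pos += 2;
        return 269 + v;
    }
    return -1;  /* 15 is only valid as part of the 0xFF payload marker */
}

/* Returns the payload offset (== len if there is no payload) or RESP_ERR_*. */
static int coap_check_response(const uint8_t *msg, size_t len,
                               const uint8_t *token, uint8_t tkl)
{
    if (len < 4u + tkl || (msg[0] >> 6) != 1) return RESP_ERR_MALFORMED;
    if ((msg[0] & 0x0F) != tkl || memcmp(&msg[4], token, tkl) != 0) return RESP_ERR_TOKEN;
    if ((msg[1] >> 5) != 2) return RESP_ERR_CODE;   /* code class must be 2 */

    size_t pos = 4u + tkl;
    while (pos < len) {
        uint8_t b = msg[pos++];
        if (b == 0xFF) {
            /* A payload marker followed by nothing is a format error */
            return (pos < len) ? (int)pos : RESP_ERR_MALFORMED;
        }
        long delta = read_ext(b >> 4, msg, len, &pos);
        long olen = read_ext(b & 0x0F, msg, len, &pos);
        if (delta < 0 || olen < 0 || pos + (size_t)olen > len) return RESP_ERR_MALFORMED;
        pos += (size_t)olen;
    }
    return (int)pos;
}
```

A caller would keep waiting on `RESP_ERR_TOKEN`, and abort or retry on the other errors instead of passing the buffer to `stream_flash_buffered_write`.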
`BOOT_UPGRADE_TEST` tells MCUboot to swap the images, but treat the new image as a **test**. If the new firmware does not explicitly confirm itself, MCUboot will revert to the previous image on the next reboot. This is the safety net for remote devices: a buggy firmware that crashes before confirming will be rolled back automatically.
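The test-swap semantics can be made concrete with a deliberately simplified model of the slot state across reboots. This is not MCUboot code: it ignores image trailers, power-loss recovery, and signature checks, and `struct slots`, `request_test_upgrade`, `confirm`, and `reboot` are hypothetical names.

```c
#include <stdbool.h>

/* Simplified model of the two MCUboot slots. */
struct slots {
    int  primary;        /* version in the primary (executing) slot */
    int  secondary;      /* version in the secondary slot */
    bool pending_test;   /* app requested a test upgrade (like BOOT_UPGRADE_TEST) */
    bool running_test;   /* current image was swapped in as a test, not yet confirmed */
};

static void swap(struct slots *s)
{
    int tmp = s->primary;
    s->primary = s->secondary;
    s->secondary = tmp;
}

/* Called by the application once the new image is fully written. */
static void request_test_upgrade(struct slots *s) { s->pending_test = true; }

/* Called by the new image once it decides it is healthy. */
static void confirm(struct slots *s) { s->running_test = false; }

/* The bootloader's decision at each boot. */
static void reboot(struct slots *s)
{
    if (s->running_test) {
        swap(s);              /* never confirmed: revert to the previous image */
        s->running_test = false;
    } else if (s->pending_test) {
        swap(s);              /* boot the new image once, as a test */
        s->pending_test = false;
        s->running_test = true;
    }
}
```

A crash before `confirm()` leads to one more `reboot()`, which brings the old version back; a confirmed image survives every later reboot.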
The image is confirmed early in `main()`, before the application starts:
maybe note that it's unconditionally confirmed here, but could be done after the device has successfully established a connection, for example, in case a rollback is needed
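A sketch of the reviewer's alternative: keep the new image unconfirmed until it has proven it can reach the network, and give up after a fixed number of attempts. The decision is written as a pure function so it can be tested on a host; `confirm_policy`, the `BOOT_*` actions, and the 5-attempt budget are assumptions, not code from the post.

```c
#include <stdbool.h>

#define MAX_CONNECT_ATTEMPTS 5u   /* arbitrary retry budget for this sketch */

enum boot_action {
    BOOT_ALREADY_CONFIRMED,   /* nothing to decide: image is permanent */
    BOOT_KEEP_WAITING,        /* not connected yet: stay unconfirmed and retry */
    BOOT_CONFIRM_IMAGE,       /* reached the cloud: make the image permanent */
    BOOT_REBOOT_TO_ROLLBACK,  /* give up: rebooting unconfirmed reverts the image */
};

/* Decide what to do after a connection attempt on a freshly swapped image. */
static enum boot_action confirm_policy(bool image_confirmed, bool connected,
                                       unsigned failed_attempts)
{
    if (image_confirmed) {
        return BOOT_ALREADY_CONFIRMED;
    }
    if (connected) {
        return BOOT_CONFIRM_IMAGE;
    }
    if (failed_attempts >= MAX_CONNECT_ATTEMPTS) {
        return BOOT_REBOOT_TO_ROLLBACK;
    }
    return BOOT_KEEP_WAITING;
}
```

On the device, the loop around this could feed `image_confirmed` from Zephyr's `boot_is_img_confirmed()`, call `boot_write_img_confirmed()` on `BOOT_CONFIRM_IMAGE`, and call `sys_reboot()` on `BOOT_REBOOT_TO_ROLLBACK`.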
_drafts/matter-ota-nrfcloud.md
Block2 over Thread is reliable but not fast. At ~4 KB/s, a 250 KB image takes about a minute. For typical firmware sizes this is fine, but very large images would benefit from a faster transport.
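Block2 throughput is bounded by the number of request/response exchanges an image needs. A small sketch of the Block2 option value from RFC 7959 (`NUM << 4 | M << 3 | SZX`, with block size 2^(SZX+4)) and of the exchange count; the 1024-byte block size (SZX = 6) is an assumption, since the thread does not say which size the post uses. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* SZX 0..6 encodes block sizes 16..1024 bytes; SZX 7 is reserved. */
static uint32_t block2_size(uint8_t szx)
{
    return 1u << (szx + 4);
}

/* Pack NUM (block index, up to 20 bits), M (more blocks follow), SZX. */
static uint32_t block2_encode(uint32_t num, bool more, uint8_t szx)
{
    return (num << 4) | ((uint32_t)more << 3) | (szx & 0x7u);
}

static void block2_decode(uint32_t value, uint32_t *num, bool *more, uint8_t *szx)
{
    *num  = value >> 4;
    *more = ((value >> 3) & 1u) != 0;
    *szx  = (uint8_t)(value & 0x7u);
}

/* Number of Block2 exchanges needed to fetch an image of image_bytes. */
static uint32_t block2_exchanges(uint32_t image_bytes, uint8_t szx)
{
    uint32_t size = block2_size(szx);
    return (image_bytes + size - 1) / size;
}
```

With 1024-byte blocks, a 250 KB image needs 250 exchanges. At the ~4 KB/s quoted above that is roughly 62 seconds, or about 250 ms per exchange, and 1024 bytes is already the largest block size Block2 allows.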
nRF Cloud provides the fleet management features that Matter's DCL lacks: staged rollouts, cohort targeting, and version management with a fast development loop. It offers a free tier for up to 10 devices, and scales at $0.10/device/month.
Maybe soften it to "check nRF Cloud's pricing page for current tiers." so we don't have to update it in the future if pricing changes.
_drafts/matter-ota-nrfcloud.md
**An OTA backend that speaks UDP.** Our Matter device cannot use HTTP, so we need a backend that supports CoAP over DTLS. I am using nRF Cloud[^nrfcloud] because it provides this out of the box, along with firmware hosting, version management, staged rollouts, and cohort targeting. You could also roll your own.
**Application firmware** on the device that checks for updates, downloads the new image, and writes it to flash. This is the code we will walk through in this post.
**MCUboot**[^mcuboot] is the bootloader. It manages two firmware slots (primary and secondary) and can swap between them safely. If a new firmware fails to boot, MCUboot automatically reverts to the previous version.
nit for readability: bullets
**An OTA backend that speaks UDP.** Our Matter device cannot use HTTP, so we need a backend that supports CoAP over DTLS. I am using nRF Cloud[^nrfcloud] because it provides this out of the box, along with firmware hosting, version management, staged rollouts, and cohort targeting. You could also roll your own.
**Application firmware** on the device that checks for updates, downloads the new image, and writes it to flash. This is the code we will walk through in this post.
**MCUboot**[^mcuboot] is the bootloader. It manages two firmware slots (primary and secondary) and can swap between them safely. If a new firmware fails to boot, MCUboot automatically reverts to the previous version.
- **An OTA backend that speaks UDP.** Our Matter device cannot use HTTP, so we need a backend that supports CoAP over DTLS. I am using nRF Cloud[^nrfcloud] because it provides this out of the box, along with firmware hosting, version management, staged rollouts, and cohort targeting. You could also roll your own.
- **Application firmware** on the device that checks for updates, downloads the new image, and writes it to flash. This is the code we will walk through in this post.
- **MCUboot**[^mcuboot] is the bootloader. It manages two firmware slots (primary and secondary) and can swap between them safely. If a new firmware fails to boot, MCUboot automatically reverts to the previous version.
Now the full update - check, erase, download, and reboot:
| ``` | ||
uart:~$ nrfcloud ota
If you have the timestamps from the Zephyr logs, it would be nice to include them here so users can see the progression.
I'm too lazy to rerun it all :-P
Very fair! A nice to have for sure
[00:00:02.145,000] <inf> chip: [SVR]Server initializing...
| ``` | ||
The whole process takes about 5 minutes.
Could add a little celebration here! I also think it might feel better to have the time discussion here instead of the conclusion (I also added the timestamp callout, in case you add it):
The whole process takes about 5 minutes.
And we're done! 🎉
As the log timestamps show, the whole process takes about 5 minutes. Block2 over Thread is reliable but not fast. At ~4 KB/s, a 250 KB image takes about a minute. For typical firmware sizes this is fine, but very large images would benefit from a faster transport, such as HTTPS.
pos = coap_append_option(coap_buf, pos, sizeof(coap_buf),
                         &prev_opt, COAP_OPT_MEMFAULT_KEY,
                         project_key, strlen(project_key));
This is an optimization, but the proxy endpoint will now honor response format requests, and a plaintext URL would be cleaner to parse than JSON. You should be able to add this here with:
const uint8_t content_format = 0; /* 0 = text/plain */
pos = coap_append_option(coap_buf, pos, sizeof(coap_buf),
                         &prev_opt, COAP_OPT_CONTENT_FORMAT,
                         &content_format, 1);
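The `prev_opt` argument exists because CoAP (RFC 7252 §3.1) encodes each option number as a delta from the previous one, so options must appear in ascending option-number order. Content-Format is option 12, so if `COAP_OPT_MEMFAULT_KEY` is numerically larger than 12, the Content-Format option would need to be appended before it. The sketch below is a hypothetical stand-alone encoder (not the post's `coap_append_option`) that enforces that ordering and handles the extended delta/length forms:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pick the 4-bit nibble for a delta or length and how many extended bytes follow. */
static uint8_t opt_nibble(uint32_t v, size_t *ext_len)
{
    if (v < 13)  { *ext_len = 0; return (uint8_t)v; }
    if (v < 269) { *ext_len = 1; return 13; }
    *ext_len = 2;
    return 14;
}

static size_t put_ext(uint8_t *p, uint32_t v, size_t ext_len)
{
    if (ext_len == 1) {
        p[0] = (uint8_t)(v - 13);
    } else if (ext_len == 2) {
        uint32_t x = v - 269;
        p[0] = (uint8_t)(x >> 8);
        p[1] = (uint8_t)x;
    }
    return ext_len;
}

/* Append one option at pos. Returns the new position, or -1 if the option
 * is out of order or does not fit. Updates *prev_opt on success. */
static int append_option(uint8_t *buf, size_t pos, size_t buf_len,
                         uint16_t *prev_opt, uint16_t opt_num,
                         const uint8_t *value, size_t value_len)
{
    if (opt_num < *prev_opt || value_len > 65535u + 269u) {
        return -1;   /* deltas cannot be negative; length must fit 2 ext bytes */
    }
    uint32_t delta = (uint32_t)(opt_num - *prev_opt);
    size_t d_ext, l_ext;
    uint8_t d_nib = opt_nibble(delta, &d_ext);
    uint8_t l_nib = opt_nibble((uint32_t)value_len, &l_ext);
    if (pos + 1 + d_ext + l_ext + value_len > buf_len) {
        return -1;
    }
    buf[pos++] = (uint8_t)((d_nib << 4) | l_nib);
    pos += put_ext(&buf[pos], delta, d_ext);
    pos += put_ext(&buf[pos], (uint32_t)value_len, l_ext);
    if (value_len > 0) {
        memcpy(&buf[pos], value, value_len);
    }
    *prev_opt = opt_num;
    return (int)(pos + value_len);
}
```

RFC 7252 also lets a sender encode a uint value of 0 as a zero-length option; the one-byte `0x00` used in the suggestion is still valid, since recipients must accept leading zero bytes.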
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent-Logs-Url: https://github.com/memfault/interrupt/sessions/d43707dd-81d9-421a-9e40-bb36551e5000
Co-authored-by: franc0is <4192460+franc0is@users.noreply.github.com>
5ea722c to 47b66f3
OTA backend,→OTA backend.)[Memfault OTA documentation][^memfault_ota]→Memfault OTA documentation[^memfault_ota])