
Device Sync Updates #3053

Merged

codabrink merged 31 commits into main from coda/manual-sync on Jan 28, 2026

Conversation

@codabrink (Contributor) commented Jan 23, 2026

Add device sync APIs and processing, switch clients to device sync groups, and require server URL with PIN-based archives across Rust core and WASM/Node/mobile bindings

Introduce explicit device sync flows with PIN-based archives and server URL handling, add message attempt/state tracking with a 3-attempt cap, replace history sync with device sync in client methods and tests, and expose device sync operations in WASM, Node, and mobile bindings. Core updates include new listing/processing by PIN, paging of sync messages, refined retryability for DeviceSyncError, and proto changes adding server_url.

📍Where to Start

Start with device sync message flow in SyncWorker—review process_message, send_archive, send_sync_archive, process_archive_with_pin, and list_available_archives in worker.rs, then follow the new DB paging/state APIs in processed_device_sync_messages.rs and the client entrypoints in client.rs.

Changes since #3053 opened

  • Replaced inner_client() method calls with direct inner_client field access across all DeviceSync methods [ab0536e]
  • Replaced Client import with RustXmtpClient and added std::sync::Arc import in the bindings.wasm.device_sync module [ab0536e]
  • Added explicit discriminant values to BackupElementSelectionOption enum [ab0536e]
  • Reordered import statements in the bindings/wasm module [70446f5]

📊 Macroscope summarized a386e3b. 14 files reviewed, 12 issues evaluated, 12 issues filtered, 0 comments posted

🗂️ Filtered Issues

bindings/mobile/src/mls/device_sync/mod.rs — 0 comments posted, 4 evaluated, 4 filtered
  • line 74: The create_archive function returns BackupMetadata (the internal type from xmtp_archive) instead of FfiBackupMetadata. This is inconsistent with archive_metadata which returns FfiBackupMetadata. If this function is exported via UniFFI, BackupMetadata lacks the #[derive(uniffi::Record)] attribute and cannot be serialized for FFI. The return type should likely be FfiBackupMetadata with a conversion: Ok(BackupMetadata::from_metadata_save(metadata, BACKUP_VERSION).into()) [ Already posted ]
  • line 171: The check_key function uses a hardcoded value 32 for the minimum key length check, but uses ENC_KEY_SIZE for truncation and in the error message. If ENC_KEY_SIZE differs from 32, this creates inconsistent behavior: either the validation allows keys that are too short (if ENC_KEY_SIZE > 32), or rejects valid keys that meet the actual requirement (if ENC_KEY_SIZE < 32). The check on line 171 should use ENC_KEY_SIZE instead of the hardcoded 32. [ Already posted ]
  • line 198: The From<BackupMetadata> for FfiBackupMetadata conversion at line 198 uses filter_map(|selection| selection.try_into().ok()) which silently drops any BackupElementSelection values that fail to convert to FfiBackupElementSelection. When users call archive_metadata to inspect an archive, they may receive an incomplete list of elements with no indication that some were dropped, potentially leading to confusion about archive contents. [ Low confidence ]
  • line 198: The filter_map(|selection| selection.try_into().ok()) pattern silently discards any BackupElementSelection elements that fail conversion to FfiBackupElementSelection. This could cause data loss where the caller receives an incomplete elements list without any indication that elements were dropped. If certain element types are unsupported in the FFI layer, this should either error out or be explicitly documented/logged. [ Already posted ]
bindings/node/src/device_sync.rs — 0 comments posted, 5 evaluated, 5 filtered
  • line 27: In the From<ArchiveOptions> for BackupOptions implementation, n.get_i64().0 silently discards the overflow indicator boolean. When JavaScript passes a BigInt value larger than i64::MAX or smaller than i64::MIN, get_i64() returns (truncated_value, false) where false indicates the conversion was lossy. The code ignores this and uses the potentially incorrect truncated value, which could silently corrupt start_ns and end_ns timestamp boundaries, causing the archive to include wrong data ranges. [ Already posted ]
  • line 27: The BigInt::get_i64() method returns a tuple (i64, bool) where the boolean indicates whether the conversion was lossless. The code uses .0 to extract only the i64 value, ignoring the overflow indicator. If a BigInt value exceeds i64::MAX or is below i64::MIN, the timestamp will be silently truncated to an incorrect value, corrupting the start_ns or end_ns filter and potentially causing wrong data to be archived. [ Already posted ]
  • line 28: Same silent truncation issue for end_ns - BigInt values outside i64 range will be silently converted to incorrect values without any error or warning to the caller. [ Already posted ]
  • line 91: The conversion from BackupMetadata to ArchiveMetadata silently drops any BackupElementSelection elements that fail to convert via .filter_map(|selection| selection.try_into().ok()). If any element's try_into() fails, it is silently discarded rather than propagating an error or logging a warning. This could result in data loss where callers receive an incomplete elements list without any indication that some elements were dropped during conversion. [ Already posted ]
  • line 120: The key length check uses a hardcoded value 32 while the error message and truncate call use ENC_KEY_SIZE. If ENC_KEY_SIZE is not 32, the check and the error message would be inconsistent, and keys could be rejected or accepted incorrectly. The comparison on line 120 should use ENC_KEY_SIZE instead of the magic number 32 to ensure consistency. [ Already posted ]
bindings/wasm/src/device_sync.rs — 0 comments posted, 3 evaluated, 3 filtered
  • line 105: The From<BackupMetadata> implementation at lines 102-106 uses filter_map(|selection| selection.try_into().ok()) which silently drops any BackupElementSelection variants that fail conversion. When archive_metadata is called, the returned ArchiveMetadata.elements may contain fewer elements than the actual archive contains, giving callers an incomplete view of the archive contents without any indication that data was omitted. [ Already posted ]
  • line 105: The filter_map(|selection| selection.try_into().ok()) pattern silently discards any BackupElementSelection elements that fail conversion to BackupElementSelectionOption. This could cause data loss where the caller receives an incomplete elements list without any indication that elements were dropped. If certain element types are unsupported in the WASM layer, this should either error out or be explicitly documented/logged. [ Already posted ]
  • line 138: The length check uses hardcoded 32 but the error message and truncation use ENC_KEY_SIZE. If ENC_KEY_SIZE is greater than 32, keys with length between 32 and ENC_KEY_SIZE will pass the validation check but truncate(ENC_KEY_SIZE) will have no effect, returning a key shorter than ENC_KEY_SIZE. This could cause cryptographic failures downstream when the code expects exactly ENC_KEY_SIZE bytes. [ Already posted ]

@claude bot commented Jan 23, 2026

Claude finished @codabrink's task


Code Review Complete

Review Date: January 28, 2026, 6:54 PM UTC

I've reviewed this PR focusing on areas not already covered by previous reviewers. Here are my findings on the most important issues:


Critical Issues

1. Hardcoded key size check inconsistency

Files: bindings/mobile/src/mls/device_sync/mod.rs:171, bindings/node/src/device_sync.rs:120, bindings/wasm/src/device_sync.rs:138

All three bindings use hardcoded 32 in the key validation check instead of ENC_KEY_SIZE:

if key.len() < 32 {  // Should be: if key.len() < ENC_KEY_SIZE {

While ENC_KEY_SIZE is currently 32, using the constant ensures consistency if this value ever changes. This was already noted by macroscopeapp but remains unfixed.


High-Priority Issues

2. Silent data loss in metadata conversion

Files: All three bindings (mobile/mod.rs:198, node/device_sync.rs:91, wasm/device_sync.rs:105)

The filter_map(|selection| selection.try_into().ok()) pattern silently drops backup elements that fail conversion. If new element types are added to the proto but not yet supported in bindings, users will receive incomplete metadata without any indication:

elements: value
    .elements
    .into_iter()
    .filter_map(|selection| selection.try_into().ok())  // Drops unknown elements silently
    .collect(),

Recommendation: Either log a warning when elements are dropped, or document this behavior explicitly in the function documentation.


3. Message processing retry logic needs clarification

File: crates/xmtp_mls/src/groups/device_sync/worker.rs:304-318

The error handling increments attempts on error but marks as processed on success. However, the is_retryable() implementation (lines 139-148) returns true for most errors. This means:

  • Non-retryable errors like AlreadyAcknowledged will still be retried up to 3 times
  • The is_retryable() check isn't actually used in the retry decision

Current flow:

if let Err(err) = self.process_message(handle, &msg, content).await {
    // Increments attempts regardless of is_retryable()
    self.context.db().increment_device_sync_msg_attempt(&msg.id, MAX_ATTEMPTS)?;
} else {
    self.context.db().mark_device_sync_msg_as_processed(&msg.id)?;
}

Recommendation: Either use is_retryable() to determine whether to increment attempts, or remove it if all errors should be retried up to MAX_ATTEMPTS.


4. Request acknowledgement without checking if reply was requested

File: crates/xmtp_mls/src/groups/device_sync/worker.rs:635-657

In process_archive_with_pin(), when pin is None, the code processes the first reply found without checking if it was requested by this installation:

let reply = match (pin, content) {
    (None, ContentProto::Reply(reply)) => reply,  // No check here
    (Some(pin), ContentProto::Reply(reply)) if reply.request_id == pin => reply,
    _ => continue,
};
return self.process_archive(&msg, reply).await;

This differs from the normal flow (line 385) which calls is_reply_requested_by_installation(). This could cause an installation to process archives intended for other devices.


Medium-Priority Issues

5. list_available_archives breaks early on old messages

File: crates/xmtp_mls/src/groups/device_sync/worker.rs:677-678

The function uses break 'outer when encountering a message older than the cutoff:

if msg.sent_at_ns < cutoff {
    break 'outer;  // Stops entirely, may miss newer messages
}

Since messages are fetched in descending order (line 178: order_by(sent_at_ns.desc())), this is actually correct. However, the comment about "oldest-to-newest" in a previous review was misleading. The current implementation is fine.


6. Documentation comment accuracy

File: crates/xmtp_db/src/encrypted_store/processed_device_sync_messages.rs:68

The documentation says "Number of processing attempts remaining" but the field tracks attempts made (starts at 0, increments on retry). This was noted by macroscopeapp and is still present.


Test Coverage Observations

The codecov report shows 55.28% patch coverage with significant gaps:

  • bindings/node/src/device_sync.rs: 0% coverage (149 lines missing)
  • bindings/mobile/src/mls/device_sync/mod.rs: 36.64% coverage (83 lines missing)

The core Rust implementation has good coverage (84-91%), but the bindings need more tests. Consider adding integration tests for the bindings, especially for error cases like:

  • Invalid key lengths
  • Missing/invalid PINs
  • Archive processing failures

Positive Notes

  • The retry mechanism with attempt tracking is well-designed
  • The use of upsert patterns in the DB layer prevents race conditions
  • Error types are comprehensive and properly structured
  • The pagination implementation for sync messages is correct and efficient

Summary: The core functionality is solid, but there are a few consistency issues in the bindings and one potential logic issue with archive processing. The most critical items are the hardcoded key size checks and the missing validation when processing archives without a PIN.


@codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 55.28846% with 279 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.77%. Comparing base (eb5a815) to head (70446f5).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                                        Patch %   Lines
bindings/node/src/device_sync.rs                                  0.00%   149 Missing ⚠️
bindings/mobile/src/mls/device_sync/mod.rs                       36.64%    83 Missing ⚠️
.../encrypted_store/processed_device_sync_messages.rs            91.07%    19 Missing ⚠️
crates/xmtp_mls/src/groups/device_sync/worker.rs                 84.15%    16 Missing ⚠️
crates/xmtp_mls/src/tasks.rs                                     58.33%     5 Missing ⚠️
bindings/node/src/client/mod.rs                                   0.00%     2 Missing ⚠️
crates/xmtp_mls/src/groups/device_sync/mod.rs                     0.00%     2 Missing ⚠️
bindings/mobile/src/mls.rs                                        0.00%     1 Missing ⚠️
bindings/node/src/conversations/mod.rs                            0.00%     1 Missing ⚠️
crates/xmtp_mls/src/groups/error.rs                               0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3053      +/-   ##
==========================================
- Coverage   73.92%   73.77%   -0.15%     
==========================================
  Files         445      447       +2     
  Lines       54553    54990     +437     
==========================================
+ Hits        40330    40571     +241     
- Misses      14223    14419     +196     


tracing::info!("Inspecting sync payload.");

// Check if this reply was asked for by this installation.
if !self.is_reply_requested_by_installation(&reply).await? {
@codabrink (Contributor Author) commented Jan 26, 2026

Moved this check higher in the stack. This function assumes that you've already checked whether or not you want to import it.

}

/// Import a previous archive from file.
pub async fn import_archive(&self, path: String, key: Vec<u8>) -> Result<(), GenericError> {
Collaborator commented:

WDYT of returning a report of sorts on successful import?

something like this:

pub struct ArchiveImportReport {
  elements_imported: u64 // or usize?
}

not sure if anything else would make sense, but the purpose would be to see if any elements weren't imported from the archive. i know it's possible that archives would overlap on some elements so this would allow some insight into that.

@codabrink (Contributor Author) replied:

Sure. Follow-up PR? This PR is getting kind of big.

Collaborator replied:

for sure!

@rygine (Collaborator) left a comment:

bindings look good to me!

codabrink enabled auto-merge (squash) on January 28, 2026, 18:54
codabrink merged commit fb6adfb into main on Jan 28, 2026
24 of 25 checks passed
codabrink deleted the coda/manual-sync branch on January 28, 2026, 19:17