
Conversation

@lovesegfault
Member

@lovesegfault lovesegfault commented Oct 22, 2025

Motivation

This adds support for multipart uploads to the new S3 implementation, which is required for uploading files larger than 5GiB.

It's a big change; the commits will be pulled out into individual PRs, but this is here for early review of the big picture.

Context

Fixes: #12671


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the store Issues and pull requests concerning the Nix store label Oct 22, 2025
@lovesegfault lovesegfault force-pushed the nix-s3-multipart branch 2 times, most recently from b179028 to 617ebc6 on October 24, 2025
@lovesegfault lovesegfault force-pushed the nix-s3-multipart branch 2 times, most recently from 9bdac08 to 9415bfc on October 25, 2025
@lovesegfault lovesegfault force-pushed the nix-s3-multipart branch 2 times, most recently from 1e1800a to b9a7cb6 on October 25, 2025
@lovesegfault lovesegfault force-pushed the nix-s3-multipart branch 2 times, most recently from 86426bf to 78afd35 on October 27, 2025
@arianvp
Member

arianvp commented Oct 30, 2025

Urgh sorry for convincing everyone we could get away without implementing multipart uploads.

@xokdvium
Contributor

> Urgh sorry for convincing everyone we could get away without implementing multipart uploads.

No big deal. IMO having a single store implementation is still much more beneficial. We now also have constant-memory uploads in the HTTP store and constant-memory downloads; previously the HTTP store would buffer everything on uploads, and the S3 store on downloads. Consolidating everything is better either way, and multipart uploads just need to work, not be as efficient as possible. I doubt NixOS infra uploads compressed NARs that are remotely close to the limit, since the uncompressed output size is already limited to 4GB.

@lovesegfault lovesegfault marked this pull request as ready for review October 30, 2025 22:23
@Ericson2314

This comment was marked as off-topic.

@lovesegfault
Member Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Nov 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Nov 1, 2025

Walkthrough

This PR implements multipart upload support for S3 binary cache operations to address upload failures with large files. It adds three new configuration settings (multipart-upload, multipart-chunk-size, multipart-threshold) to the S3 binary cache store config, introduces a new MultipartSink type to manage upload lifecycle, adds orchestration and part management methods, includes configuration validation with sizing guards, and re-enables multipart-related tests. Documentation is updated to describe the new multipart configuration options.
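The MultipartSink lifecycle the walkthrough describes (buffering, per-part uploads, completion) can be sketched offline. The class name follows the walkthrough, but the interface here is hypothetical: the actual S3 upload is replaced by a callback so the chunking and ETag-collection logic is testable without a network.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch of a MultipartSink-style buffer: accumulate bytes,
// flush a "part" every chunkSize bytes, and flush the remainder on finish.
// A real implementation would upload each part to S3 and record its ETag;
// here the upload is an injected callback.
class MultipartSink
{
public:
    MultipartSink(
        std::size_t chunkSize,
        std::function<std::string(std::size_t partNumber, const std::string & data)> uploadPart)
        : chunkSize(chunkSize)
        , uploadPart(std::move(uploadPart))
    {
    }

    // Feed data; full chunks are uploaded immediately, keeping memory bounded.
    void operator()(std::string_view data)
    {
        buffer.append(data);
        while (buffer.size() >= chunkSize) {
            flushPart(buffer.substr(0, chunkSize));
            buffer.erase(0, chunkSize);
        }
    }

    // Flush any trailing partial part and return the collected ETags, which a
    // real implementation would send in CompleteMultipartUpload.
    std::vector<std::string> finish()
    {
        if (!buffer.empty())
            flushPart(buffer);
        buffer.clear();
        return etags;
    }

private:
    void flushPart(const std::string & part)
    {
        // S3 part numbers are 1-indexed.
        etags.push_back(uploadPart(etags.size() + 1, part));
    }

    std::size_t chunkSize;
    std::function<std::string(std::size_t, const std::string &)> uploadPart;
    std::string buffer;
    std::vector<std::string> etags;
};
```

Only the last part may be smaller than the chunk size, matching the AWS rule that all parts except the final one must meet the minimum part size.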

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Implementation density: The core implementation file introduces substantial new logic including a new MultipartSink struct with lifecycle management (creation, buffering, per-part uploads, completion/abort), orchestration in uploadMultipart, AWS constraint validation (min 5MiB, max 5GiB parts, max 10,000 parts), and error propagation wrapping.
  • Configuration validation: Added validation for multipartChunkSize bounds and consistency warnings between threshold and chunk size settings.
  • Error handling: Exception handling reworked to wrap errors as UploadToS3 with contextual tracing; abortMultipartUpload made noexcept for destructor safety.
  • Multiple concerns: Changes span new constants, routing logic in upsertFile, part handling, completion signatures, and debugging statements.
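The AWS constraints mentioned above take only a few lines to check. The limits themselves (5 MiB minimum part, 5 GiB maximum part, 10,000 parts per upload) are documented S3 limits; the function names are illustrative, not the PR's actual code.

```cpp
#include <cstdint>
#include <stdexcept>

// Documented S3 multipart limits.
constexpr std::uint64_t minPartSize = 5ULL * 1024 * 1024;        // 5 MiB
constexpr std::uint64_t maxPartSize = 5ULL * 1024 * 1024 * 1024; // 5 GiB
constexpr std::uint64_t maxParts = 10000;

// Reject chunk sizes outside the AWS-permitted range.
void validateChunkSize(std::uint64_t chunkSize)
{
    if (chunkSize < minPartSize)
        throw std::invalid_argument("multipart-chunk-size must be at least 5 MiB");
    if (chunkSize > maxPartSize)
        throw std::invalid_argument("multipart-chunk-size must be at most 5 GiB");
}

// Number of parts needed for a file (ceiling division); the caller must
// check that the result does not exceed maxParts.
std::uint64_t partCount(std::uint64_t fileSize, std::uint64_t chunkSize)
{
    return (fileSize + chunkSize - 1) / chunkSize;
}
```

With the 5 MiB default chunk size, the 10,000-part cap limits a single upload to about 48 GiB, which is one reason larger chunk sizes can be worth the extra memory.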

Areas requiring extra attention:

  • MultipartSink constructor logic for sizing adjustments and AWS limit enforcement with warning conditions
  • uploadMultipart orchestration and exception handling flow wrapping FileTransferError
  • Configuration validation boundaries and the interaction between multipartThreshold and multipartChunkSize
  • Part number validation and ETag collection for multipart completion
  • Destructor behavior of MultipartSink and abortMultipartUpload resilience
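The destructor concern in the last bullet is a standard RAII pattern: if the sink is destroyed without a successful completion, the multipart upload must be aborted, and the abort must be noexcept because it may run during stack unwinding. A minimal sketch (names hypothetical, side effects recorded in a struct for testability):

```cpp
#include <cassert>

// Records what happened to the upload, for inspection in tests.
struct UploadState
{
    bool completed = false;
    bool aborted = false;
};

class MultipartGuard
{
public:
    explicit MultipartGuard(UploadState & st)
        : st(st)
    {
    }

    // Called after CompleteMultipartUpload succeeds.
    void complete()
    {
        st.completed = true;
    }

    // If we never completed (e.g. an exception unwound past us),
    // abort the upload so S3 does not keep billing for orphaned parts.
    ~MultipartGuard()
    {
        if (!st.completed)
            abortUpload();
    }

private:
    // noexcept: throwing from a destructor during unwinding would terminate
    // the process, so failures here may only be logged, never propagated.
    void abortUpload() noexcept
    {
        st.aborted = true;
    }

    UploadState & st;
};
```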

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Title Check ✅ Passed — The PR title "feat(libstore): add support for multipart s3 uploads" is specific, concise, and clearly describes the main change. It follows conventional commit formatting and accurately summarizes the primary functionality addition: configuration settings, implementation methods, and test re-enablement all focus on enabling multipart S3 uploads.
  • Linked Issues Check ✅ Passed — The changes directly address the objective of linked issue #12671, which required enabling S3 uploads of large files without InvalidChunkSizeError. The implementation adds multipart upload support through new configuration settings (multipart-upload, multipart-chunk-size, multipart-threshold), a MultipartSink type managing upload lifecycles, the uploadMultipart orchestration method, per-part handling with validation, completion logic, and AWS limits checking; re-enabled tests verify the functionality.
  • Out of Scope Changes Check ✅ Passed — All changes are in scope: the documentation updates explain the new multipart configuration, the header adds the required settings, the implementation provides the core multipart logic with lifecycle management and error handling, and the test file re-enables multipart-related tests.
  • Description Check ✅ Passed — The description explains the motivation (multipart upload support for files larger than 5GiB), references linked issue #12671, and follows the project's contribution guidelines template.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c1ef5 and a756484.

📒 Files selected for processing (4)
  • doc/manual/rl-next/s3-curl-implementation.md (2 hunks)
  • src/libstore/include/nix/store/s3-binary-cache-store.hh (1 hunks)
  • src/libstore/s3-binary-cache-store.cc (9 hunks)
  • tests/nixos/s3-binary-cache-store.nix (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-28T18:50:29.831Z
Learnt from: xokdvium
Repo: NixOS/nix PR: 14384
File: src/libexpr/include/nix/expr/nixexpr.hh:494-506
Timestamp: 2025-10-28T18:50:29.831Z
Learning: In the NixOS/nix codebase (C++ codebase), when reviewing code that copies simple aggregate types into allocated storage using operations like std::ranges::copy, understand that the maintainers accept relying on C++20 implicit-lifetime type semantics. They consider types like Formal (simple aggregates) to be implicit-lifetime types where copy operations implicitly start object lifetimes, so traditional placement-new or std::uninitialized_copy concerns don't apply. Don't flag this pattern as UB.

Applied to files:

  • src/libstore/s3-binary-cache-store.cc

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines +24 to +36
- **`multipart-upload`** (default: `false`): Enable multipart uploads for large
files. When enabled, files exceeding the multipart threshold will be uploaded
in multiple parts.

- **`multipart-threshold`** (default: `100 MiB`): Minimum file size for using
multipart uploads. Files smaller than this will use regular PUT requests.
Only takes effect when `multipart-upload` is enabled.

- **`multipart-chunk-size`** (default: `5 MiB`): Size of each part in multipart
uploads. Must be at least 5 MiB (AWS S3 requirement). Larger chunk sizes
reduce the number of requests but use more memory.

- **`buffer-size`**: Has been replaced by `multipart-chunk-size`.
Member

Note (not actionable for this PR) that this is the sort of thing that I think nested settings / settings JSON is good for.

…tings

Add three configuration settings to `S3BinaryCacheStoreConfig` to control
multipart upload behavior:

- `bool multipart-upload` (default `false`): Enable/disable multipart uploads
- `uint64_t multipart-chunk-size` (default 5 MiB): Size of each upload part
- `uint64_t multipart-threshold` (default 100 MiB): Minimum file size for multipart

The feature is disabled by default.
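Assuming the defaults from the commit message, the interaction of the three settings when routing an upload (multipart for large files when enabled, single PUT otherwise) can be sketched as follows; the struct and function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative mirror of the three settings and their documented defaults.
struct MultipartConfig
{
    bool multipartUpload = false;                   // multipart-upload
    std::uint64_t chunkSize = 5ULL * 1024 * 1024;   // multipart-chunk-size, 5 MiB
    std::uint64_t threshold = 100ULL * 1024 * 1024; // multipart-threshold, 100 MiB
};

// Files at or above the threshold use multipart (when enabled);
// everything else falls back to a regular single PUT request.
bool useMultipart(const MultipartConfig & cfg, std::uint64_t fileSize)
{
    return cfg.multipartUpload && fileSize >= cfg.threshold;
}
```

Because the feature defaults to off, even a multi-terabyte upload takes the single-PUT path unless multipart-upload is explicitly enabled.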
Implement `uploadMultipart()`, the main method that orchestrates S3 multipart uploads.

Labels

documentation, store (Issues and pull requests concerning the Nix store)


Development

Successfully merging this pull request may close these issues.

Nix binary cache, S3, InvalidChunkSizeError

4 participants