

@kalbasit
Owner

Implementation of the Binary Cache Index Protocol as specified in RFC 0195.
This package provides a standalone, reusable library for interacting with
Nix binary cache indexes.

Key features implemented:

  • Base32 hash parsing and formatting, interpreting each hash as a big-endian 160-bit integer.
  • Manifest serialization and deserialization (JSON).
  • Journal reading and writing (line-delimited +hash/-hash).
  • Golomb-Rice encoding and decoding with BitWriter/BitReader, supporting
    arbitrary-precision integers (big.Int) for large deltas.
  • Shard File Format (Header, Sparse Index, Encoded Data) with XXH64 checksums.
  • Client Query Algorithm implementing Journal and Shard lookup logic.

The package is isolated and has no dependencies on other ncps internal packages,
enabling its future extraction into a standalone library.

Verified with comprehensive unit and end-to-end tests.

@dosubot bot added the size:XXL (this PR changes 1000+ lines, ignoring generated files), enhancement (new feature or request), and go (pull requests that update Go code) labels on Jan 15, 2026
@dosubot

dosubot bot commented Jan 15, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.


@gemini-code-assist
Contributor

Summary of Changes

Hello @kalbasit, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational nixcacheindex package, designed to fully support the Nix Binary Cache Index Protocol (RFC 0195). It provides all the necessary primitives and a client for interacting with Nix binary cache indexes, enabling efficient storage, retrieval, and querying of package hashes. The implementation focuses on correctness, performance, and reusability, laying the groundwork for robust binary cache management.

Highlights

  • RFC 0195 Implementation: A new nixcacheindex package has been introduced, providing a complete implementation of the Nix Binary Cache Index Protocol as specified in RFC 0195.
  • Core Components: The package includes robust implementations for Base32 hash parsing and formatting, Manifest serialization/deserialization, Journal reading/writing, and the Shard File Format (Header, Sparse Index, Encoded Data) with XXH64 checksums.
  • Efficient Encoding: Golomb-Rice encoding and decoding, including support for arbitrary-precision integers (math/big.Int), is provided for efficient storage and retrieval of hash deltas within shards.
  • Client Query Algorithm: A client query algorithm is implemented to efficiently check for the presence of a hash by first consulting the journal for recent mutations and then querying the relevant shards.
  • Isolation and Test Coverage: The new package is designed to be isolated with no dependencies on other internal packages, facilitating its future extraction. It is verified with comprehensive unit and end-to-end tests.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a comprehensive implementation of the Nix binary cache index protocol as specified in RFC 0195. The code is well-structured into a new nixcacheindex package, covering base32 encoding, manifest handling, journal operations, Golomb-Rice encoding, and the shard file format. The client query logic correctly implements the two-layer lookup (journal then shards). The test coverage is good, including end-to-end tests for the client.

I've identified a few areas for improvement:

  • A couple of minor refactoring opportunities in the shard writing logic to improve code clarity and maintainability.
  • A misleading comment in the client implementation.
  • A significant gap in test coverage for the big.Int variant of the Golomb-Rice encoding, which is critical for correctness.

Additionally, the current client implementation reads entire shards into memory, which could be a performance bottleneck for very large shards. This is a reasonable trade-off for an initial implementation but should be considered for future optimization (e.g., using HTTP Range requests).

Overall, this is a high-quality contribution. Addressing the feedback will further improve its robustness and maintainability.

@kalbasit
Owner Author

/gemini review

@kalbasit
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive implementation of the Nix binary cache index protocol (RFC 0195) as a new, self-contained Go package nixcacheindex. The implementation is thorough, covering base32 hash handling, manifest and journal processing, Golomb-Rice encoding, the shard file format, and the client query algorithm. The code is well-structured and accompanied by extensive tests, which is excellent.

My review focuses on improving the robustness and correctness of the new library. I've identified a few areas for improvement, including a potential data loss issue in the Golomb encoder due to unflushed buffers, missing validation for encoding parameters that could lead to panics, and opportunities to improve resource management. The suggested changes aim to make the library safer and more reliable for future use.

@kalbasit
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive implementation of the Nix binary cache index protocol (RFC 0195) in a new, standalone Go package. The implementation is well-structured, follows the RFC closely, and includes good test coverage for many of the new components like base32 encoding, manifest handling, and journaling.

My review has identified the following issues:

  • Critical: The client query logic for shards does not decompress fetched shard files, which will cause queries to fail against a real-world compressed cache index.
  • Medium: The tests for the Golomb-Rice encoding logic are missing coverage for the big.Int variants, which are crucial for handling large hash deltas.

Detailed comments and code suggestions have been provided to address these points. Once the decompression logic is fixed, this will be a solid implementation of the RFC.

@kalbasit kalbasit force-pushed the nixcacheindex branch 2 times, most recently from 9a7194d to 11f9c88 on January 16, 2026 10:42
@kalbasit
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive implementation of the RFC 0195 binary cache index protocol. The changes include robust handling of Base32 hash parsing and formatting, manifest serialization/deserialization, journal reading/writing, Golomb-Rice encoding/decoding with big.Int support, and the Shard File Format with XXH64 checksums. The client query algorithm effectively integrates journal and shard lookup logic, including handling missing shards and epoch retries. The code is well-structured, follows Go conventions, and is accompanied by thorough unit and end-to-end tests, demonstrating a high level of attention to detail and correctness. Overall, this is a solid and well-executed feature implementation.

@kalbasit kalbasit changed the base branch from main to graphite-base/552 January 17, 2026 01:34
@kalbasit kalbasit changed the base branch from graphite-base/552 to 01-16-pkg_golomb January 17, 2026 01:34
@kalbasit kalbasit changed the base branch from 01-16-pkg_golomb to graphite-base/552 January 17, 2026 01:44
@kalbasit kalbasit changed the base branch from graphite-base/552 to main January 17, 2026 01:45
Owner Author

kalbasit commented Jan 17, 2026

Merge activity

  • Jan 17, 1:59 AM UTC: Graphite couldn't merge this PR because it had merge conflicts.
  • Jan 17, 2:00 AM UTC: Graphite couldn't merge this PR because it had merge conflicts.
  • Jan 17, 2:01 AM UTC: Graphite couldn't merge this PR because it had merge conflicts.

@kalbasit kalbasit force-pushed the nixcacheindex branch 2 times, most recently from c5a7889 to 9622a58 on January 17, 2026 21:06
@kalbasit kalbasit marked this pull request as draft January 19, 2026 07:58
// For now, read all.
var reader io.Reader = rc
if strings.HasSuffix(shardPath, ".zst") {
zstdReader, err := zstd.NewReader(rc)

nice to have: I did some benchmarking in the past and noticed that creating a new instance each time causes significant allocations. In niks3 I use a pool allocator for zstd for this reason, which saved me a few gigabytes of allocations for uploads.

Owner Author


Yep, I use pools in ncps as well. This PR is mostly Antigravity coding the RFC; I haven't reviewed or changed it yet.


data, err := io.ReadAll(reader)
if err != nil {
return DefiniteMiss, err

@Mic92 Mic92 Jan 19, 2026


Not super familiar with the error handling in this project, but I would wrap the error here to add more context when we hit a connection or filesystem error.


There are probably some other places where I would add more error context for production troubleshooting.


var compressedShardBuf bytes.Buffer

enc, err := zstd.NewWriter(&compressedShardBuf)

Same here, I think a pool allocator would make sense.

@Mic92

Mic92 commented Jan 19, 2026

Will do a more in-depth review later. Just quickly skimmed over the code.

@kalbasit
Owner Author

> Will do a more in-depth review later. Just quickly skimmed over the code.

Thank you! I will ping you for a review once the RFC is approved (or at least the design has settled) and I have updated these PRs with the implementation.


3 participants