@siomporas

Which issue does this PR close?

Closes #6831.

Rationale for this change

OpenDAL supports specific git service providers such as HuggingFace, but it has no generic git provider with LFS support. These changes add generic git + LFS file streaming through a new OpenDAL service, git.

What changes are included in this PR?

A new service git, documentation, and crate features for the new service.

Are there any user-facing changes?

A new service backend!

NOTE: I tested these changes fairly comprehensively against LFS repositories on my private GitLab instance and on HuggingFace, with and without credentials, on both private and public repositories. I also tested non-LFS repositories, including on GitHub.

I created a companion demo project here that bootstraps this particular version of OpenDAL using a git submodule, and provides a simple CLI tool to clone git repository states including LFS to the local file system to demonstrate the new service.

@siomporas siomporas requested a review from Xuanwo as a code owner November 29, 2025 18:57
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. releases-note/feat The PR implements a new feature or has a title that begins with "feat" labels Nov 29, 2025
// Use full clone instead of shallow to support arbitrary commit SHAs
// Shallow clone only gets the tip of the default branch
let (repo, _) = prepare
    .fetch_only(gix::progress::Discard, &gix::interrupt::IS_INTERRUPTED)
Member

If we need to fully clone the repo, what value does this service bring to our users?

Author

The clone operation is for the underlying git repository and is done in a temp folder. This is necessary because git stores and serves its internal object database and revisions in packs. Reading individual files requires access to those packs, which is also how services like GitHub serve individual files. This service is meant to target the lowest common denominator, which is git's protocol, so we need to download the packs for an oid to get the files, and that is what gix provides.

The value isn't in providing the contents of the core git repo (those contents are loaded into memory when the repo is cloned). The value is in offering streams for the LFS objects, which are generally huge, live completely outside and on top of the git repository, and are directly accessible over HTTP.
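
To illustrate why the LFS objects sit outside the cloned repository: under the Git LFS v1 spec, the git blob for an LFS-tracked file is only a small text pointer holding a SHA-256 oid and a size, and the real bytes are fetched over HTTP via the LFS batch endpoint derived from the remote URL. The sketch below is not the code in this PR; the names `LfsPointer`, `parse_lfs_pointer`, and `lfs_batch_url` are hypothetical and exist only for illustration. It uses the standard library only.

```rust
/// A parsed Git LFS pointer file (spec v1). Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct LfsPointer {
    /// Hex-encoded SHA-256 of the real object.
    pub oid: String,
    /// Size of the real object in bytes.
    pub size: u64,
}

/// First line of every v1 pointer file.
const LFS_VERSION_LINE: &str = "version https://git-lfs.github.com/spec/v1";

/// Returns `Some` if `blob` is an LFS pointer, `None` for ordinary file content.
pub fn parse_lfs_pointer(blob: &str) -> Option<LfsPointer> {
    let mut lines = blob.lines();
    if lines.next()? != LFS_VERSION_LINE {
        return None;
    }
    let mut oid = None;
    let mut size = None;
    for line in lines {
        if line.is_empty() {
            continue;
        }
        // Each remaining line is "<key> <value>".
        let (key, value) = line.split_once(' ')?;
        match key {
            "oid" => oid = Some(value.strip_prefix("sha256:")?.to_string()),
            "size" => size = Some(value.parse::<u64>().ok()?),
            _ => {} // ignore extension keys
        }
    }
    Some(LfsPointer { oid: oid?, size: size? })
}

/// Derives the default LFS batch endpoint from an HTTPS git remote URL:
/// append `.git` if missing, then `/info/lfs/objects/batch`.
pub fn lfs_batch_url(remote: &str) -> String {
    let base = remote.trim_end_matches('/');
    if base.ends_with(".git") {
        format!("{base}/info/lfs/objects/batch")
    } else {
        format!("{base}.git/info/lfs/objects/batch")
    }
}
```

A client would then POST the oid and size to the batch URL with a download operation and stream the object from the href the server returns, so the large payload never has to pass through the git packs.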

Hopefully that makes sense. If it still doesn't, I suggest cloning the example project I linked in the description and using it to clone an AI model from HuggingFace while watching the process's resource utilisation.

Development

Successfully merging this pull request may close these issues.

new feature: Generic git service with LFS support