
feat: support GitHub tree/<ref> URL for code repository import #400

Merged
MaojiaSheng merged 1 commit into volcengine:main from yangxinxin-7:feature/repo-commit-import
Mar 3, 2026

@yangxinxin-7
Collaborator

Background

Previously, only plain repository URLs (https://github.com/user/repo) were recognized as code repositories. URLs containing a specific branch or commit ref (e.g.,
https://github.com/user/repo/tree/main) were routed to HTMLParser and treated as regular web pages, making it impossible to import a specific version of a repository.

Changes

openviking/utils/code_hosting_utils.py

Extended is_git_repo_url to recognize owner/repo/tree/<ref> URLs in addition to plain owner/repo URLs. Only paths with exactly four segments and tree as the third
segment (index 2) are matched, which avoids false positives such as /blob/, /issues/, or deeper sub-paths.
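
The matching rule described above can be sketched roughly as follows. This is an illustrative approximation, not the code from the PR: the function name is_git_repo_url_sketch, the KNOWN_HOSTS set, and the handling of trailing slashes are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical host list for illustration only; the real helper may
# decide which hosts count as code hosting in a different way.
KNOWN_HOSTS = {"github.com"}


def is_git_repo_url_sketch(url: str) -> bool:
    """Return True for owner/repo and owner/repo/tree/<ref> URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc.lower() not in KNOWN_HOSTS:
        return False

    # Split the path into non-empty segments, e.g.
    # "/user/repo/tree/main" -> ["user", "repo", "tree", "main"].
    segments = [s for s in parsed.path.split("/") if s]

    # Plain repository URL: exactly owner/repo.
    if len(segments) == 2:
        return True

    # Ref URL: exactly four segments with "tree" at index 2.
    return len(segments) == 4 and segments[2] == "tree"
```

One consequence of the exact four-segment rule in this sketch is that a ref containing a slash (for example tree/feature/foo) yields five segments and is not matched.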

openviking/parse/parsers/code/code.py

  • _parse_ref_from_path: When a tree/<ref> URL is parsed, the ref is now checked against a hex heuristic (_looks_like_sha). If it looks like a commit SHA (7–40 hex
    characters), it is placed in the commit field; otherwise it goes to branch. This ensures non-GitHub hosts use the correct git operation (--branch for branches, fetch +
    checkout for SHAs).
  • _github_zip_download: Fixed ZIP URL construction by removing the refs/heads/ prefix, so the URL works for branch names, tags, and commit SHAs alike (archive/{ref}.zip).
  • parse(): GitHub now always uses the ZIP API regardless of whether a branch or commit is specified, passing branch or commit as the ref.
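
A minimal sketch of the ref handling described in the list above, assuming the heuristic is a plain "7 to 40 hex characters" check. The function names, the (branch, commit) tuple shape, and the https://github.com/{owner}/{repo}/ base of the archive URL are assumptions for illustration; only the archive/{ref}.zip suffix and the 7–40 hex rule come from the description.

```python
import re
from typing import Optional, Tuple

# 7 to 40 hexadecimal characters, matching the heuristic described above.
_SHA_RE = re.compile(r"[0-9a-fA-F]{7,40}")


def looks_like_sha(ref: str) -> bool:
    """Heuristic only: True if the whole ref is 7-40 hex characters."""
    return _SHA_RE.fullmatch(ref) is not None


def split_ref(ref: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (branch, commit); exactly one of the two is set."""
    if looks_like_sha(ref):
        return None, ref
    return ref, None


def github_zip_url(owner: str, repo: str, ref: str) -> str:
    """Build an archive URL without the refs/heads/ prefix.

    The base URL is an assumption; the same ref slot is used for
    branch names, tags, and commit SHAs.
    """
    return f"https://github.com/{owner}/{repo}/archive/{ref}.zip"


if __name__ == "__main__":
    for ref in ("main", "ab849f4"):
        branch, commit = split_ref(ref)
        print(ref, "->", {"branch": branch, "commit": commit})
        print(github_zip_url("user", "repo", branch or commit))
```

Because this is a heuristic, a branch whose name happens to be 7 to 40 hex characters would be classified as a commit in this sketch.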

Usage

branch

client.add_resource("https://github.com/user/repo/tree/main")

commit SHA (short or full)

client.add_resource("https://github.com/user/repo/tree/ab849f44f252e65b2ea3106322c235d8c2f349ad")
client.add_resource("https://github.com/user/repo/tree/ab849f4")

@MaojiaSheng MaojiaSheng merged commit 755efa7 into volcengine:main Mar 3, 2026
4 of 5 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 3, 2026