Skip to content

feat(install): add git-native transport for non-GitHub remotes#1249

Merged
dyoshikawa merged 4 commits intodyoshikawa:mainfrom
wfscot:feat/git-native-transport
Mar 5, 2026
Merged

feat(install): add git-native transport for non-GitHub remotes#1249
dyoshikawa merged 4 commits intodyoshikawa:mainfrom
wfscot:feat/git-native-transport

Conversation

@wfscot
Copy link
Copy Markdown
Contributor

@wfscot wfscot commented Mar 2, 2026

Summary

  • Adds a transport: "git" option to source entries so rulesync install can fetch skills from any git remote (Azure DevOps, Bitbucket, file://, etc.) using standard git CLI operations instead of the GitHub REST API
  • New config fields: transport, ref, and path on source entries. When transport is omitted, the existing GitHub transport is used (fully backward-compatible)
  • The git transport uses shallow clone with sparse checkout to efficiently fetch only the skills directory, and reuses all existing lockfile, integrity, filtering, and path traversal logic
  • Updated documentation for declarative-sources.md with git transport examples, new config properties, and authentication notes

Security hardening

All four high-severity items from review have been addressed:

  1. Git binary availability checkcheckGitAvailable() verifies git --version before any operations, with result caching to avoid repeated subprocess calls
  2. Tightened URL validation — SCP-style regex changed from [^/:]+@ to [a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+:.+ to prevent matching unintended patterns
  3. Directory walk depth limitwalkDirectory enforces MAX_WALK_DEPTH = 20 to prevent stack overflow on adversarial repo structures
  4. Comprehensive URL validation tests — 13 test cases covering valid schemes (https, ssh, git, file, SCP) and invalid/malicious inputs

Additional medium-severity fixes:

  • -- separator added to sparse-checkout set command (previously only on clone)
  • ref config validation rejects newlines (\n, \r) in addition to leading dashes
  • fetchSkillFiles wraps non-GitClientError exceptions for consistent error handling
  • Fixed duplicate logger.error call in the source fetch catch block

A note on PR size

This PR has 663 additions, which exceeds the 400-line external contributor limit. We considered splitting but concluded that doing so would require shipping the feature without its security hardening, documentation, or test coverage:

Category Files Added lines
Core implementation git-client.ts, sources.ts, config.ts, file.ts 314
Unit tests (26 + 2 tests) git-client.test.ts, sources.test.ts 296
Documentation declarative-sources.md (+ synced copy) 42
Auto-generated config-schema.json, cspell.json 11

The review specifically requested comprehensive test coverage and security fixes. Splitting into two PRs would mean merging the transport without its security hardening or tests first, which seems like the wrong trade-off. Happy to discuss alternative splits if needed.

Test plan

  • pnpm cicheck passes (format, lint, types, 4177 tests, spelling, secrets)
  • 26 unit tests for git-client: URL validation, git availability, ref resolution, file fetching, depth limits, error wrapping, sparse-checkout separators
  • 22 unit tests for sources (2 new for git transport): transport integration, explicit ref/path, lockfile behavior
  • Manual test with file:// source pointing at a local repo — 5 skills installed correctly
  • Existing GitHub transport continues to work unchanged

@wfscot wfscot marked this pull request as draft March 2, 2026 23:59
@wfscot wfscot force-pushed the feat/git-native-transport branch from 0442d74 to b375aed Compare March 3, 2026 00:12
@dyoshikawa-claw

This comment has been minimized.

@github-actions

This comment has been minimized.

@wfscot wfscot force-pushed the feat/git-native-transport branch from b375aed to 097d6a6 Compare March 3, 2026 04:49
Add a git-native transport option for the `install` command that uses
standard git CLI operations (ls-remote, clone, sparse-checkout) instead
of the GitHub REST API. This enables declarative skill sources from any
git remote including Azure DevOps, Bitbucket, and local file:// clones.

Configuration:
  { "source": "https://dev.azure.com/org/project/_git/repo",
    "transport": "git", "ref": "main", "path": "skills" }

Security hardening (per review feedback):
- checkGitAvailable() verifies git binary before operations
- Tightened SCP-style URL validation regex
- Depth limit (20) on recursive directory walking
- -- separators before all positional git arguments
- ref validation rejects leading dashes and newlines
- path validation rejects directory traversal (..)
- Error wrapping with GitClientError for consistent handling
@wfscot wfscot force-pushed the feat/git-native-transport branch from 097d6a6 to 270d1df Compare March 3, 2026 04:57
@wfscot wfscot marked this pull request as ready for review March 3, 2026 05:01
@wfscot
Copy link
Copy Markdown
Contributor Author

wfscot commented Mar 3, 2026

@dyoshikawa I originally left a lot out to hit that 400 line limit. After the initial feedback, I decided to just do it the right way. I'm hoping that you can override the limit.

If you'd rather I split it up, I can. It's just going to mean cutting corners on the first PR as described above. Lmk.

Thanks!

@dyoshikawa-claw

This comment has been minimized.

@github-actions

This comment has been minimized.

@wfscot
Copy link
Copy Markdown
Contributor Author

wfscot commented Mar 3, 2026

Sorry @dyoshikawa , I merged the upstream changes before I saw the review. I'll stop messing with this. Lmk if you need anything.

Thanks again! Awesome tool!

@dyoshikawa-claw

This comment has been minimized.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 3, 2026

Unable to determine the requested action from the triggering comment. Please use /opencode review for a code review or /opencode fix to request fixes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 3, 2026

Posted clarification comment on PR #1249 requesting clear instruction (/opencode review or /opencode fix).

github run

Copy link
Copy Markdown
Owner

@dyoshikawa dyoshikawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this excellent PR! The architecture is well thought out — the separation of git-client.ts as a transport layer, the reuse of existing path traversal utilities, and the consistent use of execFile with -- separators all demonstrate strong security awareness. The test coverage is also thorough.

I have some suggestions below, organized by priority.


Security Issues

[HIGH] Null bytes not rejected in ref validation (config.ts)

The ref validation rejects leading dashes and newlines, which is great. However, null bytes (\0) are not rejected. When passed to the git binary (a C program), a null byte causes C-string truncation, meaning the actual ref argument could differ from what was validated.

Suggested fix:

ref: optional(
  z.string().check(
    refine((v) => !v.startsWith("-"), 'ref must not start with "-"'),
    refine((v) => !/[\x00-\x1f\x7f]/.test(v), "ref must not contain control characters"),
  ),
),

Rejecting all control characters (\x00-\x1f, \x7f) covers null bytes, newlines, and other potentially dangerous characters in one check. This also lets you drop the separate newline check.

[MEDIUM] path config accepts absolute paths and null bytes (config.ts)

The path field only rejects .., but an absolute path like /etc/passwd would cause join(tmpDir, skillsPath) to resolve to that absolute path, ignoring tmpDir entirely.

Suggested fix:

path: optional(
  z.string().check(
    refine((v) => !v.includes(".."), 'path must not contain ".."'),
    refine((v) => !path.isAbsolute(v), "path must not be absolute"),
    refine((v) => !/[\x00-\x1f\x7f]/.test(v), "path must not contain control characters"),
  ),
),

[MEDIUM] git:// and http:// protocols allowed without warning (git-client.ts)

git:// transmits data unencrypted and has been disabled by default in git since 2.38.1 (CVE-2022-39253). http:// has the same MITM concern.

Suggestion: At minimum, emit a logger.warn when these schemes are detected. Alternatively, consider rejecting git:// by default.

[LOW] Symlink following in walkDirectory (git-client.ts)

directoryExists uses stat(), which follows symlinks. A malicious repository could include a symlink pointing outside the clone directory (e.g., to /etc/ or ~/.ssh/), causing readFileContent to expose files from the host filesystem.

Suggested fix: Use lstat() instead, and skip symlinks:

const lstat = promisify(fs.lstat);
// in walkDirectory:
const stats = await lstat(fullPath);
if (stats.isSymbolicLink()) continue;
if (stats.isDirectory()) {
  results.push(...(await walkDirectory(fullPath, baseDir, depth + 1)));
} else {
  // read file...
}

Code Quality Issues

[MEDIUM] GitClientError.cause shadows ES2022 Error.cause (git-client.ts)

The public readonly cause declaration shadows the built-in Error.cause property. This can cause issues with tooling that relies on standard error chaining (e.g., structured error serialization).

Suggested fix:

export class GitClientError extends Error {
  constructor(message: string, cause?: unknown) {
    super(message, { cause });
    this.name = "GitClientError";
  }
}

[MEDIUM] Significant code duplication between fetchSource and fetchSourceViaGit (sources.ts)

These two functions share a lot of nearly identical logic: lockfile lookup, SHA comparison, skill name filtering, path traversal checks, local skill precedence, integrity computation, lockfile update, and final logging. If any of this shared logic is updated in one function, the other must be updated in lockstep.

Suggestion: Extract the shared orchestration into common helpers, keeping only the transport-specific part (GitHub API calls vs. git clone + file walk) as the divergent code. A strategy/adapter pattern could work well here — a common orchestrator delegates the "fetch files" step to a transport-specific implementation.

This could be addressed in a follow-up PR if you prefer to keep this one focused.

[LOW] Missing log messages in fetchSourceViaGit (sources.ts)

Several skip conditions in fetchSourceViaGit are silent, while the equivalent conditions in fetchSource log warnings:

  1. Path traversal skipfetchSource logs logger.warn("Skipping skill with invalid name..."), but fetchSourceViaGit silently continues.
  2. Local skill precedence / already-fetched skipfetchSource has logger.debug/logger.warn, but fetchSourceViaGit silently continues.

Suggestion: Add the same log messages for consistency and debuggability.

[LOW] getFileSize has no error handling (utils/file.ts)

Every other I/O function in file.ts wraps its calls in try-catch and returns a safe default. getFileSize is the only one that lets raw ENOENT/EACCES errors propagate.

Suggestion: Either wrap in try-catch with a meaningful error message, or document that callers are expected to handle filesystem errors.

[LOW] walkDirectory returns [] when max depth is exceeded (git-client.ts)

Hitting MAX_WALK_DEPTH = 20 is almost certainly adversarial. Returning an empty array silently makes this indistinguishable from an empty directory.

Suggestion: Consider throwing a GitClientError instead so the caller can decide whether to abort the entire operation.


Nits

  • fetchSkillFiles accepts a generic ref: string but --branch only works with branch/tag names, not commit SHAs. A JSDoc comment clarifying this would help.
  • The SCP-style URL regex .+ after the colon is quite permissive. Consider restricting to [a-zA-Z0-9_.+/~-]+ for defense in depth.

Overall this is a solid feature addition with good security foundations. The main items I'd suggest addressing before merge are the null byte / absolute path validation gaps and the symlink issue. The code duplication could be tackled in a follow-up. Nice work!

@wfscot
Copy link
Copy Markdown
Contributor Author

wfscot commented Mar 4, 2026

@dyoshikawa lmk if you need anything else. I'm happy to follow this up with the other feedback. Easy enough.

Copy link
Copy Markdown
Owner

@dyoshikawa dyoshikawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Overall this is a well-structured PR with solid security hardening. The use of execFile (not exec), -- separators, Zod schema validation, symlink detection, and path traversal protection are all commendable. Below are the findings at medium severity and above.


High: Lock file requestedRef bypasses validation

File: src/lib/sources.ts (fetchSourceViaGit)

When locked && !updateSources, requestedRef is read directly from locked.requestedRef and later passed to fetchSkillFilesgit clone --branch. However, LockedSourceSchema does not enforce the same constraints as SourceEntrySchema (no leading -, no control characters). A tampered lock file (e.g., committed by a malicious contributor) could inject unexpected values.

Recommendation: Apply the same ref validation (no leading -, no control characters) to requestedRef values read from the lock file before passing them to git commands.


High: SHA mismatch when requestedRef is undefined from lock

File: src/lib/sources.ts (fetchSourceViaGit)

When locked && !updateSources and locked.requestedRef is undefined, the code falls through the cache-miss path and calls resolveDefaultRef(url) to get a new branch name. However, resolvedSha still holds the stale value from the lock file. If the default branch has advanced, the fetched content will not match the recorded SHA, and the lock entry will be written with an outdated resolvedRef.


High: Frozen mode not checked for git transport

File: src/lib/sources.ts (fetchSourceViaGit)

fetchSourceViaGit does not check options.frozen. If locked skills are missing from disk, it will make network calls (resolveDefaultRef/resolveRefToSha) even in frozen mode. The existing fetchSource function respects frozen mode — the git transport should do the same.


Medium: Significant code duplication between fetchSourceViaGit and fetchSource

Files: src/lib/sources.ts

fetchSourceViaGit (144 lines) duplicates substantial logic from fetchSource (232 lines): lock resolution, cache-hit checks, skill name validation, integrity verification, lock updates, and directory cleanup. This makes it easy for future changes to diverge between the two paths (as evidenced by the frozen-mode gap above).

Recommendation: Consider extracting the shared orchestration logic into a common function and using a strategy/adapter pattern for transport-specific operations (ref resolution, file fetching). This would reduce maintenance burden and prevent the two paths from drifting apart.


Medium: No timeout on git CLI operations

File: src/lib/git-client.ts

All execFileAsync calls (git clone, git ls-remote, git sparse-checkout set, git checkout) have no timeout. A malicious or unresponsive remote could hang the process indefinitely.

Recommendation: Add a timeout option, e.g.:

await execFileAsync("git", [...args], { timeout: 60_000 });

Medium: No control character check on the source URL body

File: src/lib/git-client.ts (validateGitUrl)

validateGitUrl only checks the URL scheme prefix. Control characters within the URL body (e.g., terminal escape sequences) are not rejected. While execFile prevents shell injection, these characters could cause terminal injection via log output.

Recommendation: Add a control character check in validateGitUrl:

if (hasControlCharacters(url)) {
  throw new GitClientError("git URL must not contain control characters");
}

Medium: Error log ordering change may confuse users

File: src/lib/sources.ts (catch block in resolveAndFetchSources)

The order of logger.error and logGitHubAuthHints was swapped. Now the auth hints appear before the error message, which may confuse users trying to understand what went wrong. The original order (error first, then hints) was more intuitive.


Thanks for the thorough security hardening work — the four high-severity items from the previous review are well addressed. The remaining findings above are mostly about consistency between the new git transport path and the existing GitHub transport path.

@wfscot
Copy link
Copy Markdown
Contributor Author

wfscot commented Mar 5, 2026

@dyoshikawa you should be good to go. Thanks!

Btw, I've noticed that if I rename a skill in the remote and do install --update, it leaves the old skill alongside the new. I'm fairly sure this is not just for the new git transport.

Is this intentional? It seems that we should retain the old lock info in order to detect this and delete the old one, yeah? I'm happy to do that in another PR. I just wanted to check if it was somehow intentional.

@cm-dyoshikawa
Copy link
Copy Markdown
Collaborator

@wfscot Thank you for the work so far. I hope it is handled if possible. But, I guess it is low priority.

@dyoshikawa dyoshikawa merged commit 5b0937a into dyoshikawa:main Mar 5, 2026
6 checks passed
@cm-dyoshikawa
Copy link
Copy Markdown
Collaborator

@wfscot Thank you!

@wfscot
Copy link
Copy Markdown
Contributor Author

wfscot commented Mar 5, 2026

Absolutely! Thank you! I'll work on the follow-up stuff tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants