feat(install): add git-native transport for non-GitHub remotes#1249
feat(install): add git-native transport for non-GitHub remotes#1249dyoshikawa merged 4 commits intodyoshikawa:mainfrom
Conversation
0442d74 to
b375aed
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
b375aed to
097d6a6
Compare
Add a git-native transport option for the `install` command that uses
standard git CLI operations (ls-remote, clone, sparse-checkout) instead
of the GitHub REST API. This enables declarative skill sources from any
git remote including Azure DevOps, Bitbucket, and local file:// clones.
Configuration:
{ "source": "https://dev.azure.com/org/project/_git/repo",
"transport": "git", "ref": "main", "path": "skills" }
Security hardening (per review feedback):
- checkGitAvailable() verifies git binary before operations
- Tightened SCP-style URL validation regex
- Depth limit (20) on recursive directory walking
- -- separators before all positional git arguments
- ref validation rejects leading dashes and newlines
- path validation rejects directory traversal (..)
- Error wrapping with GitClientError for consistent handling
097d6a6 to
270d1df
Compare
|
@dyoshikawa I originally left a lot out to hit that 400 line limit. After the initial feedback, I decided to just do it the right way. I'm hoping that you can override the limit. If you'd rather I split it up, I can. It's just going to mean cutting corners on the first PR as described above. Lmk. Thanks! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sorry @dyoshikawa , I merged the upstream changes before I saw the review. I'll stop messing with this. Lmk if you need anything. Thanks again! Awesome tool! |
This comment has been minimized.
This comment has been minimized.
|
Unable to determine the requested action from the triggering comment. Please use |
|
Posted clarification comment on PR #1249 requesting clear instruction ( |
dyoshikawa
left a comment
There was a problem hiding this comment.
Thank you for this excellent PR! The architecture is well thought out — the separation of git-client.ts as a transport layer, the reuse of existing path traversal utilities, and the consistent use of execFile with -- separators all demonstrate strong security awareness. The test coverage is also thorough.
I have some suggestions below, organized by priority.
Security Issues
[HIGH] Null bytes not rejected in ref validation (config.ts)
The ref validation rejects leading dashes and newlines, which is great. However, null bytes (\0) are not rejected. When passed to the git binary (a C program), a null byte causes C-string truncation, meaning the actual ref argument could differ from what was validated.
Suggested fix:
ref: optional(
z.string().check(
refine((v) => !v.startsWith("-"), 'ref must not start with "-"'),
refine((v) => !/[\x00-\x1f\x7f]/.test(v), "ref must not contain control characters"),
),
),Rejecting all control characters (\x00-\x1f, \x7f) covers null bytes, newlines, and other potentially dangerous characters in one check. This also lets you drop the separate newline check.
[MEDIUM] path config accepts absolute paths and null bytes (config.ts)
The path field only rejects .., but an absolute path like /etc/passwd would cause join(tmpDir, skillsPath) to resolve to that absolute path, ignoring tmpDir entirely.
Suggested fix:
path: optional(
z.string().check(
refine((v) => !v.includes(".."), 'path must not contain ".."'),
refine((v) => !path.isAbsolute(v), "path must not be absolute"),
refine((v) => !/[\x00-\x1f\x7f]/.test(v), "path must not contain control characters"),
),
),[MEDIUM] git:// and http:// protocols allowed without warning (git-client.ts)
git:// transmits data unencrypted and has been disabled by default in git since 2.38.1 (CVE-2022-39253). http:// has the same MITM concern.
Suggestion: At minimum, emit a logger.warn when these schemes are detected. Alternatively, consider rejecting git:// by default.
[LOW] Symlink following in walkDirectory (git-client.ts)
directoryExists uses stat(), which follows symlinks. A malicious repository could include a symlink pointing outside the clone directory (e.g., to /etc/ or ~/.ssh/), causing readFileContent to expose files from the host filesystem.
Suggested fix: Use lstat() instead, and skip symlinks:
const lstat = promisify(fs.lstat);
// in walkDirectory:
const stats = await lstat(fullPath);
if (stats.isSymbolicLink()) continue;
if (stats.isDirectory()) {
results.push(...(await walkDirectory(fullPath, baseDir, depth + 1)));
} else {
// read file...
}Code Quality Issues
[MEDIUM] GitClientError.cause shadows ES2022 Error.cause (git-client.ts)
The public readonly cause declaration shadows the built-in Error.cause property. This can cause issues with tooling that relies on standard error chaining (e.g., structured error serialization).
Suggested fix:
export class GitClientError extends Error {
constructor(message: string, cause?: unknown) {
super(message, { cause });
this.name = "GitClientError";
}
}[MEDIUM] Significant code duplication between fetchSource and fetchSourceViaGit (sources.ts)
These two functions share a lot of nearly identical logic: lockfile lookup, SHA comparison, skill name filtering, path traversal checks, local skill precedence, integrity computation, lockfile update, and final logging. If any of this shared logic is updated in one function, the other must be updated in lockstep.
Suggestion: Extract the shared orchestration into common helpers, keeping only the transport-specific part (GitHub API calls vs. git clone + file walk) as the divergent code. A strategy/adapter pattern could work well here — a common orchestrator delegates the "fetch files" step to a transport-specific implementation.
This could be addressed in a follow-up PR if you prefer to keep this one focused.
[LOW] Missing log messages in fetchSourceViaGit (sources.ts)
Several skip conditions in fetchSourceViaGit are silent, while the equivalent conditions in fetchSource log warnings:
- Path traversal skip —
fetchSourcelogslogger.warn("Skipping skill with invalid name..."), butfetchSourceViaGitsilently continues. - Local skill precedence / already-fetched skip —
fetchSourcehaslogger.debug/logger.warn, butfetchSourceViaGitsilently continues.
Suggestion: Add the same log messages for consistency and debuggability.
[LOW] getFileSize has no error handling (utils/file.ts)
Every other I/O function in file.ts wraps its calls in try-catch and returns a safe default. getFileSize is the only one that lets raw ENOENT/EACCES errors propagate.
Suggestion: Either wrap in try-catch with a meaningful error message, or document that callers are expected to handle filesystem errors.
[LOW] walkDirectory returns [] when max depth is exceeded (git-client.ts)
Hitting MAX_WALK_DEPTH = 20 is almost certainly adversarial. Returning an empty array silently makes this indistinguishable from an empty directory.
Suggestion: Consider throwing a GitClientError instead so the caller can decide whether to abort the entire operation.
Nits
fetchSkillFilesaccepts a genericref: stringbut--branchonly works with branch/tag names, not commit SHAs. A JSDoc comment clarifying this would help.- The SCP-style URL regex
.+after the colon is quite permissive. Consider restricting to[a-zA-Z0-9_.+/~-]+for defense in depth.
Overall this is a solid feature addition with good security foundations. The main items I'd suggest addressing before merge are the null byte / absolute path validation gaps and the symlink issue. The code duplication could be tackled in a follow-up. Nice work!
|
@dyoshikawa lmk if you need anything else. I'm happy to follow this up with the other feedback. Easy enough. |
dyoshikawa
left a comment
There was a problem hiding this comment.
Code Review Summary
Overall this is a well-structured PR with solid security hardening. The use of execFile (not exec), -- separators, Zod schema validation, symlink detection, and path traversal protection are all commendable. Below are the findings at medium severity and above.
High: Lock file requestedRef bypasses validation
File: src/lib/sources.ts (fetchSourceViaGit)
When locked && !updateSources, requestedRef is read directly from locked.requestedRef and later passed to fetchSkillFiles → git clone --branch. However, LockedSourceSchema does not enforce the same constraints as SourceEntrySchema (no leading -, no control characters). A tampered lock file (e.g., committed by a malicious contributor) could inject unexpected values.
Recommendation: Apply the same ref validation (no leading -, no control characters) to requestedRef values read from the lock file before passing them to git commands.
High: SHA mismatch when requestedRef is undefined from lock
File: src/lib/sources.ts (fetchSourceViaGit)
When locked && !updateSources and locked.requestedRef is undefined, the code falls through the cache-miss path and calls resolveDefaultRef(url) to get a new branch name. However, resolvedSha still holds the stale value from the lock file. If the default branch has advanced, the fetched content will not match the recorded SHA, and the lock entry will be written with an outdated resolvedRef.
High: Frozen mode not checked for git transport
File: src/lib/sources.ts (fetchSourceViaGit)
fetchSourceViaGit does not check options.frozen. If locked skills are missing from disk, it will make network calls (resolveDefaultRef/resolveRefToSha) even in frozen mode. The existing fetchSource function respects frozen mode — the git transport should do the same.
Medium: Significant code duplication between fetchSourceViaGit and fetchSource
Files: src/lib/sources.ts
fetchSourceViaGit (144 lines) duplicates substantial logic from fetchSource (232 lines): lock resolution, cache-hit checks, skill name validation, integrity verification, lock updates, and directory cleanup. This makes it easy for future changes to diverge between the two paths (as evidenced by the frozen-mode gap above).
Recommendation: Consider extracting the shared orchestration logic into a common function and using a strategy/adapter pattern for transport-specific operations (ref resolution, file fetching). This would reduce maintenance burden and prevent the two paths from drifting apart.
Medium: No timeout on git CLI operations
File: src/lib/git-client.ts
All execFileAsync calls (git clone, git ls-remote, git sparse-checkout set, git checkout) have no timeout. A malicious or unresponsive remote could hang the process indefinitely.
Recommendation: Add a timeout option, e.g.:
await execFileAsync("git", [...args], { timeout: 60_000 });Medium: No control character check on the source URL body
File: src/lib/git-client.ts (validateGitUrl)
validateGitUrl only checks the URL scheme prefix. Control characters within the URL body (e.g., terminal escape sequences) are not rejected. While execFile prevents shell injection, these characters could cause terminal injection via log output.
Recommendation: Add a control character check in validateGitUrl:
if (hasControlCharacters(url)) {
throw new GitClientError("git URL must not contain control characters");
}Medium: Error log ordering change may confuse users
File: src/lib/sources.ts (catch block in resolveAndFetchSources)
The order of logger.error and logGitHubAuthHints was swapped. Now the auth hints appear before the error message, which may confuse users trying to understand what went wrong. The original order (error first, then hints) was more intuitive.
Thanks for the thorough security hardening work — the four high-severity items from the previous review are well addressed. The remaining findings above are mostly about consistency between the new git transport path and the existing GitHub transport path.
|
@dyoshikawa you should be good to go. Thanks! Btw, I've noticed that if I rename a skill in the remote and do Is this intentional? It seems that we should retain the old lock info in order to detect this and delete the old one, yeah? I'm happy to do that in another PR. I just wanted to check if it was somehow intentional. |
|
@wfscot Thank you for the work so far. I hope it is handled if possible. But, I guess it is low priority. |
|
@wfscot Thank you! |
|
Absolutely! Thank you! I'll work on the follow-up stuff tomorrow. |
Summary
transport: "git"option to source entries sorulesync installcan fetch skills from any git remote (Azure DevOps, Bitbucket,file://, etc.) using standard git CLI operations instead of the GitHub REST APItransport,ref, andpathon source entries. Whentransportis omitted, the existing GitHub transport is used (fully backward-compatible)declarative-sources.mdwith git transport examples, new config properties, and authentication notesSecurity hardening
All four high-severity items from review have been addressed:
checkGitAvailable()verifiesgit --versionbefore any operations, with result caching to avoid repeated subprocess calls[^/:]+@to[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+:.+to prevent matching unintended patternswalkDirectoryenforcesMAX_WALK_DEPTH = 20to prevent stack overflow on adversarial repo structuresAdditional medium-severity fixes:
--separator added tosparse-checkout setcommand (previously only on clone)refconfig validation rejects newlines (\n,\r) in addition to leading dashesfetchSkillFileswraps non-GitClientErrorexceptions for consistent error handlinglogger.errorcall in the source fetch catch blockA note on PR size
This PR has 663 additions, which exceeds the 400-line external contributor limit. We considered splitting but concluded that doing so would require shipping the feature without its security hardening, documentation, or test coverage:
git-client.ts,sources.ts,config.ts,file.tsgit-client.test.ts,sources.test.tsdeclarative-sources.md(+ synced copy)config-schema.json,cspell.jsonThe review specifically requested comprehensive test coverage and security fixes. Splitting into two PRs would mean merging the transport without its security hardening or tests first, which seems like the wrong trade-off. Happy to discuss alternative splits if needed.
Test plan
pnpm cicheckpasses (format, lint, types, 4177 tests, spelling, secrets)file://source pointing at a local repo — 5 skills installed correctly