diff --git a/README.md b/README.md index 4752ee5..159c647 100644 --- a/README.md +++ b/README.md @@ -151,6 +151,10 @@ just open purpose-gate minimal tool-counter-widget ``` pi-vs-cc/ ├── extensions/ # Pi extension source files (.ts) — one file per extension +│ └── utils/ # Shared utilities used by multiple extensions +│ └── agent-loader.ts # Validated agent .md file loader (SEC-001) +├── tests/ # Test suite +│ └── agent-loader.test.ts # 42 tests for agent validation logic ├── specs/ # Feature specifications for extensions ├── .pi/ │ ├── agent-sessions/ # Ephemeral session files (gitignored) @@ -165,6 +169,7 @@ pi-vs-cc/ │ └── settings.json # Pi workspace settings ├── justfile # just task definitions ├── CLAUDE.md # Conventions and tooling reference (for agents) +├── SECURITY_AUDIT.md # Security audit — 15 issues tracked with fix plans ├── THEME.md # Color token conventions for extension authors └── TOOLS.md # Built-in tool function signatures available in extensions ``` @@ -203,6 +208,22 @@ The `damage-control` extension provides real-time security hooks to prevent cata - **Read-Only Paths**: Allows reading but blocks modifying system files or lockfiles (`package-lock.json`, `/etc/`). - **No-Delete Paths**: Allows modifying but prevents deleting critical project configuration (`.git/`, `Dockerfile`, `README.md`). +### Agent Definition Validation + +Extensions that spawn subprocesses (`agent-team`, `agent-chain`, `pi-pi`) pass agent `.md` file contents as CLI arguments. To prevent injection via malicious agent definitions, all three use a shared loader (`extensions/utils/agent-loader.ts`) that validates every agent file at load time: + +- **Name**: Must be alphanumeric with dashes/underscores/dots, max 64 characters. Names with shell metacharacters (`;`, `$`, `` ` ``, `|`, `&`) are rejected. +- **Tools**: Checked against a known allowlist. Unknown tools produce a warning. +- **System prompt**: Scanned for shell injection patterns (`$(…)`, pipe to shell, chained destructive commands, null bytes, `eval()`). Capped at 50,000 characters. + +Agents with error-severity issues are rejected and won't load. Suspicious content produces warnings but still allows loading. See [SECURITY_AUDIT.md](SECURITY_AUDIT.md) for the full audit. + +### Running Tests + +```bash +npx tsx --test tests/agent-loader.test.ts +``` + --- ## Extension Author Reference @@ -213,6 +234,7 @@ Companion docs cover the conventions used across all extensions in this repo: - **[RESERVED_KEYS.md](RESERVED_KEYS.md)** — Pi reserved keybindings, overridable keys, and safe keys for extension authors. - **[THEME.md](THEME.md)** — Color language: which Pi theme tokens (`success`, `accent`, `warning`, `dim`, `muted`) map to which UI roles, with examples. - **[TOOLS.md](TOOLS.md)** — Function signatures for the built-in tools available inside extensions (`read`, `bash`, `edit`, `write`). +- **[SECURITY_AUDIT.md](SECURITY_AUDIT.md)** — Full security audit: 15 issues identified with severity ratings, fix plans, and behavior impact analysis. --- diff --git a/SECURITY_AUDIT.md b/SECURITY_AUDIT.md new file mode 100644 index 0000000..fd49c1d --- /dev/null +++ b/SECURITY_AUDIT.md @@ -0,0 +1,628 @@ +# Security Audit — pi-vs-claude-code + +**Date:** February 23, 2026 +**Scope:** All extensions, agent definitions, configuration files, and damage-control rules +**Auditor:** Claude Code (automated analysis) + +--- + +## Table of Contents + +- [Summary](#summary) +- [Critical Issues](#critical-issues) + - [SEC-001: Command Injection via Unsanitized User Input in spawn()](#sec-001-command-injection-via-unsanitized-user-input-in-spawn) + - [SEC-002: Full Environment Inheritance in Subprocess Spawning](#sec-002-full-environment-inheritance-in-subprocess-spawning) + - [SEC-003: Damage-Control Bypass via Path Manipulation](#sec-003-damage-control-bypass-via-path-manipulation) + - [SEC-004: Regex Denial of Service in Damage-Control Rules](#sec-004-regex-denial-of-service-in-damage-control-rules) +- [High Issues](#high-issues) + - [SEC-005: .env.sample Contains Key Format Hints](#sec-005-envsample-contains-key-format-hints) + - [SEC-006: Agent Session Files Stored in Plaintext](#sec-006-agent-session-files-stored-in-plaintext) + - [SEC-007: No Input Length Validation on Tasks/Prompts](#sec-007-no-input-length-validation-on-tasksprompts) + - [SEC-008: cross-agent.ts Executes Commands from Foreign Agent Directories](#sec-008-cross-agentts-executes-commands-from-foreign-agent-directories) +- [Medium Issues](#medium-issues) + - [SEC-009: Swallowed Errors Hide Failures Silently](#sec-009-swallowed-errors-hide-failures-silently) + - [SEC-010: system-select.ts Loads System Prompts from Global Home Directories](#sec-010-system-selectts-loads-system-prompts-from-global-home-directories) + - [SEC-011: Race Condition in Subprocess Management](#sec-011-race-condition-in-subprocess-management) + - [SEC-012: damage-control.ts Read-Only Heuristic for Bash is Weak](#sec-012-damage-controlts-read-only-heuristic-for-bash-is-weak) +- [Low / Informational Issues](#low--informational-issues) + - [SEC-013: process.argv Access for Theme Resolution](#sec-013-processargv-access-for-theme-resolution) + - [SEC-014: No CSP or Sandboxing for Subprocesses](#sec-014-no-csp-or-sandboxing-for-subprocesses) + - [SEC-015: .pi/settings.json References Parent Directory](#sec-015-pisettingsjson-references-parent-directory) +- [What's Done Well](#whats-done-well) +- [Priority Matrix](#priority-matrix) + +--- + +## Summary + +| Severity | Count | +|----------|-------| +| 🔴 Critical | 4 | +| 🟠 High | 4 | +| 🟡 Medium | 4 | +| 🟢 Low / Informational | 3 | +| **Total** | **15** | + +The codebase is a Pi coding agent extension playground with 15 TypeScript extensions, agent definitions, and a damage-control rule system. The code is primarily client-side tooling with no web server. The most significant risks center around subprocess spawning, prompt injection through agent definition files, and bypassable guardrails in the damage-control system. + +--- + +## Critical Issues + +--- + +### SEC-001: Command Injection via Unsanitized User Input in spawn() ✅ FIXED + +**Severity:** 🔴 Critical +**Status:** Fixed — 2026-02-23 +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts`, `extensions/pi-pi.ts`, `extensions/subagent-widget.ts` + +#### Description + +All four extensions pass user-provided task strings directly as CLI arguments to `spawn("pi", [...args, task])`. While `spawn` with an argument array is safer than shell execution, the `--append-system-prompt` flag passes the full agent system prompt (which can contain user-influenced content) as a single CLI argument. If any agent definition `.md` file is maliciously crafted (e.g., via a PR), the system prompt is injected verbatim into subprocesses. + +```typescript +const args = [ + "--append-system-prompt", agentDef.systemPrompt, // from .md file on disk + task, // user input passed as final arg +]; +spawn("pi", args, { stdio: ["ignore", "pipe", "pipe"], env: { ...process.env } }); +``` + +A malicious `.md` agent file could craft a system prompt containing shell metacharacters or prompt injection payloads that alter subprocess behavior. + +#### Fix Plan + +Create a shared `loadAndValidateAgent()` function in a new `extensions/utils/agent-loader.ts` that all extensions call instead of raw `readFileSync` + parse. This function validates the `.md` file against a strict schema: + +- Agent `name` must be alphanumeric plus dashes only +- `tools` must be from an allowlist of known tool names +- The system prompt body is scanned for suspicious patterns (shell metacharacters like backticks, `$(...)`, `\x00` null bytes, excessively long lines) +- Invalid agents are rejected at load time with a warning notification rather than silently loaded + +All extensions would import from this shared utility instead of each having their own `parseAgentFile()`. + +#### Behavior Change + +If someone adds an agent `.md` file with unusual characters in the system prompt (e.g., backtick-wrapped shell commands as examples), it would be flagged and require explicit approval or a config override to load. Legitimate agents with clean markdown would be unaffected. The four extensions with duplicate `parseAgentFile()` implementations would converge on a single shared loader. + +--- + +### SEC-002: Full Environment Inheritance in Subprocess Spawning + +**Severity:** 🔴 Critical +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts`, `extensions/pi-pi.ts`, `extensions/subagent-widget.ts` + +#### Description + +All subprocesses inherit the full parent environment via `env: { ...process.env }`, including every API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, `FIRECRAWL_API_KEY`). If a subagent is compromised or a malicious agent definition is loaded, it has access to all credentials regardless of which provider it actually needs. + +```typescript +spawn("pi", args, { + stdio: ["ignore", "pipe", "pipe"], + env: { ...process.env }, // inherits ALL env vars +}); +``` + +#### Fix Plan + +Create a `buildSubprocessEnv()` helper in `extensions/utils/env.ts` that constructs a minimal environment. It includes only: + +- `PATH`, `HOME`, `TERM`, `LANG` (system essentials) +- The single API key matching the selected model provider (e.g., if model is `anthropic/*`, only pass `ANTHROPIC_API_KEY`) + +Each agent definition could optionally declare additional env vars in its frontmatter: + +```yaml +env: FIRECRAWL_API_KEY +``` + +The helper maps provider prefixes to env var names: +- `anthropic/*` → `ANTHROPIC_API_KEY` +- `openai/*` → `OPENAI_API_KEY` +- `google/*` → `GEMINI_API_KEY` +- `openrouter/*` → `OPENROUTER_API_KEY` + +#### Behavior Change + +Subagents using OpenAI models would no longer see `ANTHROPIC_API_KEY` and vice versa. The pi-pi experts that use firecrawl would need `env: FIRECRAWL_API_KEY` added to their frontmatter. Any extension spawning a subprocess that relies on an unexpected env var (like a custom `NODE_PATH` or proxy config) would break until that var is added to the allowlist. Most workflows would be unaffected since subprocesses only need the one API key for their model. + +--- + +### SEC-003: Damage-Control Bypass via Path Manipulation + +**Severity:** 🔴 Critical +**File:** `extensions/damage-control.ts` + +#### Description + +The `isPathMatch()` function has multiple bypass vectors: + +```typescript +function isPathMatch(targetPath: string, pattern: string, cwd: string): boolean { + const regex = new RegExp(`^${regexPattern}$|^${regexPattern}/|/${regexPattern}$|/${regexPattern}/`); + return regex.test(targetPath) || regex.test(relativePath) || + targetPath.includes(resolvedPattern) || relativePath.includes(resolvedPattern); +} +``` + +**Known bypass vectors:** +- **Symlink traversal:** `ln -s ~/.ssh/id_rsa ./innocent_file` then reading `./innocent_file` +- **Path encoding:** `./path/../.env` or `path/to/../../.env` may not match depending on resolution order +- **Bash indirection:** `cat $(echo .env)` — the substring check on the command string doesn't see `.env` +- **Bash variables:** `x=.env; cat $x` bypasses `command.includes(".env")` +- **Base64 encoding:** `cat $(echo LmVudg== | base64 -d)` reads `.env` + +#### Fix Plan + +Overhaul `isPathMatch()` in three stages: + +1. **Symlink resolution:** Resolve all input paths through `fs.realpathSync()` wrapped in a try/catch (for non-existent targets), falling back to `path.resolve()`. This catches symlink traversal. +2. **Path normalization:** Collapse `..` segments via `path.resolve()` before any comparison. +3. **Bash argument extraction:** For bash commands, move from substring matching to a two-layer approach: extract all file path arguments from the command using a basic shell argument parser (split on spaces, handle quotes), resolve each extracted path, and check those resolved paths against the rules. + +#### Behavior Change + +Symlinked paths would now be caught — `cat ./link-to-env` would be blocked if the symlink target is `.env`. Commands with `../` traversal like `cat foo/../../.env` would be blocked. The bash argument parser would add slight overhead to every bash tool call (microseconds, not noticeable). Edge cases like `cat $(echo .env)` would still bypass — the system needs a documented "best-effort" caveat for bash indirection. Some legitimate commands containing path-like strings matching rules could get false-positive blocks, so the `ask: true` pattern should be used more liberally for ambiguous matches. + +--- + +### SEC-004: Regex Denial of Service in Damage-Control Rules + +**Severity:** 🔴 Critical +**File:** `extensions/damage-control.ts` + +#### Description + +The YAML rules file contains regex patterns compiled at runtime with no validation: + +```typescript +for (const rule of rules.bashToolPatterns) { + const regex = new RegExp(rule.pattern); // arbitrary regex from YAML + if (regex.test(command)) { ... } +} +``` + +A malicious or poorly crafted pattern could cause catastrophic backtracking. The current YAML patterns are well-written, but the system accepts arbitrary regex from a config file that could be modified. + +#### Fix Plan + +Add a `validateRegex()` call in the `session_start` handler right after loading the YAML. Two approaches: + +**Option A (preferred): Switch to `re2` package** +- The `re2` npm package guarantees linear-time matching and eliminates ReDoS entirely +- Requires rewriting patterns that use lookahead/lookbehind (e.g., `--force(?!-with-lease)` becomes two rules) + +**Option B: Timeout-based validation** +- For each pattern, compile it and test against adversarial strings (long repeated characters) +- Use `vm.runInNewContext` with a 100ms timeout +- Patterns that fail validation are dropped with a warning notification + +In either case, validate at load time and surface failures. + +#### Behavior Change + +**If using `re2`:** The rule `'--force(?!-with-lease)'` uses a negative lookahead which `re2` doesn't support. This would need to become two rules: one that matches `--force` with `ask: true`, and a separate early-return allowance for `--force-with-lease`. All other current rules would work unchanged. + +**If using timeout validation:** The current rules all pass fine and nothing changes for normal usage. Only future malicious or poorly-written rules added to the YAML would be caught and dropped with a notification. + +--- + +## High Issues + +--- + +### SEC-005: .env.sample Contains Key Format Hints + +**Severity:** 🟠 High +**File:** `.env.sample` + +#### Description + +The sample environment file reveals exact key prefixes and which services are in use: + +```bash +OPENAI_API_KEY=sk-... +ANTHROPIC_API_KEY=sk-ant-... +FIRECRAWL_API_KEY=fc-... +``` + +This aids targeted credential theft by revealing the exact services and key formats used. + +#### Fix Plan + +Replace all placeholder values with generic strings: + +```bash +OPENAI_API_KEY=your-openai-api-key-here +ANTHROPIC_API_KEY=your-anthropic-api-key-here +FIRECRAWL_API_KEY=your-firecrawl-api-key-here +``` + +Add a comment block with links to each provider's dashboard for key generation. + +#### Behavior Change + +None to functionality. Developers copying the sample would see clearer instructions. The only downside is losing the visual cue of key prefix format, which some developers find helpful for verifying they pasted the right key. This is a worthwhile tradeoff. + +--- + +### SEC-006: Agent Session Files Stored in Plaintext + +**Severity:** 🟠 High +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts` + +#### Description + +Session files under `.pi/agent-sessions/` contain full conversation history including potentially sensitive data (code, credentials mentioned in chat, file contents). These are unencrypted JSON files with no access control. + +```typescript +const agentSessionFile = join(sessionDir, `${agentKey}.json`); +``` + +#### Fix Plan + +Two-part approach: + +**Part 1 — Session cleanup (immediate):** +In the `session_start` handlers (which already wipe session files on `/new`), add a `maxAge` check that deletes session files older than 24 hours on startup. + +**Part 2 — Optional encryption (follow-up):** +Add an `encryptSessions: true` setting in `.pi/settings.json`. When enabled, use Node's `crypto.createCipheriv()` with AES-256-GCM. The key is derived from a machine-local secret (hash of hostname + user UID + a salt stored in `~/.pi/session-key`). Encrypt session JSON before writing, decrypt on read. + +#### Behavior Change + +**With cleanup only:** Old session files disappear after 24 hours. This is fine since these extensions already wipe sessions on every `/new`. + +**With encryption:** Session files become unreadable binary blobs. The `-c` (continue) flag in subprocesses would still work because the same machine-local key decrypts them. Copying session files to another machine would not work, but they are not portable today anyway. Encryption adds a few milliseconds of overhead per subprocess launch. + +--- + +### SEC-007: No Input Length Validation on Tasks/Prompts + +**Severity:** 🟠 High +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts`, `extensions/pi-pi.ts`, `extensions/subagent-widget.ts` + +#### Description + +User input for task descriptions has no length limits. An extremely long prompt could: +- Exceed CLI argument length limits (causing silent failures) +- Cause excessive token consumption and cost +- Be used for prompt stuffing attacks + +#### Fix Plan + +Add a `MAX_TASK_LENGTH` constant (10,000 characters) in each extension that spawns subprocesses. Before calling `dispatchAgent()`, `runAgent()`, or `queryExpert()`, check `task.length > MAX_TASK_LENGTH` and return an error result if exceeded. + +Also add a `MAX_SYSTEM_PROMPT_LENGTH` check (50,000 characters) when loading agent `.md` files, to catch absurdly large agent definitions. + +Both limits would be defined as named constants at the top of each file for easy tuning. + +#### Behavior Change + +Tasks over 10K characters would be rejected with an error message telling the agent to break the task into smaller pieces. This is unlikely to affect normal usage — most task descriptions are a few hundred characters. The system prompt check would only trigger if someone accidentally concatenates a huge file into an agent definition. + +--- + +### SEC-008: cross-agent.ts Executes Commands from Foreign Agent Directories + +**Severity:** 🟠 High +**File:** `extensions/cross-agent.ts` + +#### Description + +The extension loads `.md` command templates from `.claude/`, `.gemini/`, `.codex/` directories and performs string substitution, then sends the result as a user message: + +```typescript +function expandArgs(template: string, args: string): string { + let result = template; + result = result.replace(/\$ARGUMENTS|\$@/g, args); + for (let i = 0; i < parts.length; i++) { + result = result.replaceAll(`$${i + 1}`, parts[i]); + } + return result; +} +// ... +pi.sendUserMessage(expandArgs(cmd.content, args || "")); +``` + +If a malicious command template is placed in those directories (via a compromised dependency or shared repo), it could inject arbitrary prompts. + +#### Fix Plan + +Add a confirmation step before sending expanded command content. After `expandArgs()` produces the final string, show a `ctx.ui.confirm()` dialog: "Run command from {source}?" with a preview of the first 200 characters. + +Add a per-session allowlist: once a user approves a command name from a specific source, skip the confirmation for subsequent uses in the same session. Store the allowlist in a `Set` keyed by `${source}:${commandName}`. + +Project-local commands from `.pi/` could optionally be trusted by default since they are part of the project's own configuration. + +#### Behavior Change + +First time a user runs `/some-claude-command`, they would see a confirmation dialog showing what text will be injected. Subsequent uses of the same command in the same session would go through without prompting. This adds one extra interaction per unique foreign command per session. The `remember for session` behavior mitigates friction for repeated use. + +--- + +## Medium Issues + +--- + +### SEC-009: Swallowed Errors Hide Failures Silently + +**Severity:** 🟡 Medium +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts`, `extensions/pi-pi.ts`, `extensions/system-select.ts`, `extensions/cross-agent.ts`, `extensions/damage-control.ts` + +#### Description + +Empty catch blocks throughout the codebase silently swallow errors: + +```typescript +} catch {} // appears in at least 6 extensions +``` + +This includes file read failures, JSON parse errors, and spawn errors. It makes debugging harder and could mask security-relevant failures such as a corrupted agent file being silently skipped. + +#### Fix Plan + +Create a shared `log` utility in `extensions/utils/logger.ts`. It writes to a rotating log file at `.pi/extension-debug.log` (max 1MB, one backup file). Replace all empty `catch {}` blocks with `catch (e) { log.warn("context-description", e); }`. + +The logger is a no-op if a `DEBUG_EXTENSIONS` env var is not set, so there is zero overhead in normal usage. In debug mode, errors also surface as dim notifications via `ctx.ui.notify()`. + +Add `.pi/extension-debug.log` to `.gitignore`. + +#### Behavior Change + +In normal usage, nothing visible changes — the logger defaults to silent. When `DEBUG_EXTENSIONS=1` is set, developers would see logged warnings for things like: unreadable agent files, malformed JSON in subprocess stdout, failed theme applications, and filesystem scan errors. This makes troubleshooting extension issues much faster without adding noise to normal workflows. + +--- + +### SEC-010: system-select.ts Loads System Prompts from Global Home Directories + +**Severity:** 🟡 Medium +**File:** `extensions/system-select.ts` + +#### Description + +Agents loaded from global home directories are given the same trust as project-local ones: + +```typescript +const dirs: [string, string][] = [ + [join(home, ".pi", "agent", "agents"), "~/.pi"], + [join(home, ".claude", "agents"), "~/.claude"], + [join(home, ".gemini", "agents"), "~/.gemini"], + [join(home, ".codex", "agents"), "~/.codex"], +]; +``` + +A malicious global agent could override the system prompt with prompt injection payloads. The `[source]` label in the select dialog is present but easy to overlook. + +#### Fix Plan + +Add a visual trust indicator in the select dialog. Prefix global agents with ⚠️ and local agents with ✓. Add a `trustedSources` array in `.pi/settings.json`: + +```json +{ + "trustedSources": [".pi", ".claude"] +} +``` + +Global sources (`~/.pi`, `~/.claude`, etc.) are loaded but marked as `untrusted` unless explicitly added to the config. When an untrusted agent is selected, show a one-time `ctx.ui.confirm()` dialog: "This agent is from {source} (outside this project). Use it?" + +#### Behavior Change + +Project-local agents work exactly as before. Global agents still appear in the list but are visually marked and require one confirmation per session. Users who rely on global agents would add `"~/.pi"` to `trustedSources` in their settings to skip the prompt. The select dialog becomes more informative with trust indicators. + +--- + +### SEC-011: Race Condition in Subprocess Management + +**Severity:** 🟡 Medium +**Files:** `extensions/agent-team.ts`, `extensions/pi-pi.ts`, `extensions/subagent-widget.ts` + +#### Description + +The check-then-set pattern for tracking running agents is not atomic: + +```typescript +if (state.status === "running") { + return Promise.resolve({ output: `Already running...`, exitCode: 1, elapsed: 0 }); +} +state.status = "running"; // gap between check and set +``` + +Two rapid dispatches to the same agent could both pass the check before either sets `"running"`. This is unlikely in single-threaded Node.js but could happen with async scheduling if two tool calls arrive in one response. + +#### Fix Plan + +Replace the check-then-set pattern with a `Set` called `inFlight` managed synchronously: + +```typescript +const inFlight = new Set(); + +function dispatchAgent(agentName, task, ctx) { + const key = agentName.toLowerCase(); + if (inFlight.has(key)) return Promise.resolve({ ... error ... }); + inFlight.add(key); // synchronous — no gap + // ... + proc.on("close", () => { inFlight.delete(key); }); + proc.on("error", () => { inFlight.delete(key); }); +} +``` + +This is safe because Node.js is single-threaded and `Set` operations are synchronous. Cleanup in `session_start` clears the set to prevent stale locks. + +#### Behavior Change + +Functionally identical to current behavior for normal usage. The difference is that even if two `dispatch_agent` tool calls arrive in the same microtask batch (theoretically possible if the LLM produces two tool calls in one response), the second one is guaranteed to be rejected. The `status` field on the agent state still gets updated as before, but the `inFlight` set is the authoritative lock. + +--- + +### SEC-012: damage-control.ts Read-Only Heuristic for Bash is Weak + +**Severity:** 🟡 Medium +**File:** `extensions/damage-control.ts` + +#### Description + +The heuristic for detecting modifications to read-only paths is easily bypassed through interpreter indirection: + +```typescript +if (command.includes(rop) && (/[\s>|]/.test(command) || command.includes("rm") || ...)) +``` + +Bypass examples: +- `python3 -c "open('.bashrc','w').write('pwned')"` — no `rm`, `sed`, `>`, or `|` +- `node -e "require('fs').writeFileSync('.bashrc','pwned')"` — same + +#### Fix Plan + +Two changes: + +1. **Documentation:** Add explicit documentation (in the YAML file header and a new `DAMAGE_CONTROL.md` doc) stating: "Damage-control is a safety net for accidental destructive commands, not a security sandbox. Determined bypass through interpreter indirection (python, node, perl) is not caught." + +2. **Interpreter patterns:** Add a new YAML section `interpreterWriteWarnings` with patterns like: + - `python.*open\(.*['"]w` + - `node.*writeFile` + - `perl.*open.*>` + + These trigger `ask: true` confirmations rather than hard blocks, catching the most obvious interpreter-based file writes without being overly restrictive. + +#### Behavior Change + +Commands like `python3 -c "open('.bashrc','w')"` would now trigger a confirmation dialog instead of passing silently. Legitimate use of python/node/perl for file operations would get an extra confirmation step, which could be annoying during heavy scripting sessions. The documentation change sets correct expectations — users understand they are getting a seatbelt, not a jail. + +--- + +## Low / Informational Issues + +--- + +### SEC-013: process.argv Access for Theme Resolution + +**Severity:** 🟢 Low +**File:** `extensions/themeMap.ts` + +#### Description + +Theme resolution depends on parsing `process.argv` to find the first `-e` flag: + +```typescript +function primaryExtensionName(): string | null { + const argv = process.argv; + for (let i = 0; i < argv.length - 1; i++) { + if (argv[i] === "-e" || argv[i] === "--extension") { + return basename(argv[i + 1]).replace(/\.[^.]+$/, ""); + } + } + return null; +} +``` + +Not a direct vulnerability, but creates coupling to the launch command that could behave unexpectedly if the CLI interface changes. + +#### Fix Plan + +Replace `process.argv` parsing with an extension registration-order approach. Add a module-level `let primaryName: string | null = null`. The first extension to call `applyExtensionDefaults()` sets the value; subsequent calls see it is already set and skip theme/title application. This removes the dependency on how the CLI was invoked. + +#### Behavior Change + +The theme and title would always be set by whichever extension's `session_start` fires first, regardless of how pi was launched. This is already the intended behavior — the `process.argv` approach was a workaround to achieve it. If extension load order changes in a future pi version, the behavior remains consistent because it is based on execution order rather than CLI argument parsing. + +--- + +### SEC-014: No CSP or Sandboxing for Subprocesses + +**Severity:** 🟢 Low +**Files:** `extensions/agent-team.ts`, `extensions/agent-chain.ts`, `extensions/pi-pi.ts`, `extensions/subagent-widget.ts` + +#### Description + +All spawned `pi` subprocesses run with the same OS user privileges as the parent. There is no sandboxing, resource limits, or timeout enforcement beyond what pi itself implements. A runaway agent could consume CPU/memory indefinitely. + +#### Fix Plan + +Add a `timeout` option to all `spawn()` calls: + +- Default: 5 minutes for normal agents, 15 minutes for chains +- Implementation: store the `proc` reference and set a `setTimeout` that calls `proc.kill("SIGTERM")` followed by a 5-second grace period and `proc.kill("SIGKILL")` +- Update the agent state to `"error"` with a `"timed out"` message +- Allow per-agent override via frontmatter: `timeout: 900` (seconds) + +Add a `maxConcurrentAgents` limit (default 5) enforced via an `inFlight` set (see SEC-011). + +#### Behavior Change + +Long-running agents (over 5 minutes by default) would be killed. This could affect legitimate long tasks like large refactors — users would need to increase the timeout via a setting or per-agent frontmatter field. The concurrent agent limit would prevent scenarios where a looping orchestrator spawns dozens of subprocesses. The `dispatch_agent` tool would return a clear "timed out after Xs" error so the orchestrator can retry or adjust. + +--- + +### SEC-015: .pi/settings.json References Parent Directory + +**Severity:** 🟢 Low +**File:** `.pi/settings.json` + +#### Description + +The settings file contains a parent-relative path: + +```json +{ + "prompts": ["../.claude/commands"] +} +``` + +This traverses outside the project root, which could unintentionally expose commands from a different project context. + +#### Fix Plan + +Add a path validation check in the settings loader. When resolving paths from `settings.json`: + +1. Call `path.resolve(cwd, configuredPath)` +2. Verify the result starts with `cwd` or is within a known safe location (`~/.pi/`, `~/.claude/`) +3. Paths that escape the project root log a warning + +Add an `allowParentPaths: true` escape hatch in settings for users who intentionally share configs across sibling projects. + +#### Behavior Change + +The current `"../.claude/commands"` reference would trigger a warning on startup: "Settings reference path outside project root: ../.claude/commands". If `allowParentPaths` is false (the default), the path would be skipped. Users with the current setup would need to either add `allowParentPaths: true` or restructure to use `~/.claude/commands` (absolute home-relative path) instead. This is a minor inconvenience but prevents unintentional cross-project config leakage. + +--- + +## What's Done Well + +| Area | Detail | +|------|--------| +| ✅ Credential hygiene | `.env` is in `.gitignore` — credentials will not be committed | +| ✅ Session data isolation | `.pi/agent-sessions/` is in `.gitignore` — session data stays local | +| ✅ Comprehensive rules | Damage-control covers destructive commands across AWS, GCP, Firebase, Vercel, Netlify, Cloudflare, SQL, and git | +| ✅ Graduated enforcement | `ask: true` pattern for borderline operations (git checkout, stash drop) — good UX | +| ✅ Safe subprocess API | `spawn()` with array args instead of shell execution avoids basic shell injection | +| ✅ Least-privilege agents | Agents have restricted toolsets (`read,grep,find,ls` for read-only agents) | +| ✅ Security awareness | Red-team agent exists in the project, showing security is a design consideration | +| ✅ Abort after block | `ctx.abort()` is called after damage-control violations, preventing continued execution | + +--- + +## Priority Matrix + +| ID | Severity | Issue | Effort | Fix Priority | +|----|----------|-------|--------|-------------| +| SEC-001 | 🔴 Critical | Command injection in spawn() | Medium | ✅ FIXED | +| SEC-002 | 🔴 Critical | Full env inheritance in subprocesses | Low | P0 | +| SEC-003 | 🔴 Critical | Damage-control path bypass | High | P0 | +| SEC-004 | 🔴 Critical | ReDoS in damage-control rules | Medium | P0 | +| SEC-005 | 🟠 High | .env.sample key format hints | Trivial | P1 | +| SEC-006 | 🟠 High | Plaintext session files | Medium | P1 | +| SEC-007 | 🟠 High | No input length validation | Low | P1 | +| SEC-008 | 🟠 High | Foreign command execution | Medium | P1 | +| SEC-009 | 🟡 Medium | Swallowed errors | Low | P2 | +| SEC-010 | 🟡 Medium | Global agent trust | Medium | P2 | +| SEC-011 | 🟡 Medium | Race condition in subprocess mgmt | Low | P2 | +| SEC-012 | 🟡 Medium | Weak bash read-only heuristic | Medium | P2 | +| SEC-013 | 🟢 Low | process.argv coupling | Low | P3 | +| SEC-014 | 🟢 Low | No subprocess sandboxing | Medium | P3 | +| SEC-015 | 🟢 Low | Parent directory reference | Low | P3 | + +--- + +*End of audit.* diff --git a/extensions/agent-chain.ts b/extensions/agent-chain.ts index 8cf7d2a..7b72b45 100644 --- a/extensions/agent-chain.ts +++ b/extensions/agent-chain.ts @@ -28,6 +28,7 @@ import { spawn } from "child_process"; import { readFileSync, existsSync, readdirSync, mkdirSync, unlinkSync } from "fs"; import { join, resolve } from "path"; import { applyExtensionDefaults } from "./themeMap.ts"; +import { scanAgentDirectory, type AgentDef as LoaderAgentDef, type ValidationWarning, type CollisionWarning } from "./utils/agent-loader.ts"; // ── Types ──────────────────────────────────────── @@ -42,12 +43,7 @@ interface ChainDef { steps: ChainStep[]; } -interface AgentDef { - name: string; - description: string; - tools: string; - systemPrompt: string; -} +type AgentDef = LoaderAgentDef; interface StepState { agent: string; @@ -130,61 +126,6 @@ function parseChainYaml(raw: string): ChainDef[] { return chains; } -// ── Frontmatter Parser ─────────────────────────── - -function parseAgentFile(filePath: string): AgentDef | null { - try { - const raw = readFileSync(filePath, "utf-8"); - const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/); - if (!match) return null; - - const frontmatter: Record = {}; - for (const line of match[1].split("\n")) { - const idx = line.indexOf(":"); - if (idx > 0) { - frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim(); - } - } - - if (!frontmatter.name) return null; - - return { - name: frontmatter.name, - description: frontmatter.description || "", - tools: frontmatter.tools || "read,grep,find,ls", - systemPrompt: match[2].trim(), - }; - } catch { - return null; - } -} - -function scanAgentDirs(cwd: string): Map { - const dirs = [ - join(cwd, "agents"), - join(cwd, ".claude", "agents"), - join(cwd, ".pi", "agents"), - ]; - - const agents = new Map(); - - for (const dir of dirs) { - if (!existsSync(dir)) continue; - try { - for (const file of readdirSync(dir)) { - if (!file.endsWith(".md")) continue; - const fullPath = resolve(dir, file); - const def = parseAgentFile(fullPath); - if (def && !agents.has(def.name.toLowerCase())) { - agents.set(def.name.toLowerCase(), def); - } - } - } catch {} - } - - return agents; -} - // ── Extension ──────────────────────────────────── export default function (pi: ExtensionAPI) { @@ -198,6 +139,7 @@ export default function (pi: ExtensionAPI) { // Per-step state for the active chain let stepStates: StepState[] = []; let pendingReset = false; + let lastCollisions: CollisionWarning[] = []; function loadChains(cwd: string) { sessionDir = join(cwd, ".pi", "agent-sessions"); @@ -205,7 +147,26 @@ export default function (pi: ExtensionAPI) { mkdirSync(sessionDir, { recursive: true }); } - allAgents = scanAgentDirs(cwd); + allAgents = new Map(); + lastCollisions = []; + const agentDirs = [ + join(cwd, "agents"), + join(cwd, ".claude", "agents"), + join(cwd, ".pi", "agents"), + ]; + for (const dir of agentDirs) { + const { agents, collisions } = scanAgentDirectory(dir, (_file, warning) => { + if (warning.severity === "error") { + console.error(`[agent-chain] ${_file}: ${warning.message}`); + } + }); + lastCollisions.push(...collisions); + for (const [key, def] of agents) { + if (!allAgents.has(key)) { + allAgents.set(key, def); + } + } + } agentSessions.clear(); for (const [key] of allAgents) { @@ -750,6 +711,10 @@ ${agentCatalog} // Reload chains + clear agentSessions map (all agents start fresh) loadChains(_ctx.cwd); + for (const c of lastCollisions) { + _ctx.ui.notify(`⚠️ Agent collision: "${c.name}" in ${c.duplicatePath} — already loaded from ${c.originalPath}`, "warning"); + } + if (chains.length === 0) { _ctx.ui.notify("No chains found in .pi/agents/agent-chain.yaml", "warning"); return; diff --git a/extensions/agent-team.ts b/extensions/agent-team.ts index 66ecbef..87989a6 100644 --- a/extensions/agent-team.ts +++ b/extensions/agent-team.ts @@ -24,16 +24,11 @@ import { spawn } from "child_process"; import { readdirSync, readFileSync, existsSync, mkdirSync, unlinkSync } from "fs"; import { join, resolve } from "path"; import { applyExtensionDefaults } from "./themeMap.ts"; +import { scanAgentDirectory, type AgentDef as LoaderAgentDef, type ValidationWarning, type CollisionWarning } from "./utils/agent-loader.ts"; // ── Types ──────────────────────────────────────── -interface AgentDef { - name: string; - description: string; - tools: string; - systemPrompt: string; - file: string; -} +type AgentDef = LoaderAgentDef; interface AgentState { def: AgentDef; @@ -74,63 +69,7 @@ function parseTeamsYaml(raw: string): Record { return teams; } -// ── Frontmatter Parser ─────────────────────────── - -function parseAgentFile(filePath: string): AgentDef | null { - try { - const raw = readFileSync(filePath, "utf-8"); - const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/); - if (!match) return null; - - const frontmatter: Record = {}; - for (const line of match[1].split("\n")) { - const idx = line.indexOf(":"); - if (idx > 0) { - frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim(); - } - } - - if (!frontmatter.name) return null; - - return { - name: frontmatter.name, - description: frontmatter.description || "", - tools: frontmatter.tools || "read,grep,find,ls", - systemPrompt: match[2].trim(), - file: filePath, - }; - } catch { - return null; - } -} - -function scanAgentDirs(cwd: string): AgentDef[] { - const dirs = [ - join(cwd, "agents"), - join(cwd, ".claude", "agents"), - join(cwd, ".pi", "agents"), - ]; - - const agents: AgentDef[] = []; - const seen = new Set(); - - for (const dir of dirs) { - if (!existsSync(dir)) continue; - try { - for (const file of readdirSync(dir)) { - if (!file.endsWith(".md")) continue; - const fullPath = resolve(dir, file); - const def = parseAgentFile(fullPath); - if (def && !seen.has(def.name.toLowerCase())) { - seen.add(def.name.toLowerCase()); - agents.push(def); - } - } - } catch {} - } - return agents; -} // ── Extension ──────────────────────────────────── @@ -143,6 +82,7 @@ export default function (pi: ExtensionAPI) { let widgetCtx: any; let sessionDir = ""; let contextWindow = 0; + let lastCollisions: CollisionWarning[] = []; function loadAgents(cwd: string) { // Create session storage dir @@ -151,8 +91,31 @@ export default function (pi: ExtensionAPI) { mkdirSync(sessionDir, { recursive: true }); } - // Load all agent definitions - allAgentDefs = scanAgentDirs(cwd); + // Load all agent definitions (recursive walk with collision detection) + const agentDirs = [ + join(cwd, "agents"), + join(cwd, ".claude", "agents"), + join(cwd, ".pi", "agents"), + ]; + + const seen = new Set(); + allAgentDefs = []; + lastCollisions = []; + + for (const dir of agentDirs) { + const { agents, collisions } = scanAgentDirectory(dir, (_file, warning) => { + if (warning.severity === "error") { + console.error(`[agent-team] ${_file}: ${warning.message}`); + } + }); + lastCollisions.push(...collisions); + for (const [key, def] of agents) { + if (!seen.has(key)) { + seen.add(key); + allAgentDefs.push(def); + } + } + } // Load teams from .pi/agents/teams.yaml const teamsPath = join(cwd, ".pi", "agents", "teams.yaml"); @@ -689,6 +652,10 @@ ${agentCatalog}`, loadAgents(_ctx.cwd); + for (const c of lastCollisions) { + _ctx.ui.notify(`⚠️ Agent collision: "${c.name}" in ${c.duplicatePath} — already loaded from ${c.originalPath}`, "warning"); + } + // Default to first team — use /agents-team to switch const teamNames = Object.keys(teams); if (teamNames.length > 0) { diff --git a/extensions/pi-pi.ts b/extensions/pi-pi.ts index 97c46d2..a3053f7 100644 --- a/extensions/pi-pi.ts +++ b/extensions/pi-pi.ts @@ -19,19 +19,14 @@ import type { ExtensionAPI } from "@mariozechner/pi-coding-agent"; import { Type } from "@sinclair/typebox"; import { Text, truncateToWidth, visibleWidth } from "@mariozechner/pi-tui"; import { spawn } from "child_process"; -import { readdirSync, readFileSync, existsSync, mkdirSync } from "fs"; -import { join, resolve } from "path"; +import { readFileSync, existsSync } from "fs"; +import { join } from "path"; import { applyExtensionDefaults } from "./themeMap.ts"; +import { scanAgentDirectory, type AgentDef, type ValidationWarning, type CollisionWarning } from "./utils/agent-loader.ts"; // ── Types ──────────────────────────────────────── -interface ExpertDef { - name: string; - description: string; - tools: string; - systemPrompt: string; - file: string; -} +type ExpertDef = AgentDef; interface ExpertState { def: ExpertDef; @@ -49,34 +44,6 @@ function displayName(name: string): string { return name.split("-").map(w => w.charAt(0).toUpperCase() + w.slice(1)).join(" "); } -function parseAgentFile(filePath: string): ExpertDef | null { - try { - const raw = readFileSync(filePath, "utf-8"); - const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/); - if (!match) return null; - - const frontmatter: Record = {}; - for (const line of match[1].split("\n")) { - const idx = line.indexOf(":"); - if (idx > 0) { - frontmatter[line.slice(0, idx).trim()] = line.slice(idx + 1).trim(); - } - } - - if (!frontmatter.name) return null; - - return { - name: frontmatter.name, - description: frontmatter.description || "", - tools: frontmatter.tools || "read,grep,find,ls", - systemPrompt: match[2].trim(), - file: filePath, - }; - } catch { - return null; - } -} - // ── Expert card colors ──────────────────────────── // Each expert gets a unique hue: bg fills the card interior, // br is the matching border foreground (brighter shade of same hue). @@ -100,35 +67,39 @@ export default function (pi: ExtensionAPI) { const experts: Map = new Map(); let gridCols = 3; let widgetCtx: any; + let lastCollisions: CollisionWarning[] = []; function loadExperts(cwd: string) { // Pi Pi experts live in their own dedicated directory const piPiDir = join(cwd, ".pi", "agents", "pi-pi"); experts.clear(); + lastCollisions = []; if (!existsSync(piPiDir)) return; - try { - for (const file of readdirSync(piPiDir)) { - if (!file.endsWith(".md")) continue; - if (file === "pi-orchestrator.md") continue; - const fullPath = resolve(piPiDir, file); - const def = parseAgentFile(fullPath); - if (def) { - const key = def.name.toLowerCase(); - if (!experts.has(key)) { - experts.set(key, { - def, - status: "idle", - question: "", - elapsed: 0, - lastLine: "", - queryCount: 0, - }); - } - } + + const { agents: validAgents, collisions } = scanAgentDirectory(piPiDir, (_file, warning) => { + if (warning.severity === "error") { + console.error(`[pi-pi] ${_file}: ${warning.message}`); + } + }); + lastCollisions = collisions; + + // Exclude the orchestrator itself from the expert list + validAgents.delete("pi-orchestrator"); + + for (const [key, def] of validAgents) { + if (!experts.has(key)) { + experts.set(key, { + def, + status: "idle", + question: "", + elapsed: 0, + lastLine: "", + queryCount: 0, + }); } - } catch {} + } } // ── Grid Rendering ─────────────────────────── @@ -589,6 +560,11 @@ Ask specific questions about what you need to BUILD. Each expert will return doc widgetCtx = _ctx; loadExperts(_ctx.cwd); + + for (const c of lastCollisions) { + _ctx.ui.notify(`⚠️ Expert collision: "${c.name}" in ${c.duplicatePath} — already loaded from ${c.originalPath}`, "warning"); + } + updateWidget(); const expertNames = Array.from(experts.values()).map(s => displayName(s.def.name)).join(", "); diff --git a/extensions/pushover-notify.ts b/extensions/pushover-notify.ts new file mode 100644 index 0000000..6a180aa --- /dev/null +++ b/extensions/pushover-notify.ts @@ -0,0 +1,233 @@ +/** + * Pushover Notify — sends a Pushover notification whenever Pi finishes + * responding and is waiting for your input. + * + * Fires on `agent_end` and uses heuristics to detect if Pi is asking + * a question or needs something from you before notifying. + * + * Credentials (checked in order): + * 1. Sidecar config: .pi/extensions/pushover.json (project) + * 2. Sidecar config: ~/.pi/agent/extensions/pushover.json (global) + * 3. Environment variables: PUSHOVER_API_TOKEN, PUSHOVER_USER_KEY + * + * Sidecar JSON format: + * { "apiToken": "your_token", "userKey": "your_user_key" } + * Both values support "!shell command" syntax for secrets managers. + * + * Usage: pi -e extensions/pushover-notify.ts + */ + +import type { ExtensionAPI } from "@mariozechner/pi-coding-agent"; +import { readFileSync, existsSync } from "fs"; +import { join } from "path"; +import { homedir } from "os"; +import { execSync } from "child_process"; + +// --------------------------------------------------------------------------- +// Config loading +// --------------------------------------------------------------------------- + +interface PushoverConfig { + apiToken?: string; + userKey?: string; + /** Only notify when Pi is asking a question / needs input. Default: true */ + onlyWhenAsking?: boolean; + /** Max characters of the Pi response to include in notification body. Default: 200 */ + summaryLength?: number; +} + +function readJson(path: string): PushoverConfig { + if (!existsSync(path)) return {}; + try { + return JSON.parse(readFileSync(path, "utf-8")); + } catch { + return {}; + } +} + +function resolveValue(value: string | undefined): string | undefined { + if (!value) return undefined; + if (value.startsWith("!")) { + try { + return execSync(value.slice(1), { encoding: "utf-8" }).trim(); + } catch { + return undefined; + } + } + return value; +} + +function loadConfig(cwd: string): PushoverConfig { + const globalCfg = readJson( + join(homedir(), ".pi", "agent", "extensions", "pushover.json") + ); + const projectCfg = readJson(join(cwd, ".pi", "extensions", "pushover.json")); + + // Project overrides global + const merged: PushoverConfig = { ...globalCfg, ...projectCfg }; + + // Resolve credential values (supports !command syntax) + return { + ...merged, + apiToken: + resolveValue(merged.apiToken) ?? process.env.PUSHOVER_API_TOKEN, + userKey: + resolveValue(merged.userKey) ?? process.env.PUSHOVER_USER_KEY, + onlyWhenAsking: merged.onlyWhenAsking ?? true, + summaryLength: merged.summaryLength ?? 200, + }; +} + +// --------------------------------------------------------------------------- +// Heuristics — does Pi need something from the user? +// --------------------------------------------------------------------------- + +function isWaitingForInput(text: string): boolean { + const trimmed = text.trimEnd(); + + // Ends with a question mark + if (trimmed.endsWith("?")) return true; + + const lastLine = trimmed.split("\n").at(-1)?.toLowerCase() ?? ""; + + const actionPhrases = [ + "would you like", + "do you want", + "should i", + "shall i", + "can you", + "could you", + "what would you", + "which ", + "how would you", + "is that ok", + "is that correct", + "are you sure", + "let me know", + "please let me know", + "please confirm", + "please provide", + "please share", + "please review", + "awaiting your", + "waiting for your", + "your input", + "your feedback", + "your choice", + "your decision", + "your confirmation", + ]; + + return actionPhrases.some((phrase) => lastLine.includes(phrase)); +} + +// --------------------------------------------------------------------------- +// Summarise a long response into a compact notification body +// --------------------------------------------------------------------------- + +function buildNotificationBody(text: string, maxLength: number): string { + // Strip markdown fences and leading whitespace + const cleaned = text + .replace(/```[\s\S]*?```/g, "[code block]") + .replace(/`[^`]+`/g, (m) => m.slice(1, -1)) + .replace(/\n{3,}/g, "\n\n") + .trim(); + + if (cleaned.length <= maxLength) return cleaned; + + // Try to cut at a sentence boundary + const cutoff = cleaned.lastIndexOf(". ", maxLength); + if (cutoff > maxLength * 0.5) { + return cleaned.slice(0, cutoff + 1) + " …"; + } + + return cleaned.slice(0, maxLength) + " …"; +} + +// --------------------------------------------------------------------------- +// Pushover API call +// --------------------------------------------------------------------------- + +async function sendPushover( + token: string, + userKey: string, + title: string, + message: string +): Promise { + const body = JSON.stringify({ + token, + user: userKey, + title, + message: message || "(no message)", + // Use high priority so it breaks through quiet hours if Pi is waiting on you + priority: 0, + }); + + const res = await fetch("https://api.pushover.net/1/messages.json", { + method: "POST", + headers: { "Content-Type": "application/json" }, + body, + }); + + if (!res.ok) { + const err = await res.text().catch(() => res.statusText); + throw new Error(`Pushover API error ${res.status}: ${err}`); + } +} + +// --------------------------------------------------------------------------- +// Extension entry point +// --------------------------------------------------------------------------- + +export default function (pi: ExtensionAPI) { + const config = loadConfig(process.cwd()); + + const { apiToken, userKey, onlyWhenAsking, summaryLength = 200 } = config; + + if (!apiToken || !userKey) { + console.warn( + "[pushover-notify] Missing credentials. " + + "Set PUSHOVER_API_TOKEN + PUSHOVER_USER_KEY env vars, " + + "or create .pi/extensions/pushover.json with apiToken + userKey." + ); + return; + } + + pi.on("agent_end", async (event: any, _ctx: any) => { + try { + const messages: any[] = event.messages ?? []; + + // Get the last assistant message + const lastAssistant = [...messages] + .reverse() + .find((m: any) => m.role === "assistant"); + + if (!lastAssistant) return; + + // Extract plain text from the content blocks + const fullText: string = (lastAssistant.content ?? []) + .filter((b: any) => b.type === "text") + .map((b: any) => b.text as string) + .join("\n") + .trim(); + + if (!fullText) return; + + const waiting = isWaitingForInput(fullText); + + // If onlyWhenAsking is true, skip silent completions + if (onlyWhenAsking && !waiting) return; + + const title = waiting + ? "⏳ Pi needs your input" + : "✅ Pi finished"; + + const body = buildNotificationBody(fullText, summaryLength); + + await sendPushover(apiToken, userKey, title, body); + } catch (err: any) { + // Non-fatal — don't crash Pi over a failed notification + console.error("[pushover-notify] Failed to send notification:", err?.message ?? err); + } + }); +} diff --git a/extensions/utils/agent-loader.ts b/extensions/utils/agent-loader.ts new file mode 100644 index 0000000..01acc47 --- /dev/null +++ b/extensions/utils/agent-loader.ts @@ -0,0 +1,327 @@ +/** + * agent-loader.ts — Shared agent definition loader with validation + * + * SEC-001: All extensions that load agent .md files and pass their content + * to subprocess spawn() calls MUST use this module instead of raw file reads. + * + * Validates: + * - Agent name: alphanumeric, dashes, underscores only + * - Tools: must be from a known allowlist + * - System prompt: scanned for suspicious patterns (shell injection vectors) + * - Length: system prompt capped at MAX_SYSTEM_PROMPT_LENGTH + * + * Usage: + * import { loadAgentFile, scanAgentDirectory, type AgentDef } from "./utils/agent-loader.ts"; + */ + +import { readFileSync, readdirSync, existsSync, statSync } from "node:fs"; +import { join, resolve, basename, relative } from "node:path"; + +// ── Types ────────────────────────────────────────────────────────────── + +export interface AgentDef { + name: string; + description: string; + tools: string; + systemPrompt: string; + file: string; +} + +export interface ValidationWarning { + field: string; + message: string; + severity: "error" | "warning"; +} + +export interface LoadResult { + agent: AgentDef | null; + warnings: ValidationWarning[]; +} + +export interface CollisionWarning { + name: string; + duplicatePath: string; + originalPath: string; +} + +export interface ScanResult { + agents: Map; + collisions: CollisionWarning[]; +} + +// ── Constants ────────────────────────────────────────────────────────── + +/** Maximum system prompt length in characters. */ +export const MAX_SYSTEM_PROMPT_LENGTH = 50_000; + +/** Maximum agent name length. */ +const MAX_NAME_LENGTH = 64; + +/** Pattern for valid agent names: alphanumeric, dashes, underscores, dots. */ +const VALID_NAME_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9._-]*$/; + +/** Known safe tool names. Extensions can extend this set. */ +export const KNOWN_TOOLS = new Set([ + "read", "write", "edit", "bash", "grep", "find", "ls", + "fetch", "firecrawl", "dispatch_agent", "run_chain", + "query_experts", "tilldone", "subagent_create", + "subagent_continue", "subagent_remove", "subagent_list", +]); + +/** + * Patterns that indicate possible shell injection or prompt manipulation + * in a system prompt body. Each entry has a regex and a human-readable reason. + */ +const SUSPICIOUS_PATTERNS: { pattern: RegExp; reason: string }[] = [ + { pattern: /\$\(.*\)/, reason: "contains shell command substitution $(…)" }, + { pattern: /(?:^|[^`])`(rm|dd|mkfs|chmod|chown|kill|curl|wget|sh|bash|eval)\s[^`\n]*`/m, reason: "contains backtick expression with shell command" }, + { pattern: /\x00/, reason: "contains null byte" }, + { pattern: /\\\x27/, reason: "contains escaped single quote" }, + { pattern: /;\s*(rm|dd|mkfs|kill|chmod|chown)\s/, reason: "contains chained destructive shell command" }, + { pattern: /\|\s*(sh|bash|zsh|dash)\b/, reason: "contains pipe to shell" }, + { pattern: />\s*\/dev\//, reason: "contains redirect to /dev/" }, + { pattern: /\beval\s*\(/, reason: "contains eval() call" }, +]; + +// ── Frontmatter Parser ───────────────────────────────────────────────── + +interface ParsedFile { + fields: Record; + body: string; +} + +function parseFrontmatter(raw: string): ParsedFile | null { + const match = raw.match(/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/); + if (!match) return null; + + const fields: Record = {}; + for (const line of match[1].split("\n")) { + const idx = line.indexOf(":"); + if (idx > 0) { + fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim(); + } + } + + return { fields, body: match[2] }; +} + +// ── Validation ───────────────────────────────────────────────────────── + +/** + * Validate an agent name. + * Returns warnings if the name is invalid. + */ +export function validateName(name: string): ValidationWarning[] { + const warnings: ValidationWarning[] = []; + + if (!name) { + warnings.push({ field: "name", message: "agent name is empty", severity: "error" }); + return warnings; + } + + if (name.length > MAX_NAME_LENGTH) { + warnings.push({ + field: "name", + message: `agent name exceeds ${MAX_NAME_LENGTH} characters (got ${name.length})`, + severity: "error", + }); + } + + if (!VALID_NAME_PATTERN.test(name)) { + warnings.push({ + field: "name", + message: `agent name "${name}" contains invalid characters (allowed: a-z, 0-9, dash, underscore, dot)`, + severity: "error", + }); + } + + return warnings; +} + +/** + * Validate a comma-separated tools string. + * Returns warnings for unknown tools. + */ +export function validateTools(tools: string, knownTools: Set = KNOWN_TOOLS): ValidationWarning[] { + const warnings: ValidationWarning[] = []; + if (!tools) return warnings; + + const toolList = tools.split(",").map((t) => t.trim()).filter(Boolean); + + for (const tool of toolList) { + if (!knownTools.has(tool)) { + warnings.push({ + field: "tools", + message: `unknown tool "${tool}" — not in the known tools allowlist`, + severity: "warning", + }); + } + } + + return warnings; +} + +/** + * Validate a system prompt body for suspicious patterns. + * Returns warnings for each suspicious pattern found. + */ +export function validateSystemPrompt(body: string): ValidationWarning[] { + const warnings: ValidationWarning[] = []; + + if (body.length > MAX_SYSTEM_PROMPT_LENGTH) { + warnings.push({ + field: "systemPrompt", + message: `system prompt exceeds ${MAX_SYSTEM_PROMPT_LENGTH} characters (got ${body.length})`, + severity: "error", + }); + } + + for (const { pattern, reason } of SUSPICIOUS_PATTERNS) { + if (pattern.test(body)) { + warnings.push({ + field: "systemPrompt", + message: `suspicious content in system prompt: ${reason}`, + severity: "warning", + }); + } + } + + return warnings; +} + +/** + * Run all validations on a parsed agent definition. + */ +export function validateAgent(agent: AgentDef): ValidationWarning[] { + return [ + ...validateName(agent.name), + ...validateTools(agent.tools), + ...validateSystemPrompt(agent.systemPrompt), + ]; +} + +// ── Loading ──────────────────────────────────────────────────────────── + +/** + * Load and validate a single agent .md file. + * + * Returns the agent definition and any validation warnings. + * If there are any "error" severity warnings, agent will be null. + */ +export function loadAgentFile(filePath: string): LoadResult { + let raw: string; + try { + raw = readFileSync(filePath, "utf-8"); + } catch (err) { + return { + agent: null, + warnings: [{ + field: "file", + message: `could not read file: ${err instanceof Error ? err.message : String(err)}`, + severity: "error", + }], + }; + } + + const parsed = parseFrontmatter(raw); + if (!parsed) { + return { + agent: null, + warnings: [{ + field: "file", + message: "file does not have valid frontmatter (---\\n...\\n---)", + severity: "error", + }], + }; + } + + const { fields, body } = parsed; + const name = fields.name || basename(filePath, ".md"); + const description = fields.description || ""; + const tools = fields.tools || "read,grep,find,ls"; + const systemPrompt = body.trim(); + + const agent: AgentDef = { + name, + description, + tools, + systemPrompt, + file: filePath, + }; + + const warnings = validateAgent(agent); + + const hasErrors = warnings.some((w) => w.severity === "error"); + + return { + agent: hasErrors ? null : agent, + warnings, + }; +} + +/** + * Scan a directory recursively for agent .md files, validate each, and return valid agents. + * + * Walks nested subdirectories so agents can be organized: + * agents/ + * ├── review_agents/ + * │ ├── code_reviewer.md + * │ └── security_reviewer.md + * └── build_agents/ + * └── ts_builder.md + * + * When duplicate agent names are found (case-insensitive), the first one wins + * and a CollisionWarning is recorded. + * + * @param dir Directory to scan (recursively) + * @param onWarning Optional callback for each validation warning + * @returns ScanResult with agents map and collision warnings + */ +export function scanAgentDirectory( + dir: string, + onWarning?: (file: string, warning: ValidationWarning) => void, +): ScanResult { + const agents = new Map(); + const collisions: CollisionWarning[] = []; + + if (!existsSync(dir)) return { agents, collisions }; + + let files: (string | Buffer)[]; + try { + files = readdirSync(dir, { recursive: true }); + } catch { + return { agents, collisions }; + } + + for (const file of files) { + const rel = typeof file === "string" ? file : file.toString(); + if (!rel.endsWith(".md")) continue; + const fullPath = resolve(dir, rel); + + // Skip directories that happen to end in .md + try { if (!statSync(fullPath).isFile()) continue; } catch { continue; } + + const result = loadAgentFile(fullPath); + + if (onWarning) { + for (const w of result.warnings) { + onWarning(fullPath, w); + } + } + + if (result.agent) { + const key = result.agent.name.toLowerCase(); + if (agents.has(key)) { + collisions.push({ + name: result.agent.name, + duplicatePath: fullPath, + originalPath: agents.get(key)!.file, + }); + } else { + agents.set(key, result.agent); + } + } + } + + return { agents, collisions }; +} diff --git a/justfile b/justfile index 2d69dfe..6644abc 100644 --- a/justfile +++ b/justfile @@ -41,6 +41,9 @@ ext-subagent-widget: ext-tilldone: pi -e extensions/tilldone.ts -e extensions/theme-cycler.ts +ext-pushover: + pi -e extensions/pushover-notify.ts -e extensions/minimal.ts + #g2 # 10. Agent team: dispatcher orchestrator with team select and grid dashboard diff --git a/tests/agent-loader.test.ts b/tests/agent-loader.test.ts new file mode 100644 index 0000000..2ed12de --- /dev/null +++ b/tests/agent-loader.test.ts @@ -0,0 +1,580 @@ +/** + * Tests for extensions/utils/agent-loader.ts (SEC-001) + * + * Run: npx tsx --test tests/agent-loader.test.ts + */ + +import { describe, it } from "node:test"; +import assert from "node:assert/strict"; +import { mkdtempSync, writeFileSync, mkdirSync, rmSync, existsSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; +import { + validateName, + validateTools, + validateSystemPrompt, + validateAgent, + loadAgentFile, + scanAgentDirectory, + KNOWN_TOOLS, + MAX_SYSTEM_PROMPT_LENGTH, + type AgentDef, + type ValidationWarning, + type CollisionWarning, +} from "../extensions/utils/agent-loader.ts"; + +// ── Helpers ──────────────────────────────────────────────────────────── + +function tmpDir(): string { + return mkdtempSync(join(tmpdir(), "agent-loader-test-")); +} + +function writeAgent(dir: string, filename: string, content: string): string { + const filePath = join(dir, filename); + writeFileSync(filePath, content, "utf-8"); + return filePath; +} + +function makeAgentMd(fields: Record, body: string): string { + const frontmatter = Object.entries(fields) + .map(([k, v]) => `${k}: ${v}`) + .join("\n"); + return `---\n${frontmatter}\n---\n${body}`; +} + +function hasError(warnings: ValidationWarning[]): boolean { + return warnings.some((w) => w.severity === "error"); +} + +function hasWarning(warnings: ValidationWarning[]): boolean { + return warnings.some((w) => w.severity === "warning"); +} + +function warningMessages(warnings: ValidationWarning[]): string[] { + return warnings.map((w) => w.message); +} + +// ── validateName ─────────────────────────────────────────────────────── + +describe("validateName", () => { + it("accepts valid names", () => { + assert.deepEqual(validateName("scout"), []); + assert.deepEqual(validateName("red-team"), []); + assert.deepEqual(validateName("my_agent"), []); + assert.deepEqual(validateName("Agent.v2"), []); + assert.deepEqual(validateName("a123"), []); + }); + + it("rejects empty name", () => { + const warnings = validateName(""); + assert.ok(hasError(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("empty"))); + }); + + it("rejects names with spaces", () => { + const warnings = validateName("my agent"); + assert.ok(hasError(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("invalid characters"))); + }); + + it("rejects names with shell metacharacters", () => { + for (const bad of ["agent;rm", "agent$(cmd)", "agent`cmd`", "agent|pipe", "agent&bg"]) { + const warnings = validateName(bad); + assert.ok(hasError(warnings), `expected error for name "${bad}"`); + } + }); + + it("rejects names starting with dash", () => { + const warnings = validateName("-agent"); + assert.ok(hasError(warnings)); + }); + + it("rejects names starting with dot", () => { + const warnings = validateName(".hidden"); + assert.ok(hasError(warnings)); + }); + + it("rejects overly long names", () => { + const longName = "a".repeat(65); + const warnings = validateName(longName); + assert.ok(hasError(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("exceeds"))); + }); +}); + +// ── validateTools ────────────────────────────────────────────────────── + +describe("validateTools", () => { + it("accepts known tools", () => { + assert.deepEqual(validateTools("read,write,bash"), []); + assert.deepEqual(validateTools("grep,find,ls"), []); + }); + + it("accepts empty tools string", () => { + assert.deepEqual(validateTools(""), []); + }); + + it("warns on unknown tools", () => { + const warnings = validateTools("read,evil_tool,bash"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("evil_tool"))); + }); + + it("handles tools with whitespace", () => { + assert.deepEqual(validateTools("read, write, bash"), []); + }); + + it("warns on multiple unknown tools", () => { + const warnings = validateTools("foo,bar,read"); + assert.equal(warnings.length, 2); + }); + + it("accepts custom known tools set", () => { + const custom = new Set(["custom_tool"]); + assert.deepEqual(validateTools("custom_tool", custom), []); + const warnings = validateTools("read", custom); + assert.ok(hasWarning(warnings)); + }); +}); + +// ── validateSystemPrompt ─────────────────────────────────────────────── + +describe("validateSystemPrompt", () => { + it("accepts clean markdown prompts", () => { + const clean = `You are a helpful assistant.\n\n## Guidelines\n- Be concise\n- Follow patterns`; + assert.deepEqual(validateSystemPrompt(clean), []); + }); + + it("warns on shell command substitution $(…)", () => { + const warnings = validateSystemPrompt("Run this: $(rm -rf /)"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("command substitution"))); + }); + + it("warns on backtick shell command substitution", () => { + const warnings = validateSystemPrompt("Run this: `rm -rf /`"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("backtick"))); + }); + + it("warns on backtick with other shell commands", () => { + for (const cmd of ["`bash script.sh`", "`curl http://evil.com`", "`chmod 777 file`"]) { + const warnings = validateSystemPrompt(`Do: ${cmd}`); + assert.ok(hasWarning(warnings), `expected warning for ${cmd}`); + } + }); + + it("allows markdown inline code backticks", () => { + const warnings = validateSystemPrompt("Use the `read` tool and `grep` tool"); + const backtickWarnings = warnings.filter((w) => w.message.includes("backtick")); + assert.equal(backtickWarnings.length, 0); + }); + + it("warns on null bytes", () => { + const warnings = validateSystemPrompt("normal text\x00hidden"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("null byte"))); + }); + + it("warns on pipe to shell", () => { + const warnings = validateSystemPrompt("echo payload | bash"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("pipe to shell"))); + }); + + it("warns on chained destructive commands", () => { + const warnings = validateSystemPrompt("do stuff; rm -rf /"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("destructive"))); + }); + + it("warns on redirect to /dev/", () => { + const warnings = validateSystemPrompt("echo x > /dev/sda"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("redirect"))); + }); + + it("warns on eval()", () => { + const warnings = validateSystemPrompt("use eval(code)"); + assert.ok(hasWarning(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("eval"))); + }); + + it("errors on oversized prompts", () => { + const huge = "x".repeat(MAX_SYSTEM_PROMPT_LENGTH + 1); + const warnings = validateSystemPrompt(huge); + assert.ok(hasError(warnings)); + assert.ok(warningMessages(warnings).some((m) => m.includes("exceeds"))); + }); + + it("allows prompts at exactly the limit", () => { + const atLimit = "x".repeat(MAX_SYSTEM_PROMPT_LENGTH); + const warnings = validateSystemPrompt(atLimit); + assert.ok(!hasError(warnings)); + }); + + it("can accumulate multiple warnings", () => { + const multi = "$(cmd) and also; rm -rf / then | bash"; + const warnings = validateSystemPrompt(multi); + assert.ok(warnings.length >= 3, `expected >=3 warnings, got ${warnings.length}: ${warningMessages(warnings).join("; ")}`); + }); +}); + +// ── validateAgent (integration of all validators) ────────────────────── + +describe("validateAgent", () => { + it("returns no warnings for a clean agent", () => { + const agent: AgentDef = { + name: "scout", + description: "Fast recon agent", + tools: "read,grep,find,ls", + systemPrompt: "You are a scout agent. Investigate quickly.", + file: "/test/scout.md", + }; + assert.deepEqual(validateAgent(agent), []); + }); + + it("aggregates warnings from all validators", () => { + const agent: AgentDef = { + name: "bad name!", + description: "", + tools: "read,evil_tool", + systemPrompt: "Run $(whoami)", + file: "/test/bad.md", + }; + const warnings = validateAgent(agent); + assert.ok(warnings.length >= 3); + const fields = warnings.map((w) => w.field); + assert.ok(fields.includes("name")); + assert.ok(fields.includes("tools")); + assert.ok(fields.includes("systemPrompt")); + }); +}); + +// ── loadAgentFile ────────────────────────────────────────────────────── + +describe("loadAgentFile", () => { + it("loads a valid agent file", () => { + const dir = tmpDir(); + const content = makeAgentMd( + { name: "scout", description: "Fast recon", tools: "read,grep,find,ls" }, + "You are a scout agent.", + ); + const filePath = writeAgent(dir, "scout.md", content); + + const result = loadAgentFile(filePath); + assert.ok(result.agent); + assert.equal(result.agent.name, "scout"); + assert.equal(result.agent.description, "Fast recon"); + assert.equal(result.agent.tools, "read,grep,find,ls"); + assert.equal(result.agent.systemPrompt, "You are a scout agent."); + assert.equal(result.agent.file, filePath); + assert.deepEqual(result.warnings, []); + + rmSync(dir, { recursive: true }); + }); + + it("returns null agent for file with invalid name", () => { + const dir = tmpDir(); + const content = makeAgentMd( + { name: "bad name!", description: "test" }, + "body", + ); + const filePath = writeAgent(dir, "bad.md", content); + + const result = loadAgentFile(filePath); + assert.equal(result.agent, null); + assert.ok(hasError(result.warnings)); + + rmSync(dir, { recursive: true }); + }); + + it("returns agent with warnings for suspicious system prompt", () => { + const dir = tmpDir(); + const content = makeAgentMd( + { name: "sketchy", tools: "read,bash" }, + "You are an agent. Run $(whoami) first.", + ); + const filePath = writeAgent(dir, "sketchy.md", content); + + const result = loadAgentFile(filePath); + // Suspicious prompts produce warnings, not errors — agent is still loaded + assert.ok(result.agent); + assert.equal(result.agent!.name, "sketchy"); + assert.ok(hasWarning(result.warnings)); + + rmSync(dir, { recursive: true }); + }); + + it("returns null for oversized system prompt", () => { + const dir = tmpDir(); + const content = makeAgentMd( + { name: "huge" }, + "x".repeat(MAX_SYSTEM_PROMPT_LENGTH + 100), + ); + const filePath = writeAgent(dir, "huge.md", content); + + const result = loadAgentFile(filePath); + assert.equal(result.agent, null); + assert.ok(hasError(result.warnings)); + + rmSync(dir, { recursive: true }); + }); + + it("returns error for non-existent file", () => { + const result = loadAgentFile("/nonexistent/path/agent.md"); + assert.equal(result.agent, null); + assert.ok(hasError(result.warnings)); + assert.ok(warningMessages(result.warnings).some((m) => m.includes("could not read"))); + }); + + it("returns error for file without frontmatter", () => { + const dir = tmpDir(); + const filePath = writeAgent(dir, "nofm.md", "Just some text without frontmatter"); + + const result = loadAgentFile(filePath); + assert.equal(result.agent, null); + assert.ok(hasError(result.warnings)); + assert.ok(warningMessages(result.warnings).some((m) => m.includes("frontmatter"))); + + rmSync(dir, { recursive: true }); + }); + + it("uses filename as name when name field is missing", () => { + const dir = tmpDir(); + const content = makeAgentMd( + { description: "no name field" }, + "Body text.", + ); + const filePath = writeAgent(dir, "fallback-name.md", content); + + const result = loadAgentFile(filePath); + assert.ok(result.agent); + assert.equal(result.agent!.name, "fallback-name"); + + rmSync(dir, { recursive: true }); + }); + + it("defaults tools to read,grep,find,ls when not specified", () => { + const dir = tmpDir(); + const content = makeAgentMd({ name: "minimal" }, "Body."); + const filePath = writeAgent(dir, "minimal.md", content); + + const result = loadAgentFile(filePath); + assert.ok(result.agent); + assert.equal(result.agent!.tools, "read,grep,find,ls"); + + rmSync(dir, { recursive: true }); + }); +}); + +// ── scanAgentDirectory ───────────────────────────────────────────────── + +describe("scanAgentDirectory", () => { + it("loads all valid agents from a directory", () => { + const dir = tmpDir(); + writeAgent(dir, "scout.md", makeAgentMd({ name: "scout" }, "Scout body.")); + writeAgent(dir, "builder.md", makeAgentMd({ name: "builder", tools: "read,write,edit,bash" }, "Builder body.")); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 2); + assert.ok(agents.has("scout")); + assert.ok(agents.has("builder")); + + rmSync(dir, { recursive: true }); + }); + + it("skips agents with validation errors", () => { + const dir = tmpDir(); + writeAgent(dir, "good.md", makeAgentMd({ name: "good" }, "Good body.")); + writeAgent(dir, "bad.md", makeAgentMd({ name: "bad name!" }, "Bad body.")); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 1); + assert.ok(agents.has("good")); + + rmSync(dir, { recursive: true }); + }); + + it("calls onWarning callback for each warning", () => { + const dir = tmpDir(); + writeAgent(dir, "sketchy.md", makeAgentMd( + { name: "sketchy", tools: "read,evil_tool" }, + "Run $(whoami).", + )); + + const collected: { file: string; warning: ValidationWarning }[] = []; + scanAgentDirectory(dir, (file, warning) => { + collected.push({ file, warning }); + }); + + assert.ok(collected.length >= 2); // unknown tool + suspicious prompt + assert.ok(collected.some((c) => c.warning.field === "tools")); + assert.ok(collected.some((c) => c.warning.field === "systemPrompt")); + + rmSync(dir, { recursive: true }); + }); + + it("deduplicates by lowercase name (first wins)", () => { + const dir = tmpDir(); + writeAgent(dir, "agent-a.md", makeAgentMd({ name: "Scout" }, "First.")); + writeAgent(dir, "agent-b.md", makeAgentMd({ name: "scout" }, "Second.")); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 1); + assert.equal(agents.get("scout")!.systemPrompt, "First."); + + rmSync(dir, { recursive: true }); + }); + + it("returns empty map for non-existent directory", () => { + const { agents } = scanAgentDirectory("/nonexistent/path"); + assert.equal(agents.size, 0); + }); + + it("skips non-.md files", () => { + const dir = tmpDir(); + writeAgent(dir, "agent.md", makeAgentMd({ name: "agent" }, "Body.")); + writeAgent(dir, "readme.txt", "not an agent"); + writeAgent(dir, "config.json", "{}"); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 1); + + rmSync(dir, { recursive: true }); + }); +}); + +// ── scanAgentDirectory — recursive + collisions ──────────────────────── + +describe("scanAgentDirectory (recursive)", () => { + it("finds agents in nested subdirectories", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "review_agents"), { recursive: true }); + mkdirSync(join(dir, "build_agents"), { recursive: true }); + + writeAgent(dir, "top-level.md", makeAgentMd({ name: "top-level" }, "Top.")); + writeAgent(join(dir, "review_agents"), "code-reviewer.md", makeAgentMd({ name: "code-reviewer" }, "Reviews code.")); + writeAgent(join(dir, "review_agents"), "security-reviewer.md", makeAgentMd({ name: "security-reviewer" }, "Reviews security.")); + writeAgent(join(dir, "build_agents"), "ts-builder.md", makeAgentMd({ name: "ts-builder", tools: "read,write,bash" }, "Builds TS.")); + + const { agents, collisions } = scanAgentDirectory(dir); + + assert.equal(agents.size, 4); + assert.ok(agents.has("top-level")); + assert.ok(agents.has("code-reviewer")); + assert.ok(agents.has("security-reviewer")); + assert.ok(agents.has("ts-builder")); + assert.equal(collisions.length, 0); + + rmSync(dir, { recursive: true }); + }); + + it("finds agents in deeply nested directories", () => { + const dir = tmpDir(); + const deep = join(dir, "a", "b", "c"); + mkdirSync(deep, { recursive: true }); + + writeAgent(deep, "deep-agent.md", makeAgentMd({ name: "deep-agent" }, "Deep.")); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 1); + assert.ok(agents.has("deep-agent")); + + rmSync(dir, { recursive: true }); + }); + + it("reports collisions for duplicate names across subdirs", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "team-a"), { recursive: true }); + mkdirSync(join(dir, "team-b"), { recursive: true }); + + writeAgent(dir, "scout.md", makeAgentMd({ name: "scout" }, "Original.")); + writeAgent(join(dir, "team-a"), "scout-copy.md", makeAgentMd({ name: "scout" }, "Duplicate A.")); + writeAgent(join(dir, "team-b"), "another-scout.md", makeAgentMd({ name: "Scout" }, "Duplicate B.")); + + const { agents, collisions } = scanAgentDirectory(dir); + + // First wins + assert.equal(agents.size, 1); + assert.equal(agents.get("scout")!.systemPrompt, "Original."); + + // Two collisions reported + assert.equal(collisions.length, 2); + assert.ok(collisions.every((c) => c.name.toLowerCase() === "scout")); + // Each collision has the paths + for (const c of collisions) { + assert.ok(c.duplicatePath.length > 0); + assert.ok(c.originalPath.length > 0); + assert.notEqual(c.duplicatePath, c.originalPath); + } + + rmSync(dir, { recursive: true }); + }); + + it("returns zero collisions when all names are unique", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "sub"), { recursive: true }); + + writeAgent(dir, "alpha.md", makeAgentMd({ name: "alpha" }, "A.")); + writeAgent(join(dir, "sub"), "beta.md", makeAgentMd({ name: "beta" }, "B.")); + + const { agents, collisions } = scanAgentDirectory(dir); + assert.equal(agents.size, 2); + assert.equal(collisions.length, 0); + + rmSync(dir, { recursive: true }); + }); + + it("skips .md files without valid frontmatter in subdirs", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "docs"), { recursive: true }); + + writeAgent(dir, "valid.md", makeAgentMd({ name: "valid" }, "Valid.")); + writeAgent(join(dir, "docs"), "README.md", "# Just a readme\nNo frontmatter here."); + writeAgent(join(dir, "docs"), "CHANGELOG.md", "# Changes\n- stuff"); + + const { agents } = scanAgentDirectory(dir); + assert.equal(agents.size, 1); + assert.ok(agents.has("valid")); + + rmSync(dir, { recursive: true }); + }); + + it("mixes flat and nested agents correctly", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "specialists"), { recursive: true }); + + writeAgent(dir, "scout.md", makeAgentMd({ name: "scout" }, "Flat scout.")); + writeAgent(dir, "builder.md", makeAgentMd({ name: "builder", tools: "read,write,edit,bash" }, "Flat builder.")); + writeAgent(join(dir, "specialists"), "reviewer.md", makeAgentMd({ name: "reviewer" }, "Nested reviewer.")); + writeAgent(join(dir, "specialists"), "documenter.md", makeAgentMd({ name: "documenter" }, "Nested documenter.")); + + const { agents, collisions } = scanAgentDirectory(dir); + assert.equal(agents.size, 4); + assert.equal(collisions.length, 0); + + // Verify all loaded + for (const name of ["scout", "builder", "reviewer", "documenter"]) { + assert.ok(agents.has(name), `expected agent "${name}" to be loaded`); + } + + rmSync(dir, { recursive: true }); + }); + + it("collision paths are absolute and point to real files", () => { + const dir = tmpDir(); + mkdirSync(join(dir, "sub"), { recursive: true }); + + const origPath = writeAgent(dir, "agent.md", makeAgentMd({ name: "dupe" }, "First.")); + const dupePath = writeAgent(join(dir, "sub"), "agent-copy.md", makeAgentMd({ name: "dupe" }, "Second.")); + + const { collisions } = scanAgentDirectory(dir); + assert.equal(collisions.length, 1); + assert.equal(collisions[0].name, "dupe"); + assert.ok(existsSync(collisions[0].originalPath), "original path should exist on disk"); + assert.ok(existsSync(collisions[0].duplicatePath), "duplicate path should exist on disk"); + + rmSync(dir, { recursive: true }); + }); +});