From 7e166bb8745a0f6583826ba7ef708ed38836a731 Mon Sep 17 00:00:00 2001 From: Andrei Gasparian Date: Mon, 23 Mar 2026 19:14:45 +0100 Subject: [PATCH 1/6] [DBA-290] Refactor review skills to use sub-agent behavior Extract review logic from SKILL.md into separate review-guidelines.md files. Skills now act as thin orchestrators that spawn sub-agents with clean context (no prior conversation state), ensuring unbiased reviews. - local-code-review and review-architecture accept a scope parameter - Both skills require explicit user approval before fixing findings - Evals updated to assert sub-agent delegation pattern - Smoke tests verify review-guidelines.md file existence - CLAUDE.md workflow merges review steps and notes parallel execution Co-Authored-By: Claude Opus 4.6 --- .claude/skills/local-code-review/SKILL.md | 146 ++++-------------- .../skills/local-code-review/evals/evals.json | 45 +++--- .../local-code-review/review-guidelines.md | 133 ++++++++++++++++ .claude/skills/review-architecture/SKILL.md | 86 ++++------- .../review-architecture/evals/evals.json | 37 +++-- .../review-architecture/review-guidelines.md | 69 +++++++++ CLAUDE.md | 16 +- scripts/smoke-test-skills.sh | 4 + 8 files changed, 326 insertions(+), 210 deletions(-) create mode 100644 .claude/skills/local-code-review/review-guidelines.md create mode 100644 .claude/skills/review-architecture/review-guidelines.md diff --git a/.claude/skills/local-code-review/SKILL.md b/.claude/skills/local-code-review/SKILL.md index 21bbf99c..203a21f7 100644 --- a/.claude/skills/local-code-review/SKILL.md +++ b/.claude/skills/local-code-review/SKILL.md @@ -6,135 +6,53 @@ compatibility: git must be installed and rg (ripgrep) is recommended. # Local Code Review -This skill is for review, not implementation. Default to read-only inspection. Do not edit files unless the user explicitly switches from review to fixing issues. +This skill is for review, not implementation. 
Do not edit files unless the
+user explicitly switches from review to fixing issues.
-If the user didn't request an explicit review, ask the user if a review is wanted before doing it.
+If the user didn't request an explicit review, ask the user if a review is
+wanted before doing it.
-## Review Goal
+## Parameter
-Find the highest-signal problems in the current local changes:
+This skill accepts an optional **scope** argument:
-- correctness bugs
-- regressions in user-visible behavior
-- broken or inconsistent integration behavior
-- dependency, packaging, or plugin-loading risks
-- missing or misaligned tests
-- docs or help-text drift for user-visible changes
-- formatting or linting issues
+- `staged` — review only staged changes (`git diff --cached`)
+- `branch` — review the branch diff against main (default)
+- `files:<path>` — review specific files or directories (e.g. `files:src/databao_cli/mcp/`)
-Keep summaries brief.
+If no scope is provided, default to `branch`.
 ## Steps
-### 1. Scope Discovery
-Start by identifying what changed:
+### 1. Resolve scope
-1. Run `git status --short`.
-2. Inspect both staged and unstaged changes with `git diff --stat` and `git diff --cached --stat`.
-3. Read the actual diffs for changed files before reading large surrounding files.
-4. If the user asks for a branch review rather than just working tree changes, diff from the merge base with the main branch.
+Determine the review scope from the parameter (or default to `branch`).
+Prepare a short description of what will be reviewed.
-Prefer `rg`, `git diff`, and targeted file reads over broad scans.
+### 2. Spawn review sub-agent
-#### Databao Review Priorities
+Launch a sub-agent using the **Agent tool** with `subagent_type: "general-purpose"`.
-Pay extra attention to these repository-specific areas: +Build the sub-agent prompt as follows: -- CLI, API, or UI behavior -- agent, executor, or model-provider wiring -- MCP, plugin, or integration boundaries -- configuration, build, or initialization flows -- datasource, schema, or context-building logic -- dependency, packaging, extras, and lockfile changes -- test coverage for changed behavior +1. Read the file `.claude/skills/local-code-review/review-guidelines.md` and + include its full content in the prompt. +2. Tell the sub-agent the resolved scope (e.g. "Review the branch diff against + main", "Review staged changes", "Review files under src/databao_cli/mcp/"). +3. Include this preamble in the prompt: -If a change touches one of those areas, review both the changed code and related tests. + > You are reviewing code changes for the Databao CLI project. + > You have NO prior context about why these changes were made — review + > purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect + > the code, diffs, and surrounding implementation. Do NOT edit any files. -Use these targeted checks when the diff touches the corresponding area: +### 3. Report findings -- CLI, API, or UI behavior: - check defaults, help text, argument handling, request or response contracts, and user-visible output -- agent, executor, or integration wiring: - check provider and model defaults, executor names, tool contracts, and consistency across callers -- plugin, datasource, or build flows: - check configuration prompts, validation paths, plugin-loading expectations, and schema or context drift -- packaging and dependencies: - check extras wiring, entrypoints, transitive dependency impact, lockfile drift, and docs/help drift -- tests: - verify that changed behavior has corresponding assertions or call out the gap explicitly. - Make sure the tests cover some real logical path and are not only trivial assertions. +Return the sub-agent's output to the user as-is. 
Do not summarize or filter +the findings. If the sub-agent reports no findings, relay that clearly. -### 2. Review Workflow +### 4. Wait for user approval before fixing -1. Establish the review scope from git. -2. Read the diff carefully. -3. Read the surrounding implementation for changed logic. -4. Check related tests, identify where tests should have changed but did not. -5. Evaluate if new tests should be added to cover added functionality. - -Good validation options: - -- the full test suite when it is practical for the repo and review scope -- targeted test runs for modified areas when they give a faster or more relevant signal -- non-mutating lint checks -- non-mutating formatter checks -- type checks -- lockfile or dependency metadata validation when package definitions changed - -Before running validation, inspect the repo's local tooling configuration and use commands that actually exist there. - -Examples, when configured in the current repo: - -- `uv run pytest ` -- `uv run ruff check ` -- `uv run ruff format --check ` -- `uv run mypy ` -- `uv lock --check` - -Default to running the full test suite when it is practical and likely to add useful confidence. Use targeted tests instead when the diff is narrow and a focused run is the better fit. - -Avoid mutating validation in review mode: - -- do not run formatter or linter commands with `--fix` -- do not run formatting commands without a check-only mode when one exists -- do not run wrapper commands like `make check` or `pre-commit` unless you have verified they are non-mutating - -### 3. Findings - -A good finding should: - -- identify a concrete bug, regression, or meaningful risk -- explain why it matters in real behavior -- point to the exact file and line -- mention the missing validation if that is part of the risk - -Avoid weak findings like stylistic opinions, speculative architecture preferences, or advice not grounded in the diff. 
- -#### Output Format - -Return findings first, ordered by severity. - -For each finding: - -- short severity label -- concise title -- why it is a problem -- file and line reference -- brief remediation direction when obvious - -Then include: - -- open questions or assumptions -- a short note on testing gaps or validation performed - -If there are no findings, say that clearly and still mention any residual risk or untested surface. - -Format the results using Markdown. - -## Guardrails - -- Do not rewrite the code during review mode. -- Do not bury findings under a long summary. -- Do not claim tests passed unless you ran them. -- Do not over-index on style when behavior risks exist. -- Prefer explicit evidence from the diff and nearby code. +Do NOT start resolving any findings automatically. Present the findings and +**ask the user** which ones (if any) they want you to fix. Only proceed with +code changes after explicit user approval. diff --git a/.claude/skills/local-code-review/evals/evals.json b/.claude/skills/local-code-review/evals/evals.json index 93e9b75e..00a687d3 100644 --- a/.claude/skills/local-code-review/evals/evals.json +++ b/.claude/skills/local-code-review/evals/evals.json @@ -3,47 +3,48 @@ "evals": [ { "id": 1, - "prompt": "Review the changes on my current branch before I open a PR. 
Keep this in review mode only and focus on correctness, regressions, and missing tests.", - "expected_output": "Agent determines branch review scope from git, inspects the actual diff, and returns a findings-first review or an explicit no-findings verdict with file references and a testing note.", + "prompt": "Review the changes on my current branch before I open a PR.", + "expected_output": "Agent resolves scope to branch, spawns a sub-agent with review-guidelines.md content and clean context, and returns findings from the sub-agent.", "assertions": [ - "Agent determines scope from the current branch diff against the main branch or merge base", - "Agent inspects the actual git diff before broad repository scanning", - "Output presents findings first or explicitly states that no findings were found", - "Output references files and line numbers for any reported findings", - "Output includes a testing gap or validation-performed note" + "Agent reads .claude/skills/local-code-review/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt includes a preamble about having no prior context", + "Sub-agent prompt specifies the review scope (branch diff against main)", + "Agent returns the sub-agent findings to the user without filtering" ] }, { "id": 2, - "prompt": "Please review my local changes before I commit. Do not fix anything. 
Focus on user-visible behavior, integration risks, and whether tests are missing.", - "expected_output": "Agent performs working-tree scope discovery from git, stays read-only, and returns severity-ordered findings tied to the local diff.", + "prompt": "/local-code-review staged", + "expected_output": "Agent resolves scope to staged changes and spawns a sub-agent that reviews only staged changes.", "assertions": [ - "Agent runs `git status --short` to establish review scope", - "Agent inspects both `git diff --stat` and `git diff --cached --stat`", - "Agent reads the diff for changed files before reading large surrounding files", - "Output orders findings by severity and ties them to concrete behavior risks", - "Output includes open questions or assumptions" + "Agent resolves scope parameter to 'staged'", + "Agent reads .claude/skills/local-code-review/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt specifies reviewing staged changes (not branch diff)", + "Agent returns the sub-agent findings to the user" ] }, { "id": 3, "prompt": "I changed a few files in this repo. 
What do you think?", - "expected_output": "Agent asks whether the user wants an explicit code review before beginning repository inspection.", + "expected_output": "Agent asks whether the user wants an explicit code review before spawning a sub-agent.", "assertions": [ "Agent asks a clarifying question about whether a review is wanted", - "Agent does NOT start with git inspection or a full review before that clarification", + "Agent does NOT spawn a sub-agent or start a review before that clarification", "Agent keeps the response concise and framed as a clarification request" ] }, { "id": 4, - "prompt": "Review my current branch and run any safe validation that makes sense, but do not mutate files.", - "expected_output": "Agent chooses only non-mutating validation that exists in this repo, then reports findings and validation results.", + "prompt": "/local-code-review files:src/databao_cli/mcp/", + "expected_output": "Agent resolves scope to the specified file path and spawns a sub-agent to review those files.", "assertions": [ - "Agent inspects the repo's available tooling before choosing validation commands", - "If validation is run, agent uses non-mutating commands such as targeted `uv run pytest` or check-only linters", - "Agent does NOT run mutating commands such as formatter fixes or broad wrapper commands without confirming they are non-mutating", - "Output reports what validation was run or why validation was skipped" + "Agent resolves scope parameter to 'files:src/databao_cli/mcp/'", + "Agent reads .claude/skills/local-code-review/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt specifies reviewing files under src/databao_cli/mcp/", + "Agent returns the sub-agent findings to the user" ] } ] diff --git a/.claude/skills/local-code-review/review-guidelines.md b/.claude/skills/local-code-review/review-guidelines.md new file mode 100644 index 00000000..0f9b122f --- /dev/null +++ 
b/.claude/skills/local-code-review/review-guidelines.md @@ -0,0 +1,133 @@ +# Code Review Guidelines + +## Review Goal + +Find the highest-signal problems in the changes under review: + +- correctness bugs +- regressions in user-visible behavior +- broken or inconsistent integration behavior +- dependency, packaging, or plugin-loading risks +- missing or misaligned tests +- docs or help-text drift for user-visible changes +- formatting or linting issues + +Keep summaries brief. + +## Steps + +### 1. Scope Discovery + +Start by identifying what changed: + +1. Run `git status --short`. +2. Inspect the relevant changes based on the scope you were given: + - **branch**: diff from the merge base with the main branch + - **staged**: `git diff --cached --stat` and `git diff --cached` + - **files**: read the specified files and their recent git history +3. Read the actual diffs for changed files before reading large surrounding files. + +Prefer `rg`, `git diff`, and targeted file reads over broad scans. + +### Databao Review Priorities + +Pay extra attention to these repository-specific areas: + +- CLI, API, or UI behavior +- agent, executor, or model-provider wiring +- MCP, plugin, or integration boundaries +- configuration, build, or initialization flows +- datasource, schema, or context-building logic +- dependency, packaging, extras, and lockfile changes +- test coverage for changed behavior + +If a change touches one of those areas, review both the changed code and related tests. 
+
+Use these targeted checks when the diff touches the corresponding area:
+
+- CLI, API, or UI behavior:
+  check defaults, help text, argument handling, request or response contracts, and user-visible output
+- agent, executor, or integration wiring:
+  check provider and model defaults, executor names, tool contracts, and consistency across callers
+- plugin, datasource, or build flows:
+  check configuration prompts, validation paths, plugin-loading expectations, and schema or context drift
+- packaging and dependencies:
+  check extras wiring, entrypoints, transitive dependency impact, lockfile drift, and docs/help drift
+- tests:
+  verify that changed behavior has corresponding assertions or call out the gap explicitly.
+  Make sure the tests cover some real logical path and are not only trivial assertions.
+
+### 2. Review Workflow
+
+1. Establish the review scope from git.
+2. Read the diff carefully.
+3. Read the surrounding implementation for changed logic.
+4. Check related tests, identify where tests should have changed but did not.
+5. Evaluate if new tests should be added to cover added functionality.
+
+Good validation options:
+
+- the full test suite when it is practical for the repo and review scope
+- targeted test runs for modified areas when they give a faster or more relevant signal
+- non-mutating lint checks
+- non-mutating formatter checks
+- type checks
+- lockfile or dependency metadata validation when package definitions changed
+
+Before running validation, inspect the repo's local tooling configuration and use commands that actually exist there.
+
+Examples, when configured in the current repo:
+
+- `uv run pytest <paths>`
+- `uv run ruff check <paths>`
+- `uv run ruff format --check <paths>`
+- `uv run mypy <paths>`
+- `uv lock --check`
+
+Default to running the full test suite when it is practical and likely to add useful confidence. Use targeted tests instead when the diff is narrow and a focused run is the better fit.
+ +Avoid mutating validation in review mode: + +- do not run formatter or linter commands with `--fix` +- do not run formatting commands without a check-only mode when one exists +- do not run wrapper commands like `make check` or `pre-commit` unless you have verified they are non-mutating + +### 3. Findings + +A good finding should: + +- identify a concrete bug, regression, or meaningful risk +- explain why it matters in real behavior +- point to the exact file and line +- mention the missing validation if that is part of the risk + +Avoid weak findings like stylistic opinions, speculative architecture preferences, or advice not grounded in the diff. + +#### Output Format + +Return findings first, ordered by severity. + +For each finding: + +- short severity label +- concise title +- why it is a problem +- file and line reference +- brief remediation direction when obvious + +Then include: + +- open questions or assumptions +- a short note on testing gaps or validation performed + +If there are no findings, say that clearly and still mention any residual risk or untested surface. + +Format the results using Markdown. + +## Guardrails + +- Do not rewrite the code during review mode. +- Do not bury findings under a long summary. +- Do not claim tests passed unless you ran them. +- Do not over-index on style when behavior risks exist. +- Prefer explicit evidence from the diff and nearby code. diff --git a/.claude/skills/review-architecture/SKILL.md b/.claude/skills/review-architecture/SKILL.md index b9320b9e..9ab304ee 100755 --- a/.claude/skills/review-architecture/SKILL.md +++ b/.claude/skills/review-architecture/SKILL.md @@ -5,73 +5,51 @@ description: Review architecture quality, maintainability, and developer experie # Review Architecture -Review architecture quality, maintainability, and developer experience (Dev UX) -before or after significant changes. 
+This skill reviews architecture quality, maintainability, and developer
+experience (Dev UX) before or after significant changes.
-## Primary sources of truth
+## Parameter
-Review in this order:
+This skill accepts an optional **scope** argument:
-1. `docs/architecture.md`
-2. `docs/python-coding-guidelines.md`
-3. `docs/testing-strategy.md`
-4. `CLAUDE.md`
-5. `README.md` (CLI usage and user-facing workflows)
+- `branch` — review architecture of code changed on the current branch (default)
+- `module:<path>` — review a specific module (e.g. `module:src/databao_cli/mcp/`)
+- `full` — review the full project architecture
-Then validate actual implementation under:
+If no scope is provided, default to `branch`.
-- `src/databao_cli/commands/`
-- `src/databao_cli/ui/`
-- `src/databao_cli/mcp/`
-- `src/databao_cli/project/`
+## Steps
-## Review goals
+### 1. Resolve scope
-- Confirm boundaries are clear and responsibilities are separated.
-- Confirm extension paths are low-friction (new command, MCP tool, UI page).
-- Confirm docs reflect real behavior and command paths.
-- Identify highest-impact improvements with minimal disruption.
+Determine the review scope from the parameter (or default to `branch`).
+Prepare a short description of what will be reviewed.
-## Architecture checklist
+### 2. Spawn review sub-agent
-- Are modules aligned with single responsibility?
-- Are CLI concerns separated from business logic?
-- Is the Click command structure clean and discoverable?
-- Are MCP tools properly isolated in `mcp/tools/`?
-- Are UI components reusable and page-specific logic separated?
-- Are errors actionable and surfaced at the right layer?
+Launch a sub-agent using the **Agent tool** with `subagent_type: "general-purpose"`.
-## Dev UX checklist
+Build the sub-agent prompt as follows:
-- Can a new contributor find the right entrypoints quickly?
-- Are `uv` / `make` commands obvious and consistent across docs/code?
-- Is local verification clear (`make check`, `make test`)? -- Are defaults safe when environment/dependencies are missing? -- Do naming and file layout reduce cognitive load? -- Are common workflows discoverable in README + docs? +1. Read the file `.claude/skills/review-architecture/review-guidelines.md` and + include its full content in the prompt. +2. Tell the sub-agent the resolved scope (e.g. "Review the architecture of the + current branch changes", "Review the module src/databao_cli/mcp/", + "Review the full project architecture"). +3. Include this preamble in the prompt: -## Output format + > You are reviewing the architecture of the Databao CLI project. + > You have NO prior context about why these changes were made — review + > purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect + > the code and project structure. Do NOT edit any files. -Provide a concise report with these sections: +### 3. Report findings -1. **Current State**: 3-6 bullets on what is working well. -2. **Risks / Gaps**: prioritized issues (High/Med/Low) with evidence. -3. **Recommendations**: concrete changes, ordered by impact vs effort. -4. **Doc Sync Needed**: exact files that should be updated. -5. **Validation Plan**: minimal commands to verify proposed changes. +Return the sub-agent's output to the user as-is. Do not summarize or filter +the findings. -## Recommendation style +### 4. Wait for user approval before fixing -- Prefer small, composable changes over sweeping rewrites. -- Tie every recommendation to a specific pain point. -- Include expected developer benefit (speed, clarity, reliability). -- Flag trade-offs explicitly. -- Separate immediate actions from longer-term improvements. - -## Guardrails - -- Do not invent architecture that conflicts with current code unless proposing it - explicitly as a future direction. -- Do not request broad rewrites without clear ROI. -- Keep proposals compatible with existing `uv` workflow and Click-based CLI. 
-- Keep feedback actionable for the next pull request, not just aspirational. +Do NOT start resolving any findings automatically. Present the findings and +**ask the user** which ones (if any) they want you to fix. Only proceed with +code changes after explicit user approval. diff --git a/.claude/skills/review-architecture/evals/evals.json b/.claude/skills/review-architecture/evals/evals.json index 1071ddc2..10311661 100644 --- a/.claude/skills/review-architecture/evals/evals.json +++ b/.claude/skills/review-architecture/evals/evals.json @@ -4,24 +4,37 @@ { "id": 1, "prompt": "Review the architecture of the MCP server code.", - "expected_output": "A structured report covering the MCP module with Current State, Risks/Gaps, Recommendations, Doc Sync, and Validation Plan.", + "expected_output": "Agent resolves scope to the MCP module, spawns a sub-agent with review-guidelines.md content and clean context, and returns a structured architecture report.", "assertions": [ - "Output contains a 'Current State' section with 3-6 bullets", - "Output contains a 'Risks / Gaps' section with severity levels (High/Med/Low)", - "Output contains a 'Recommendations' section with concrete changes", - "Output references docs/architecture.md as a source of truth", - "Output covers files under src/databao_cli/mcp/" + "Agent reads .claude/skills/review-architecture/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt includes a preamble about having no prior context", + "Sub-agent prompt specifies reviewing the MCP module architecture", + "Agent returns the sub-agent findings which include structured sections (Current State, Risks/Gaps, Recommendations)" ] }, { "id": 2, - "prompt": "I'm about to add a new CLI command for managing API keys. 
Review the architecture to see if there are concerns.", - "expected_output": "A focused review of the command structure and extension patterns.", + "prompt": "/review-architecture module:src/databao_cli/commands/", + "expected_output": "Agent resolves scope to the commands module and spawns a sub-agent to review its architecture.", "assertions": [ - "Output examines src/databao_cli/commands/ structure", - "Output references the Click command registration pattern in __main__.py", - "Output provides actionable recommendations for the new command", - "Output separates immediate actions from longer-term improvements" + "Agent resolves scope parameter to 'module:src/databao_cli/commands/'", + "Agent reads .claude/skills/review-architecture/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt specifies reviewing the commands module architecture", + "Agent returns the sub-agent findings to the user" + ] + }, + { + "id": 3, + "prompt": "/review-architecture full", + "expected_output": "Agent resolves scope to full project and spawns a sub-agent for a comprehensive architecture review.", + "assertions": [ + "Agent resolves scope parameter to 'full'", + "Agent reads .claude/skills/review-architecture/review-guidelines.md", + "Agent spawns a sub-agent using the Agent tool", + "Sub-agent prompt specifies reviewing the full project architecture", + "Agent returns the sub-agent findings to the user" ] } ] diff --git a/.claude/skills/review-architecture/review-guidelines.md b/.claude/skills/review-architecture/review-guidelines.md new file mode 100644 index 00000000..0f38e0e8 --- /dev/null +++ b/.claude/skills/review-architecture/review-guidelines.md @@ -0,0 +1,69 @@ +# Architecture Review Guidelines + +## Primary sources of truth + +Review in this order: + +1. `docs/architecture.md` +2. `docs/python-coding-guidelines.md` +3. `docs/testing-strategy.md` +4. `CLAUDE.md` +5. 
`README.md` (CLI usage and user-facing workflows) + +Then validate actual implementation under: + +- `src/databao_cli/commands/` +- `src/databao_cli/ui/` +- `src/databao_cli/mcp/` +- `src/databao_cli/project/` + +## Review goals + +- Confirm boundaries are clear and responsibilities are separated. +- Confirm extension paths are low-friction (new command, MCP tool, UI page). +- Confirm docs reflect real behavior and command paths. +- Identify highest-impact improvements with minimal disruption. + +## Architecture checklist + +- Are modules aligned with single responsibility? +- Are CLI concerns separated from business logic? +- Is the Click command structure clean and discoverable? +- Are MCP tools properly isolated in `mcp/tools/`? +- Are UI components reusable and page-specific logic separated? +- Are errors actionable and surfaced at the right layer? + +## Dev UX checklist + +- Can a new contributor find the right entrypoints quickly? +- Are `uv` / `make` commands obvious and consistent across docs/code? +- Is local verification clear (`make check`, `make test`)? +- Are defaults safe when environment/dependencies are missing? +- Do naming and file layout reduce cognitive load? +- Are common workflows discoverable in README + docs? + +## Output format + +Provide a concise report with these sections: + +1. **Current State**: 3-6 bullets on what is working well. +2. **Risks / Gaps**: prioritized issues (High/Med/Low) with evidence. +3. **Recommendations**: concrete changes, ordered by impact vs effort. +4. **Doc Sync Needed**: exact files that should be updated. +5. **Validation Plan**: minimal commands to verify proposed changes. + +## Recommendation style + +- Prefer small, composable changes over sweeping rewrites. +- Tie every recommendation to a specific pain point. +- Include expected developer benefit (speed, clarity, reliability). +- Flag trade-offs explicitly. +- Separate immediate actions from longer-term improvements. 
+ +## Guardrails + +- Do not invent architecture that conflicts with current code unless proposing it + explicitly as a future direction. +- Do not request broad rewrites without clear ROI. +- Keep proposals compatible with existing `uv` workflow and Click-based CLI. +- Keep feedback actionable for the next pull request, not just aspirational. diff --git a/CLAUDE.md b/CLAUDE.md index 669784f3..1562216a 100755 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -118,13 +118,13 @@ style issues; do not manually fix formatting. run `make test` to verify they pass. 3. **Test & lint** — run `make check` then `make test-cov-check`. Fix any failures before proceeding. -4. **Review the code** — review the code locally. Fix any high-severity - findings before proceeding. -5. **Architecture review** — review architecture quality of the changed - code. Fix any high-severity issues before proceeding. -6. **Branch** — use the `create-branch` skill. -7. **Commit & PR** — use the `create-pr` skill (stages, commits, pauses +4. **Review** — run `local-code-review` and `review-architecture` skills. + Both spawn sub-agents with clean context (no prior conversation state) + and can run **in parallel**. Fix any high-severity findings before + proceeding. +5. **Branch** — use the `create-branch` skill. +6. **Commit & PR** — use the `create-pr` skill (stages, commits, pauses for confirmation, pushes, and opens the PR). -8. **Update YouTrack** — move the ticket to **Review** state and add +7. **Update YouTrack** — move the ticket to **Review** state and add a comment with the PR URL. -9. Never commit directly to `main`. +8. Never commit directly to `main`. 
diff --git a/scripts/smoke-test-skills.sh b/scripts/smoke-test-skills.sh
index 0f2acd99..ecc78d5b 100755
--- a/scripts/smoke-test-skills.sh
+++ b/scripts/smoke-test-skills.sh
@@ -35,6 +35,10 @@ run_test "setup-environment: make setup" "make setup"
 # check-coverage: make test-cov-check succeeds
 run_test "check-coverage: make test-cov-check" "make test-cov-check"
 
+# review skills: review-guidelines.md files exist and are non-empty
+run_test "local-code-review: review-guidelines.md exists" "test -s .claude/skills/local-code-review/review-guidelines.md"
+run_test "review-architecture: review-guidelines.md exists" "test -s .claude/skills/review-architecture/review-guidelines.md"
+
 # review-architecture: referenced doc files exist and are non-empty
 for doc in docs/architecture.md docs/python-coding-guidelines.md docs/testing-strategy.md README.md; do
   run_test "review-architecture: $doc exists and non-empty" "test -s \"$doc\""

From bb6a88efcdabf64ac055dc8c8c25ace02e3c4471 Mon Sep 17 00:00:00 2001
From: Andrei Gasparian
Date: Mon, 23 Mar 2026 19:17:28 +0100
Subject: [PATCH 2/6] [DBA-290] Fix flaky duckdb test by using explicit
 connection

Both test_build and test_index called duckdb.execute() on the shared
default in-memory connection instead of the file-scoped connection,
causing "table t1 already exists" when both tests ran in the same suite.
Co-Authored-By: Claude Opus 4.6
---
 tests/test_build.py | 5 +++--
 tests/test_index.py | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/tests/test_build.py b/tests/test_build.py
index 3067e7b5..a624950c 100644
--- a/tests/test_build.py
+++ b/tests/test_build.py
@@ -8,8 +8,9 @@
 def test_databao_build_duckdb_datasource(project_layout: ProjectLayout, tmp_path_factory: TempPathFactory) -> None:
     test_db = tmp_path_factory.mktemp("duckdb") / "test_db.duckdb"
-    duckdb.connect(str(test_db))
-    duckdb.execute("CREATE TABLE t1 AS SELECT 1 AS i, 2 AS j;")
+    conn = duckdb.connect(str(test_db))
+    conn.execute("CREATE TABLE t1 AS SELECT 1 AS i, 2 AS j;")
+    conn.close()
 
     dce_domain_manager = DatabaoContextDomainManager(domain_dir=project_layout.root_domain_dir)
     dce_domain_manager.create_datasource_config(

diff --git a/tests/test_index.py b/tests/test_index.py
index 2d2a3310..8399f0d5 100644
--- a/tests/test_index.py
+++ b/tests/test_index.py
@@ -8,8 +8,9 @@
 def test_databao_index_duckdb_datasource(project_layout: ProjectLayout, tmp_path_factory: TempPathFactory) -> None:
     test_db = tmp_path_factory.mktemp("duckdb") / "test_db.duckdb"
-    duckdb.connect(str(test_db))
-    duckdb.execute("CREATE TABLE t1 AS SELECT 1 AS i, 2 AS j;")
+    conn = duckdb.connect(str(test_db))
+    conn.execute("CREATE TABLE t1 AS SELECT 1 AS i, 2 AS j;")
+    conn.close()
 
     dce_domain_manager = DatabaoContextDomainManager(domain_dir=project_layout.root_domain_dir)
     dce_domain_manager.create_datasource_config(

From 408a6d1041e037e679e5e75cc8786be8d01cf2b6 Mon Sep 17 00:00:00 2001
From: Andrei Gasparian
Date: Mon, 23 Mar 2026 19:25:10 +0100
Subject: [PATCH 3/6] [DBA-290] Use native context:fork for review skill
 isolation

Replace manual sub-agent orchestration with Claude Code's native
context:fork + agent frontmatter fields. Skills now run directly in a
forked sub-agent context, eliminating the need for separate
review-guidelines.md files and orchestrator logic.
- Add argument-hint, context:fork, agent:general-purpose to frontmatter - Use $ARGUMENTS for scope parameter substitution - Merge review guidelines back into SKILL.md (it IS the sub-agent now) - Remove review-guidelines.md files (no longer needed) - Move "ask before fixing" guardrail to CLAUDE.md workflow - Update evals to test review quality directly Co-Authored-By: Claude Opus 4.6 --- .claude/skills/local-code-review/SKILL.md | 160 ++++++++++++++---- .../skills/local-code-review/evals/evals.json | 50 +++--- .../local-code-review/review-guidelines.md | 133 --------------- .claude/skills/review-architecture/SKILL.md | 97 +++++++---- .../review-architecture/evals/evals.json | 36 ++-- .../review-architecture/review-guidelines.md | 69 -------- CLAUDE.md | 6 +- scripts/smoke-test-skills.sh | 4 - 8 files changed, 243 insertions(+), 312 deletions(-) delete mode 100644 .claude/skills/local-code-review/review-guidelines.md delete mode 100644 .claude/skills/review-architecture/review-guidelines.md diff --git a/.claude/skills/local-code-review/SKILL.md b/.claude/skills/local-code-review/SKILL.md index 203a21f7..3c207b48 100644 --- a/.claude/skills/local-code-review/SKILL.md +++ b/.claude/skills/local-code-review/SKILL.md @@ -1,58 +1,158 @@ --- name: local-code-review description: Review local code changes in Databao repositories before a commit or PR. Use when the user wants a review of staged or unstaged diffs, local branches, or pre-merge changes. Focus on correctness, regressions, missing tests, API/CLI behavior changes, executor or tooling changes, dependency or plugin-loading risks, and user-visible behavior changes. -compatibility: git must be installed and rg (ripgrep) is recommended. +argument-hint: "[scope: staged | branch | files:]" +context: fork +agent: general-purpose --- # Local Code Review -This skill is for review, not implementation. Do not edit files unless the -user explicitly switches from review to fixing issues. 
+You are reviewing code changes for the Databao CLI project. +You have NO prior context about why these changes were made — review +purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect +the code, diffs, and surrounding implementation. Do NOT edit any files. -If the user didn't request an explicit review, ask the user if a review is -wanted before doing it. +## Scope -## Parameter +Review scope: $ARGUMENTS -This skill accepts an optional **scope** argument: +If no scope was provided, default to `branch`. + +Accepted scopes: - `staged` — review only staged changes (`git diff --cached`) - `branch` — review the branch diff against main (default) - `files:` — review specific files or directories (e.g. `files:src/databao_cli/mcp/`) -If no scope is provided, default to `branch`. +## Review Goal + +Find the highest-signal problems in the changes under review: + +- correctness bugs +- regressions in user-visible behavior +- broken or inconsistent integration behavior +- dependency, packaging, or plugin-loading risks +- missing or misaligned tests +- docs or help-text drift for user-visible changes +- formatting or linting issues + +Keep summaries brief. ## Steps -### 1. Resolve scope +### 1. Scope Discovery + +Start by identifying what changed: + +1. Run `git status --short`. +2. Inspect the relevant changes based on the scope you were given: + - **branch**: diff from the merge base with the main branch + - **staged**: `git diff --cached --stat` and `git diff --cached` + - **files**: read the specified files and their recent git history +3. Read the actual diffs for changed files before reading large surrounding files. + +Prefer `rg`, `git diff`, and targeted file reads over broad scans. 
+
+#### Databao Review Priorities
+
+Pay extra attention to these repository-specific areas:
+
+- CLI, API, or UI behavior
+- agent, executor, or model-provider wiring
+- MCP, plugin, or integration boundaries
+- configuration, build, or initialization flows
+- datasource, schema, or context-building logic
+- dependency, packaging, extras, and lockfile changes
+- test coverage for changed behavior
+
+If a change touches one of those areas, review both the changed code and related tests.
+
+Use these targeted checks when the diff touches the corresponding area:
+
+- CLI, API, or UI behavior:
+  check defaults, help text, argument handling, request or response contracts, and user-visible output
+- agent, executor, or integration wiring:
+  check provider and model defaults, executor names, tool contracts, and consistency across callers
+- plugin, datasource, or build flows:
+  check configuration prompts, validation paths, plugin-loading expectations, and schema or context drift
+- packaging and dependencies:
+  check extras wiring, entrypoints, transitive dependency impact, lockfile drift, and docs/help drift
+- tests:
+  verify that changed behavior has corresponding assertions or call out the gap explicitly.
+  Make sure the tests cover some real logical path and are not only trivial assertions.
+
+### 2. Review Workflow
+
+1. Establish the review scope from git.
+2. Read the diff carefully.
+3. Read the surrounding implementation for changed logic.
+4. Check related tests and identify where tests should have changed but did not.
+5. Evaluate whether new tests should be added to cover added functionality.
+ +Good validation options: + +- the full test suite when it is practical for the repo and review scope +- targeted test runs for modified areas when they give a faster or more relevant signal +- non-mutating lint checks +- non-mutating formatter checks +- type checks +- lockfile or dependency metadata validation when package definitions changed + +Before running validation, inspect the repo's local tooling configuration and use commands that actually exist there. + +Examples, when configured in the current repo: + +- `uv run pytest ` +- `uv run ruff check ` +- `uv run ruff format --check ` +- `uv run mypy ` +- `uv lock --check` + +Default to running the full test suite when it is practical and likely to add useful confidence. Use targeted tests instead when the diff is narrow and a focused run is the better fit. + +Avoid mutating validation in review mode: + +- do not run formatter or linter commands with `--fix` +- do not run formatting commands without a check-only mode when one exists +- do not run wrapper commands like `make check` or `pre-commit` unless you have verified they are non-mutating + +### 3. Findings + +A good finding should: + +- identify a concrete bug, regression, or meaningful risk +- explain why it matters in real behavior +- point to the exact file and line +- mention the missing validation if that is part of the risk + +Avoid weak findings like stylistic opinions, speculative architecture preferences, or advice not grounded in the diff. -Determine the review scope from the parameter (or default to `branch`). -Prepare a short description of what will be reviewed. +#### Output Format -### 2. Spawn review sub-agent +Return findings first, ordered by severity. -Launch a sub-agent using the **Agent tool** with `subagent_type: "general-purpose"`. +For each finding: -Build the sub-agent prompt as follows: +- short severity label +- concise title +- why it is a problem +- file and line reference +- brief remediation direction when obvious -1. 
Read the file `.claude/skills/local-code-review/review-guidelines.md` and - include its full content in the prompt. -2. Tell the sub-agent the resolved scope (e.g. "Review the branch diff against - main", "Review staged changes", "Review files under src/databao_cli/mcp/"). -3. Include this preamble in the prompt: +Then include: - > You are reviewing code changes for the Databao CLI project. - > You have NO prior context about why these changes were made — review - > purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect - > the code, diffs, and surrounding implementation. Do NOT edit any files. +- open questions or assumptions +- a short note on testing gaps or validation performed -### 3. Report findings +If there are no findings, say that clearly and still mention any residual risk or untested surface. -Return the sub-agent's output to the user as-is. Do not summarize or filter -the findings. If the sub-agent reports no findings, relay that clearly. +Format the results using Markdown. -### 4. Wait for user approval before fixing +## Guardrails -Do NOT start resolving any findings automatically. Present the findings and -**ask the user** which ones (if any) they want you to fix. Only proceed with -code changes after explicit user approval. +- Do not rewrite the code during review mode. +- Do not bury findings under a long summary. +- Do not claim tests passed unless you ran them. +- Do not over-index on style when behavior risks exist. +- Prefer explicit evidence from the diff and nearby code. 
diff --git a/.claude/skills/local-code-review/evals/evals.json b/.claude/skills/local-code-review/evals/evals.json index 00a687d3..232c4057 100644 --- a/.claude/skills/local-code-review/evals/evals.json +++ b/.claude/skills/local-code-review/evals/evals.json @@ -3,48 +3,48 @@ "evals": [ { "id": 1, - "prompt": "Review the changes on my current branch before I open a PR.", - "expected_output": "Agent resolves scope to branch, spawns a sub-agent with review-guidelines.md content and clean context, and returns findings from the sub-agent.", + "prompt": "/local-code-review branch", + "expected_output": "Agent determines branch review scope from git, inspects the actual diff, and returns a findings-first review or an explicit no-findings verdict with file references and a testing note.", "assertions": [ - "Agent reads .claude/skills/local-code-review/review-guidelines.md", - "Agent spawns a sub-agent using the Agent tool", - "Sub-agent prompt includes a preamble about having no prior context", - "Sub-agent prompt specifies the review scope (branch diff against main)", - "Agent returns the sub-agent findings to the user without filtering" + "Agent determines scope from the current branch diff against the main branch or merge base", + "Agent inspects the actual git diff before broad repository scanning", + "Output presents findings first or explicitly states that no findings were found", + "Output references files and line numbers for any reported findings", + "Output includes a testing gap or validation-performed note" ] }, { "id": 2, "prompt": "/local-code-review staged", - "expected_output": "Agent resolves scope to staged changes and spawns a sub-agent that reviews only staged changes.", + "expected_output": "Agent performs staged-changes scope discovery from git, stays read-only, and returns severity-ordered findings tied to the staged diff.", "assertions": [ - "Agent resolves scope parameter to 'staged'", - "Agent reads 
.claude/skills/local-code-review/review-guidelines.md", - "Agent spawns a sub-agent using the Agent tool", - "Sub-agent prompt specifies reviewing staged changes (not branch diff)", - "Agent returns the sub-agent findings to the user" + "Agent inspects staged changes via git diff --cached", + "Agent reads the diff for changed files before reading large surrounding files", + "Output orders findings by severity and ties them to concrete behavior risks", + "Output includes open questions or assumptions", + "Agent does NOT edit any files" ] }, { "id": 3, - "prompt": "I changed a few files in this repo. What do you think?", - "expected_output": "Agent asks whether the user wants an explicit code review before spawning a sub-agent.", + "prompt": "/local-code-review files:src/databao_cli/mcp/", + "expected_output": "Agent reviews the specified files/directory, reading them and their recent git history.", "assertions": [ - "Agent asks a clarifying question about whether a review is wanted", - "Agent does NOT spawn a sub-agent or start a review before that clarification", - "Agent keeps the response concise and framed as a clarification request" + "Agent focuses review on files under src/databao_cli/mcp/", + "Agent reads the actual file contents or diffs for the specified path", + "Output presents findings with file and line references", + "Agent does NOT edit any files" ] }, { "id": 4, - "prompt": "/local-code-review files:src/databao_cli/mcp/", - "expected_output": "Agent resolves scope to the specified file path and spawns a sub-agent to review those files.", + "prompt": "/local-code-review", + "expected_output": "Agent defaults to branch scope when no argument is provided and runs a branch review.", "assertions": [ - "Agent resolves scope parameter to 'files:src/databao_cli/mcp/'", - "Agent reads .claude/skills/local-code-review/review-guidelines.md", - "Agent spawns a sub-agent using the Agent tool", - "Sub-agent prompt specifies reviewing files under 
src/databao_cli/mcp/", - "Agent returns the sub-agent findings to the user" + "Agent defaults to reviewing the branch diff against main", + "Agent inspects the repo's available tooling before choosing validation commands", + "If validation is run, agent uses non-mutating commands", + "Agent does NOT run mutating commands such as formatter fixes" ] } ] diff --git a/.claude/skills/local-code-review/review-guidelines.md b/.claude/skills/local-code-review/review-guidelines.md deleted file mode 100644 index 0f9b122f..00000000 --- a/.claude/skills/local-code-review/review-guidelines.md +++ /dev/null @@ -1,133 +0,0 @@ -# Code Review Guidelines - -## Review Goal - -Find the highest-signal problems in the changes under review: - -- correctness bugs -- regressions in user-visible behavior -- broken or inconsistent integration behavior -- dependency, packaging, or plugin-loading risks -- missing or misaligned tests -- docs or help-text drift for user-visible changes -- formatting or linting issues - -Keep summaries brief. - -## Steps - -### 1. Scope Discovery - -Start by identifying what changed: - -1. Run `git status --short`. -2. Inspect the relevant changes based on the scope you were given: - - **branch**: diff from the merge base with the main branch - - **staged**: `git diff --cached --stat` and `git diff --cached` - - **files**: read the specified files and their recent git history -3. Read the actual diffs for changed files before reading large surrounding files. - -Prefer `rg`, `git diff`, and targeted file reads over broad scans. 
- -### Databao Review Priorities - -Pay extra attention to these repository-specific areas: - -- CLI, API, or UI behavior -- agent, executor, or model-provider wiring -- MCP, plugin, or integration boundaries -- configuration, build, or initialization flows -- datasource, schema, or context-building logic -- dependency, packaging, extras, and lockfile changes -- test coverage for changed behavior - -If a change touches one of those areas, review both the changed code and related tests. - -Use these targeted checks when the diff touches the corresponding area: - -- CLI, API, or UI behavior: - check defaults, help text, argument handling, request or response contracts, and user-visible output -- agent, executor, or integration wiring: - check provider and model defaults, executor names, tool contracts, and consistency across callers -- plugin, datasource, or build flows: - check configuration prompts, validation paths, plugin-loading expectations, and schema or context drift -- packaging and dependencies: - check extras wiring, entrypoints, transitive dependency impact, lockfile drift, and docs/help drift -- tests: - verify that changed behavior has corresponding assertions or call out the gap explicitly. - Make sure the tests cover some real logical path and are not only trivial assertions. - -### 2. Review Workflow - -1. Establish the review scope from git. -2. Read the diff carefully. -3. Read the surrounding implementation for changed logic. -4. Check related tests, identify where tests should have changed but did not. -5. Evaluate if new tests should be added to cover added functionality. 
- -Good validation options: - -- the full test suite when it is practical for the repo and review scope -- targeted test runs for modified areas when they give a faster or more relevant signal -- non-mutating lint checks -- non-mutating formatter checks -- type checks -- lockfile or dependency metadata validation when package definitions changed - -Before running validation, inspect the repo's local tooling configuration and use commands that actually exist there. - -Examples, when configured in the current repo: - -- `uv run pytest ` -- `uv run ruff check ` -- `uv run ruff format --check ` -- `uv run mypy ` -- `uv lock --check` - -Default to running the full test suite when it is practical and likely to add useful confidence. Use targeted tests instead when the diff is narrow and a focused run is the better fit. - -Avoid mutating validation in review mode: - -- do not run formatter or linter commands with `--fix` -- do not run formatting commands without a check-only mode when one exists -- do not run wrapper commands like `make check` or `pre-commit` unless you have verified they are non-mutating - -### 3. Findings - -A good finding should: - -- identify a concrete bug, regression, or meaningful risk -- explain why it matters in real behavior -- point to the exact file and line -- mention the missing validation if that is part of the risk - -Avoid weak findings like stylistic opinions, speculative architecture preferences, or advice not grounded in the diff. - -#### Output Format - -Return findings first, ordered by severity. - -For each finding: - -- short severity label -- concise title -- why it is a problem -- file and line reference -- brief remediation direction when obvious - -Then include: - -- open questions or assumptions -- a short note on testing gaps or validation performed - -If there are no findings, say that clearly and still mention any residual risk or untested surface. - -Format the results using Markdown. 
- -## Guardrails - -- Do not rewrite the code during review mode. -- Do not bury findings under a long summary. -- Do not claim tests passed unless you ran them. -- Do not over-index on style when behavior risks exist. -- Prefer explicit evidence from the diff and nearby code. diff --git a/.claude/skills/review-architecture/SKILL.md b/.claude/skills/review-architecture/SKILL.md index 9ab304ee..dfdcc7a6 100755 --- a/.claude/skills/review-architecture/SKILL.md +++ b/.claude/skills/review-architecture/SKILL.md @@ -1,55 +1,94 @@ --- name: review-architecture description: Review architecture quality, maintainability, and developer experience before or after significant changes. Use when introducing a new CLI command or MCP tool, refactoring core module boundaries, diagnosing repeated dev friction, or preparing a PR with broad structural impact. +argument-hint: "[scope: branch | module: | full]" +context: fork +agent: general-purpose --- # Review Architecture -This skill reviews architecture quality, maintainability, and developer -experience (Dev UX) before or after significant changes. +You are reviewing the architecture of the Databao CLI project. +You have NO prior context about why these changes were made — review +purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect +the code and project structure. Do NOT edit any files. -## Parameter +## Scope -This skill accepts an optional **scope** argument: +Review scope: $ARGUMENTS + +If no scope was provided, default to `branch`. + +Accepted scopes: - `branch` — review architecture of code changed on the current branch (default) - `module:` — review a specific module (e.g. `module:src/databao_cli/mcp/`) - `full` — review the full project architecture -If no scope is provided, default to `branch`. +## Primary sources of truth + +Review in this order: + +1. `docs/architecture.md` +2. `docs/python-coding-guidelines.md` +3. `docs/testing-strategy.md` +4. `CLAUDE.md` +5. 
`README.md` (CLI usage and user-facing workflows) + +Then validate actual implementation under: + +- `src/databao_cli/commands/` +- `src/databao_cli/ui/` +- `src/databao_cli/mcp/` +- `src/databao_cli/project/` + +## Review goals -## Steps +- Confirm boundaries are clear and responsibilities are separated. +- Confirm extension paths are low-friction (new command, MCP tool, UI page). +- Confirm docs reflect real behavior and command paths. +- Identify highest-impact improvements with minimal disruption. -### 1. Resolve scope +## Architecture checklist -Determine the review scope from the parameter (or default to `branch`). -Prepare a short description of what will be reviewed. +- Are modules aligned with single responsibility? +- Are CLI concerns separated from business logic? +- Is the Click command structure clean and discoverable? +- Are MCP tools properly isolated in `mcp/tools/`? +- Are UI components reusable and page-specific logic separated? +- Are errors actionable and surfaced at the right layer? -### 2. Spawn review sub-agent +## Dev UX checklist -Launch a sub-agent using the **Agent tool** with `subagent_type: "general-purpose"`. +- Can a new contributor find the right entrypoints quickly? +- Are `uv` / `make` commands obvious and consistent across docs/code? +- Is local verification clear (`make check`, `make test`)? +- Are defaults safe when environment/dependencies are missing? +- Do naming and file layout reduce cognitive load? +- Are common workflows discoverable in README + docs? -Build the sub-agent prompt as follows: +## Output format -1. Read the file `.claude/skills/review-architecture/review-guidelines.md` and - include its full content in the prompt. -2. Tell the sub-agent the resolved scope (e.g. "Review the architecture of the - current branch changes", "Review the module src/databao_cli/mcp/", - "Review the full project architecture"). -3. 
Include this preamble in the prompt: +Provide a concise report with these sections: - > You are reviewing the architecture of the Databao CLI project. - > You have NO prior context about why these changes were made — review - > purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect - > the code and project structure. Do NOT edit any files. +1. **Current State**: 3-6 bullets on what is working well. +2. **Risks / Gaps**: prioritized issues (High/Med/Low) with evidence. +3. **Recommendations**: concrete changes, ordered by impact vs effort. +4. **Doc Sync Needed**: exact files that should be updated. +5. **Validation Plan**: minimal commands to verify proposed changes. -### 3. Report findings +## Recommendation style -Return the sub-agent's output to the user as-is. Do not summarize or filter -the findings. +- Prefer small, composable changes over sweeping rewrites. +- Tie every recommendation to a specific pain point. +- Include expected developer benefit (speed, clarity, reliability). +- Flag trade-offs explicitly. +- Separate immediate actions from longer-term improvements. -### 4. Wait for user approval before fixing +## Guardrails -Do NOT start resolving any findings automatically. Present the findings and -**ask the user** which ones (if any) they want you to fix. Only proceed with -code changes after explicit user approval. +- Do not invent architecture that conflicts with current code unless proposing it + explicitly as a future direction. +- Do not request broad rewrites without clear ROI. +- Keep proposals compatible with existing `uv` workflow and Click-based CLI. +- Keep feedback actionable for the next pull request, not just aspirational. 
diff --git a/.claude/skills/review-architecture/evals/evals.json b/.claude/skills/review-architecture/evals/evals.json index 10311661..d022bd5e 100644 --- a/.claude/skills/review-architecture/evals/evals.json +++ b/.claude/skills/review-architecture/evals/evals.json @@ -3,38 +3,36 @@ "evals": [ { "id": 1, - "prompt": "Review the architecture of the MCP server code.", - "expected_output": "Agent resolves scope to the MCP module, spawns a sub-agent with review-guidelines.md content and clean context, and returns a structured architecture report.", + "prompt": "/review-architecture module:src/databao_cli/mcp/", + "expected_output": "A structured report covering the MCP module with Current State, Risks/Gaps, Recommendations, Doc Sync, and Validation Plan.", "assertions": [ - "Agent reads .claude/skills/review-architecture/review-guidelines.md", - "Agent spawns a sub-agent using the Agent tool", - "Sub-agent prompt includes a preamble about having no prior context", - "Sub-agent prompt specifies reviewing the MCP module architecture", - "Agent returns the sub-agent findings which include structured sections (Current State, Risks/Gaps, Recommendations)" + "Output contains a 'Current State' section with 3-6 bullets", + "Output contains a 'Risks / Gaps' section with severity levels (High/Med/Low)", + "Output contains a 'Recommendations' section with concrete changes", + "Output references docs/architecture.md as a source of truth", + "Output covers files under src/databao_cli/mcp/" ] }, { "id": 2, "prompt": "/review-architecture module:src/databao_cli/commands/", - "expected_output": "Agent resolves scope to the commands module and spawns a sub-agent to review its architecture.", + "expected_output": "A focused review of the command structure and extension patterns.", "assertions": [ - "Agent resolves scope parameter to 'module:src/databao_cli/commands/'", - "Agent reads .claude/skills/review-architecture/review-guidelines.md", - "Agent spawns a sub-agent using the Agent 
tool", - "Sub-agent prompt specifies reviewing the commands module architecture", - "Agent returns the sub-agent findings to the user" + "Output examines src/databao_cli/commands/ structure", + "Output references the Click command registration pattern", + "Output provides actionable recommendations", + "Output separates immediate actions from longer-term improvements" ] }, { "id": 3, "prompt": "/review-architecture full", - "expected_output": "Agent resolves scope to full project and spawns a sub-agent for a comprehensive architecture review.", + "expected_output": "A comprehensive architecture review covering all major modules.", "assertions": [ - "Agent resolves scope parameter to 'full'", - "Agent reads .claude/skills/review-architecture/review-guidelines.md", - "Agent spawns a sub-agent using the Agent tool", - "Sub-agent prompt specifies reviewing the full project architecture", - "Agent returns the sub-agent findings to the user" + "Output reviews multiple modules (commands, mcp, ui, project)", + "Output references primary sources of truth (docs/architecture.md, CLAUDE.md)", + "Output contains all five report sections (Current State, Risks/Gaps, Recommendations, Doc Sync, Validation Plan)", + "Agent does NOT edit any files" ] } ] diff --git a/.claude/skills/review-architecture/review-guidelines.md b/.claude/skills/review-architecture/review-guidelines.md deleted file mode 100644 index 0f38e0e8..00000000 --- a/.claude/skills/review-architecture/review-guidelines.md +++ /dev/null @@ -1,69 +0,0 @@ -# Architecture Review Guidelines - -## Primary sources of truth - -Review in this order: - -1. `docs/architecture.md` -2. `docs/python-coding-guidelines.md` -3. `docs/testing-strategy.md` -4. `CLAUDE.md` -5. 
`README.md` (CLI usage and user-facing workflows) - -Then validate actual implementation under: - -- `src/databao_cli/commands/` -- `src/databao_cli/ui/` -- `src/databao_cli/mcp/` -- `src/databao_cli/project/` - -## Review goals - -- Confirm boundaries are clear and responsibilities are separated. -- Confirm extension paths are low-friction (new command, MCP tool, UI page). -- Confirm docs reflect real behavior and command paths. -- Identify highest-impact improvements with minimal disruption. - -## Architecture checklist - -- Are modules aligned with single responsibility? -- Are CLI concerns separated from business logic? -- Is the Click command structure clean and discoverable? -- Are MCP tools properly isolated in `mcp/tools/`? -- Are UI components reusable and page-specific logic separated? -- Are errors actionable and surfaced at the right layer? - -## Dev UX checklist - -- Can a new contributor find the right entrypoints quickly? -- Are `uv` / `make` commands obvious and consistent across docs/code? -- Is local verification clear (`make check`, `make test`)? -- Are defaults safe when environment/dependencies are missing? -- Do naming and file layout reduce cognitive load? -- Are common workflows discoverable in README + docs? - -## Output format - -Provide a concise report with these sections: - -1. **Current State**: 3-6 bullets on what is working well. -2. **Risks / Gaps**: prioritized issues (High/Med/Low) with evidence. -3. **Recommendations**: concrete changes, ordered by impact vs effort. -4. **Doc Sync Needed**: exact files that should be updated. -5. **Validation Plan**: minimal commands to verify proposed changes. - -## Recommendation style - -- Prefer small, composable changes over sweeping rewrites. -- Tie every recommendation to a specific pain point. -- Include expected developer benefit (speed, clarity, reliability). -- Flag trade-offs explicitly. -- Separate immediate actions from longer-term improvements. 
- -## Guardrails - -- Do not invent architecture that conflicts with current code unless proposing it - explicitly as a future direction. -- Do not request broad rewrites without clear ROI. -- Keep proposals compatible with existing `uv` workflow and Click-based CLI. -- Keep feedback actionable for the next pull request, not just aspirational. diff --git a/CLAUDE.md b/CLAUDE.md index 1562216a..3e85357a 100755 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -119,9 +119,9 @@ style issues; do not manually fix formatting. 3. **Test & lint** — run `make check` then `make test-cov-check`. Fix any failures before proceeding. 4. **Review** — run `local-code-review` and `review-architecture` skills. - Both spawn sub-agents with clean context (no prior conversation state) - and can run **in parallel**. Fix any high-severity findings before - proceeding. + Both run in forked sub-agent context (no prior conversation state) + and can run **in parallel**. Present findings to the user and **wait + for explicit approval** before fixing anything. 5. **Branch** — use the `create-branch` skill. 6. **Commit & PR** — use the `create-pr` skill (stages, commits, pauses for confirmation, pushes, and opens the PR). 
diff --git a/scripts/smoke-test-skills.sh b/scripts/smoke-test-skills.sh index ecc78d5b..0f2acd99 100755 --- a/scripts/smoke-test-skills.sh +++ b/scripts/smoke-test-skills.sh @@ -35,10 +35,6 @@ run_test "setup-environment: make setup" "make setup" # check-coverage: make test-cov-check succeeds run_test "check-coverage: make test-cov-check" "make test-cov-check" -# review skills: review-guidelines.md files exist and are non-empty -run_test "local-code-review: review-guidelines.md exists" "test -s .claude/skills/local-code-review/review-guidelines.md" -run_test "review-architecture: review-guidelines.md exists" "test -s .claude/skills/review-architecture/review-guidelines.md" - # review-architecture: referenced doc files exist and are non-empty for doc in docs/architecture.md docs/python-coding-guidelines.md docs/testing-strategy.md README.md; do run_test "review-architecture: $doc exists and non-empty" "test -s \"$doc\"" From c7f79f3c11ffa0a5fbfd66dd75f8fa8709b15bf0 Mon Sep 17 00:00:00 2001 From: Andrei Gasparian Date: Mon, 23 Mar 2026 19:33:14 +0100 Subject: [PATCH 4/6] [DBA-290] Add pacing rule for user confirmation between phases Introduce a general pacing guideline requiring the agent to pause for user confirmation between significant workflow phases (plan, implement, validate, commit, PR). Add explicit Plan step to the workflow. Co-Authored-By: Claude Opus 4.6 --- CLAUDE.md | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 3e85357a..e566464a 100755 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -5,8 +5,12 @@ Claude Code entrypoint for agent instructions in this repository. ## Agent Delta (Claude Code) - Prefer concise updates with clear file/command references. -- Execute directly when safe; ask questions only if truly blocked. - YouTrack MCP must be configured (see DEVELOPMENT.md). Use get_issue / update_issue tools. 
+- **Pacing rule**: pause for user confirmation between phases (plan → + implement → validate → commit → PR). Present what you intend to do or + what you just did, then wait for a go-ahead before moving to the next + phase. Small, safe actions within a phase (running tests, reading files) + do not require a pause. ## References @@ -112,19 +116,21 @@ style issues; do not manually fix formatting. ## After Completing Work -1. **Implement** — write the code changes to satisfy the ticket - requirements. -2. **Write tests** — add or update unit tests covering the new behavior; - run `make test` to verify they pass. -3. **Test & lint** — run `make check` then `make test-cov-check`. Fix any +Each numbered step below is a **phase**. Present the outcome of each +phase and wait for user confirmation before starting the next one. + +1. **Plan** — outline the approach and list files you intend to change. +2. **Implement** — write the code changes to satisfy the ticket + requirements, including tests. Run `make test` to verify they pass. +3. **Validate** — run `make check` then `make test-cov-check`. Fix any failures before proceeding. 4. **Review** — run `local-code-review` and `review-architecture` skills. Both run in forked sub-agent context (no prior conversation state) - and can run **in parallel**. Present findings to the user and **wait - for explicit approval** before fixing anything. -5. **Branch** — use the `create-branch` skill. -6. **Commit & PR** — use the `create-pr` skill (stages, commits, pauses - for confirmation, pushes, and opens the PR). + and can run **in parallel**. +5. **Branch & commit** — use the `create-branch` skill, then stage and + commit following **Commit Messages** conventions. +6. **PR** — use the `create-pr` skill (pushes and opens the PR). 7. **Update YouTrack** — move the ticket to **Review** state and add a comment with the PR URL. -8. Never commit directly to `main`. + +Never commit directly to `main`. 
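The `run_test` calls in `scripts/smoke-test-skills.sh` earlier in this series assume a small helper defined near the top of that script. A minimal hypothetical shape (the real definition may differ; `FAILURES` and the PASS/FAIL output format are assumptions, not taken from the script):

```shell
# Hypothetical minimal run_test helper; the real one lives at the top
# of scripts/smoke-test-skills.sh and may differ in detail.
FAILURES=0

run_test() {
  name="$1"
  cmd="$2"
  # Run the command silently; report only pass/fail by name.
  if eval "$cmd" >/dev/null 2>&1; then
    echo "PASS: $name"
  else
    echo "FAIL: $name"
    FAILURES=$((FAILURES + 1))
  fi
}

run_test "always passes" "true"
run_test "always fails" "false"
```

Each smoke-test line in the script is then a one-liner: a label plus a command whose exit status decides pass or fail.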
From 42c8427c5498d979eae0c0a3ade7df4a7268b005 Mon Sep 17 00:00:00 2001
From: Andrei Gasparian
Date: Mon, 23 Mar 2026 19:51:35 +0100
Subject: [PATCH 5/6] [DBA-290] Add argument-hint and multi-skill support to eval-skills

Accept skill names as arguments (e.g. /eval-skills local-code-review).
When no arguments are provided, ask the user which skills to evaluate.

Co-Authored-By: Claude Opus 4.6
---
 .claude/skills/eval-skills/SKILL.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/.claude/skills/eval-skills/SKILL.md b/.claude/skills/eval-skills/SKILL.md
index 3d7a0551..a04548cb 100644
--- a/.claude/skills/eval-skills/SKILL.md
+++ b/.claude/skills/eval-skills/SKILL.md
@@ -1,6 +1,7 @@
 ---
 name: eval-skills
 description: Run structured agent-in-the-loop evaluations on skills to measure quality and track improvements. Use after modifying a SKILL.md, after changing `CLAUDE.md` or guidance docs that skills depend on, or periodically to benchmark skill quality and catch regressions.
+argument-hint: "[skill-name ...] (e.g. local-code-review review-architecture)"
 ---
 
 # Eval Skills
@@ -10,10 +11,17 @@ improvements across iterations.
 
 ## Steps
 
-### 1. Pick the skill to evaluate
+### 1. Determine which skills to evaluate
 
-Identify which skill to eval. Each skill has test cases in
-`.claude/skills/<skill-name>/evals/evals.json`.
+This skill accepts an optional list of skill names via `$ARGUMENTS`.
+
+- If skill names are provided (e.g. `/eval-skills local-code-review review-architecture`),
+  evaluate only those skills.
+- If no arguments are provided, ask the user which skills they want to
+  evaluate. List all skills that have `evals/evals.json` files and let
+  the user pick. Accept "all" to evaluate every skill with evals.
+
+Each skill's test cases live in `.claude/skills/<skill-name>/evals/evals.json`.
 
 ### 2.
Create an iteration directory From 9b5bb6c8eb19f13556b4ea4a235878deb414f54f Mon Sep 17 00:00:00 2001 From: Andrei Gasparian Date: Tue, 24 Mar 2026 18:03:38 +0100 Subject: [PATCH 6/6] [DBA-290] Add custom reviewer agent with read-only tool restriction Replace agent: general-purpose with agent: reviewer in both review skills. The reviewer agent restricts tools to Read, Glob, Grep, Bash, structurally preventing file edits. Remove redundant "Do NOT edit" instructions and replace verbose rewrite guardrail with concise snippet guidance. Co-Authored-By: Claude Opus 4.6 --- .claude/agents/reviewer.md | 8 ++++++++ .claude/skills/local-code-review/SKILL.md | 7 +++---- .claude/skills/review-architecture/SKILL.md | 5 ++--- 3 files changed, 13 insertions(+), 7 deletions(-) create mode 100644 .claude/agents/reviewer.md diff --git a/.claude/agents/reviewer.md b/.claude/agents/reviewer.md new file mode 100644 index 00000000..72cfe616 --- /dev/null +++ b/.claude/agents/reviewer.md @@ -0,0 +1,8 @@ +--- +name: reviewer +description: Read-only reviewer for code and architecture reviews. Cannot edit or write files. +tools: Read, Glob, Grep, Bash +--- + +You are a read-only reviewer. You can inspect code, run non-mutating +commands, and produce findings — but you must never modify files. diff --git a/.claude/skills/local-code-review/SKILL.md b/.claude/skills/local-code-review/SKILL.md index 3c207b48..696f6efd 100644 --- a/.claude/skills/local-code-review/SKILL.md +++ b/.claude/skills/local-code-review/SKILL.md @@ -3,15 +3,14 @@ name: local-code-review description: Review local code changes in Databao repositories before a commit or PR. Use when the user wants a review of staged or unstaged diffs, local branches, or pre-merge changes. Focus on correctness, regressions, missing tests, API/CLI behavior changes, executor or tooling changes, dependency or plugin-loading risks, and user-visible behavior changes. 
 argument-hint: "[scope: staged | branch | files:<path>]"
 context: fork
-agent: general-purpose
+agent: reviewer
 ---
 
 # Local Code Review
 
 You are reviewing code changes for the Databao CLI project.
 You have NO prior context about why these changes were made — review
-purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect
-the code, diffs, and surrounding implementation. Do NOT edit any files.
+purely on merit.
 
 ## Scope
 
@@ -151,7 +150,7 @@ Format the results using Markdown.
 
 ## Guardrails
 
-- Do not rewrite the code during review mode.
+- Include short code snippets to illustrate suggested fixes, but keep them conceptual — avoid pasting full rewrites or verbose replacement blocks.
 - Do not bury findings under a long summary.
 - Do not claim tests passed unless you ran them.
 - Do not over-index on style when behavior risks exist.
diff --git a/.claude/skills/review-architecture/SKILL.md b/.claude/skills/review-architecture/SKILL.md
index dfdcc7a6..51fa4ad0 100755
--- a/.claude/skills/review-architecture/SKILL.md
+++ b/.claude/skills/review-architecture/SKILL.md
@@ -3,15 +3,14 @@
 name: review-architecture
 description: Review architecture quality, maintainability, and developer experience before or after significant changes. Use when introducing a new CLI command or MCP tool, refactoring core module boundaries, diagnosing repeated dev friction, or preparing a PR with broad structural impact.
 argument-hint: "[scope: branch | module:<name> | full]"
 context: fork
-agent: general-purpose
+agent: reviewer
 ---
 
 # Review Architecture
 
 You are reviewing the architecture of the Databao CLI project.
 You have NO prior context about why these changes were made — review
-purely on merit. Use your tools (Read, Grep, Glob, Bash) to inspect
-the code and project structure. Do NOT edit any files.
+purely on merit.
 
 ## Scope
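For reference, the scope hints in the two skill headers could resolve to concrete commands roughly as follows. This is a sketch only: the function name, the default branch `main`, and the mapping itself are assumptions for illustration, not part of either skill.

```shell
# Illustrative only: one way a scope argument could map to a command.
# scope_to_cmd and the default branch "main" are assumptions.
scope_to_cmd() {
  case "$1" in
    staged)    echo "git diff --cached" ;;
    ""|branch) echo "git diff main...HEAD" ;;          # default scope
    files:*)   echo "git diff -- ${1#files:}" ;;       # strip "files:" prefix
    module:*)  echo "review module ${1#module:}" ;;    # architecture review
    full)      echo "review entire repository" ;;
    *)         echo "unknown scope: $1" ;;
  esac
}

scope_to_cmd staged      # → git diff --cached
scope_to_cmd files:src/  # → git diff -- src/
```

The orchestrating skill would run the resolved command itself, then hand the resulting diff summary to the read-only `reviewer` sub-agent.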