From f6bfda06d466c4e1069b2f3d9a8e59da6929a3ff Mon Sep 17 00:00:00 2001 From: Nabin Mulepati Date: Tue, 17 Mar 2026 14:23:56 -0600 Subject: [PATCH 1/6] docs: add agent-first development plan Plan for optimizing DataDesigner for agent-assisted development workflows, inspired by patterns from NVIDIA/OpenShell. Covers foundation document updates, GitHub machinery, skill infrastructure consolidation, and architecture documentation. Closes #427 --- plans/427/agent-first-development-plan.md | 381 ++++++++++++++++++++++ 1 file changed, 381 insertions(+) create mode 100644 plans/427/agent-first-development-plan.md diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md new file mode 100644 index 000000000..99c933917 --- /dev/null +++ b/plans/427/agent-first-development-plan.md @@ -0,0 +1,381 @@ +--- +date: 2026-03-17 +authors: + - nmulepati +--- + +# Plan: Agent-First Development Ethos + +## Problem + +DataDesigner was built entirely by humans — and the codebase reflects that with strong architecture, comprehensive tests, and thoughtful design. But we are increasingly moving to an agent-assisted planning and development workflow. The project already has 7 agent skills, an agent introspection CLI, deep MCP integration, and a plugin system. None of this is visible from the three entry documents (README, CONTRIBUTING, AGENTS.md). The infrastructure exists but the front door doesn't signal it. 
+ +## Current State + +| Asset | Status | Agent-First Signal | +|-------|--------|--------------------| +| README.md | Product-focused, usage-first | Zero mention of agent-first development | +| CONTRIBUTING.md | Standard OSS contributing guide | Zero mention of skills or agent workflows | +| AGENTS.md | ~500 lines, mixes code style with architecture | No skills inventory, no workflow chains | +| CLAUDE.md | 1 line: `@AGENTS.md` | Minimal | +| Issue templates | 4 templates (bug, feature, dev task, config) | No agent investigation fields | +| PR template | Doesn't exist | -- | +| CODEOWNERS | Catch-all `* @NVIDIA-NeMo/data_designer_reviewers` | No agent infra ownership | +| `.claude/skills/` | 7 skills (~25KB) | Claude Code-locked, invisible from top-level docs | +| `.agents/skills/` | Doesn't exist (yet) | -- | +| `architecture/` | Doesn't exist | -- | +| STYLEGUIDE.md | Doesn't exist (inlined in AGENTS.md) | -- | + +## Principles + +1. **Agent-assisted, not agent-only.** We are a human-built project adopting agent workflows. Agents accelerate planning, development, and review — humans make design decisions and own quality. +2. **Designed, not vibed.** Humans architect systems; agents help implement. The distinction matters and should be visible. This is not vibe coding. +3. **Firm gates, clear paths.** Issue templates encourage agent investigation. But the paths to success are well-lit for both agent-assisted and human-only contributors. +4. **Skills are the API.** The `.agents/skills/` directory is the contract between the project and contributor agents. Treat it as a first-class interface. +5. **Front door tells the story.** README, CONTRIBUTING, and AGENTS.md should signal that agent-assisted workflows are available and encouraged. A new contributor's immediate next step is "clone and point your agent at the repo." + +--- + +## Phase 1: Foundation Documents + +Three files define the front door. All three need to tell the same story. + +### 1a. 
README.md + +**Current state:** Product-focused. Zero signal of agent-first development. + +**Target state:** Retains the product pitch but frames DataDesigner as an agent-friendly project. A new developer's immediate next step is "clone and point your agent at the repo." + +| Section | Action | +|---------|--------| +| Hero / intro paragraph | Add a line signaling agent-first development alongside the product pitch | +| New: "Explore with Your Agent" | After quickstart. Clone the repo, point your agent at it, let it load the skills and answer your questions | +| New: "Built With Agents" | After the product sections. Surfaces `.agents/skills/` infrastructure, the workflow chains. This is the "how we work" section | +| Contributing | Expand from current brief mention to set expectations: agent-first contributions, link to CONTRIBUTING.md | + +**Key language to establish:** +- "DataDesigner supports agent-assisted development. We provide skills that help agents plan, build, and review code." +- "Before opening an issue, try pointing your agent at the repo. It has skills to help." + +### 1b. CONTRIBUTING.md + +**Current state:** Standard OSS contributing guide. Welcoming tone, good fork→develop→PR workflow, but zero agent awareness. + +**Target state:** Agent-assisted contribution workflow is the recommended path. Human-only paths are fully supported but agent workflows are encouraged and well-documented. + +| Section | Action | +|---------|--------| +| New: Opening philosophy | 2-3 sentences. This project supports agent-assisted development. Your agent is a powerful collaborator — we provide skills to help it help you. | +| New: "Before You Open an Issue" | **The gate.** Checklist: (1) Clone the repo, (2) Point your agent at it, (3) Load relevant skills, (4) Have your agent diagnose/investigate. If the agent can't solve it, open an issue with the diagnostics attached. 
| +| New: "Agent Skills for Contributors" | Table of all skills grouped by workflow category | +| New: "Workflow Chains" | Document the natural pipelines: investigation → development, and future spike → build | +| Getting Started | Keep as-is (fork, clone, install) | +| Development Guide | Keep as-is | +| Pull Requests | Update to reference the new PR template and the `create-pr` skill | +| New: "When to Open an Issue" | Clear guidance: real bugs your agent confirmed, feature proposals with design context, problems the `search-docs`/`search-github` skills couldn't resolve | +| New: "When NOT to Open an Issue" | Questions about how things work (agent can answer), configuration problems (agent can diagnose), "how do I..." requests (agent has skills for this) | +| Commit Messages / DCO | Keep as-is | + +**Skill groupings for the table:** + +| Category | Skills | Purpose | +|----------|--------|---------| +| Getting Started | `search-docs`, `search-github` | Find information, check for duplicates | +| Data Generation | `new-sdg` | Design synthetic data generators interactively | +| Development | `commit`, `create-pr`, `update-pr` | Standard development cycle | +| Review | `review-code` | Multi-pass code review | + +### 1c. AGENTS.md + +**Current state:** ~500 lines. Mixes project overview, architecture, code style, type annotations, linting rules, design principles, and testing patterns into one file. No skills inventory, no workflow chains, no project identity statement. + +**Target state:** The comprehensive entrypoint for any agent working on this codebase. Every agent reads this on load. It should give an agent everything it needs to be effective — without the code style reference (which moves to STYLEGUIDE.md). + +| Section | Action | +|---------|--------| +| Opening | Keep the CONTRIBUTING.md reference. Add: "This file is the primary instruction surface for agents contributing to DataDesigner." 
| +| New: "Project Identity" | 3-4 sentences: agent-assisted development, designed not vibed, the product generates synthetic datasets and increasingly uses agents for planning and development | +| New: "Skills" | Note that skills live in `.agents/skills/`. Agent harnesses can discover and load them natively. | +| New: "Workflow Chains" | Document the natural skill pipelines | +| Architecture | Keep existing 3-layer overview, key files, registries. Add brief component map. | +| New: "Issue and PR Conventions" | Reference the templates. When creating issues, use the template format. When creating PRs, use the PR template. Skills should produce output conforming to these templates. | +| Development Workflow | Keep as-is (`uv`, `make`, test commands) | +| Working Guidelines | Keep as-is (license headers, `__future__` imports, comments) | +| Testing | Keep as-is, add brief guidance on when to run which tests | +| New: "Security" | Don't commit secrets, don't run destructive operations without confirmation, scope changes to the issue at hand | +| Pre-commit | Keep as-is | +| Column/Model Configuration | Keep as-is (brief summaries) | +| Registry System | Keep as-is | +| **REMOVE** Code Style sections | Move to STYLEGUIDE.md (see 1d) | + +### 1d. STYLEGUIDE.md (new file) + +Extract from AGENTS.md. Contains all code style reference material: + +- General formatting (line length, quotes, indentation, target version) +- Type annotations (full section with all examples) +- Import style (absolute imports, lazy loading, TYPE_CHECKING — full section) +- Naming conventions (PEP 8 rules, verb-first function names) +- Code organization (public/private ordering, class method order, section comments) +- Design principles (DRY, KISS, YAGNI, SOLID — with examples) +- Common pitfalls (all 5 with code examples) +- Active linter rules (ruff rule reference) + +**CLAUDE.md updated to:** +``` +@AGENTS.md +@STYLEGUIDE.md +``` + +**Why:** AGENTS.md is loaded into every agent conversation. 
Code style is reference material — needed when writing code, not when triaging issues or creating spikes. Splitting reduces context cost and makes each file single-purpose. + +### 1e. Create `architecture/` Directory (Skeleton) + +Create stub files for each major subsystem. Each stub lists section headings but doesn't contain full content yet. Docs are populated incrementally as features are built. + +``` +architecture/ +├── overview.md # System architecture, package relationships, data flow diagram +├── config.md # Config layer: builder, column types, unions, plugin system +├── engine.md # Engine layer: compilation, generators, DAG execution, batching +├── models.md # Model facade, client adapters, retry/throttle, usage tracking +├── mcp.md # MCP: I/O service, session pooling, coalescing, tool execution +├── dataset-builders.md # Column-wise builder, async scheduler, DAG, concurrency +├── sampling.md # Person/entity sampling, locale system, data sources +├── cli.md # CLI architecture: commands → controllers → services → repos +├── agent-introspection.md # Agent CLI commands, type discovery, family specs +└── plugins.md # Plugin system: entry points, registry, discovery, validation +``` + +Each stub follows this template: +```markdown +# + +> Stub — to be populated. See source code at ``. + +## Overview + + +## Key Components + + +## Data Flow + + +## Design Decisions + + +## Cross-References + +``` + +**Why:** Agents producing plans can reference architecture docs to understand subsystems. Stubs establish the structure; content grows organically as features are built. + +--- + +## Phase 2: GitHub Machinery + +### 2a. Issue Templates + +Update existing `.github/ISSUE_TEMPLATE/` templates: + +**bug-report.yml** updates: +- Add **Agent Diagnostic** (textarea, required): "Paste the output from your agent's investigation of this bug. What skills did it use? What did it find? If you haven't had your agent investigate, please do that first — see CONTRIBUTING.md." 
+- Add **Checklist** (required): + - [ ] I pointed my agent at the repo and had it investigate this issue + - [ ] I loaded relevant skills (e.g., `search-docs`, `search-github`) + - [ ] My agent could not resolve this — the diagnostics above explain why +- Keep existing fields (priority, description, reproduction, expected behavior) + +**feature-request.yml** updates: +- Keep existing fields (problem, solution, alternatives) +- Add **Agent Investigation** (textarea, optional): "If your agent explored the codebase to assess feasibility (e.g., using the `search-docs` skill), paste its findings here." +- Add **Checklist**: + - [ ] I've reviewed existing issues and the documentation + - [ ] This is a design proposal, not a "please build this" request + +**config.yml** updates: +- Update contact link to: "Have a question? Point your agent at the repo. It has skills for searching docs, finding issues, and more. See CONTRIBUTING.md for the full list." + +### 2b. PR Template + +Create `.github/PULL_REQUEST_TEMPLATE.md`: + +```markdown +## Summary + + +## Related Issue + + +## Changes + + +## Testing + +- [ ] `make test` passes +- [ ] Unit tests added/updated +- [ ] E2E tests added/updated (if applicable) + +## Checklist +- [ ] Follows commit message conventions +- [ ] Commits are signed off (DCO) +- [ ] Architecture docs updated (if applicable) +``` + +Intentionally lean. The `create-pr` and `review-code` skills already produce well-structured descriptions; the template provides guardrails without fighting the skills. + +### 2c. CODEOWNERS + +Update `.github/CODEOWNERS` to add agent infrastructure ownership: + +``` +# Broad ownership — core team reviews everything +* @NVIDIA-NeMo/data_designer_reviewers + +# Agent infrastructure — tighter review +.agents/ @NVIDIA-NeMo/data_designer_reviewers +AGENTS.md @NVIDIA-NeMo/data_designer_reviewers +STYLEGUIDE.md @NVIDIA-NeMo/data_designer_reviewers +``` + +### 2d. 
Label Taxonomy + +Create labels for workflow state: + +| Label | Purpose | Used by | +|-------|---------|---------| +| `agent-ready` | Human-approved, agent can build | `build-from-issue` (future) | +| `review-ready` | Agent has posted a plan, needs human review | `build-from-issue`, `create-spike` (future) | +| `in-progress` | Agent is actively building | `build-from-issue` (future) | +| `pr-opened` | Implementation complete, PR submitted | `build-from-issue` (future) | +| `spike` | Needs deeper investigation | `create-spike` (future) | +| `needs-agent-triage` | Opened without agent diagnostics — redirect | Triage automation (future) | +| `good-first-issue` | Suitable for new contributors (with agents) | Manual | + +--- + +## Phase 3: Skill & Agent Infrastructure + +### 3a. Consolidate `.agents/` and `.claude/` + +**Goal:** `.agents/` is the canonical home for all agent infrastructure. `.claude/` becomes a thin shim for Claude Code-specific runtime state. + +**Current layout:** +``` +.claude/ + skills/ # 7 skills — Claude Code-specific location + agents/ # 2 sub-agent definitions (docs-searcher, github-searcher) + settings.json + settings.local.json +``` + +**Target layout:** +``` +.agents/ + skills/ # 7 skills (moved from .claude/skills/) + agents/ # Sub-agent persona definitions (moved from .claude/agents/) + docs-searcher.md + github-searcher.md + +.claude/ + skills # Symlink → ../.agents/skills + agents # Symlink → ../.agents/agents (or keep Claude-specific if frontmatter differs) + agent-memory/ # Stays here (Claude Code-specific, not portable) + settings.json + settings.local.json + +.codex/ + skills # Symlink → ../.agents/skills +``` + +**Changes:** +1. Move `.claude/skills/` to `.agents/skills/` (done in prototype) +2. Move `.claude/agents/*.md` to `.agents/agents/*.md` +3. Create symlinks from `.claude/` and `.codex/` +4. Add `.claude/README.md` explaining the structure + +### 3b. 
Update Skills to Conform to Templates + +Skills that create GitHub artifacts should produce output matching the new templates: + +**`create-pr`:** +- Produce PR descriptions matching the PR template structure (Summary / Related Issue / Changes / Testing / Checklist) +- Populate the testing checklist based on what was actually run + +**`review-code`:** +- When reviewing PRs created from templates, check that every template section is filled in + +### 3c. Skill Cross-Reference Cleanup + +- Verify all skill files reference `.agents/skills/`, not `.claude/skills/` +- Verify sub-agent references point to `.agents/agents/` +- Ensure all cross-skill references use consistent naming + +--- + +## Phase 4: Future Work (Separate PRs) + +### 4a. New Skills + +| Skill | Purpose | Depends on | +|-------|---------|------------| +| `build-from-issue` | Stateful plan → review → build → PR pipeline | Labels, `principal-engineer-reviewer` | +| `create-spike` | Investigate problem, create structured issue | `principal-engineer-reviewer` | +| `debug-sdg` | Debug failing data generation pipelines | Architecture docs | +| `generate-column-config` | Generate column configs from natural language | Agent introspection API | +| `watch-github-actions` | Monitor CI workflow runs | None | +| `sync-agent-infra` | Detect drift across agent files | Skills inventory | + +### 4b. Sub-Agent Personas + +| Agent | Purpose | Scope | +|-------|---------|-------| +| `principal-engineer-reviewer` | Analyze code, generate plans, review architecture | Read-only | +| `arch-doc-writer` | Update `architecture/` docs after features land | Write to `architecture/` only | + +### 4c. 
Issue Triage Workflow + +Create `.github/workflows/issue-triage.yml`: +- **Trigger:** `issues.opened` +- **Logic:** Check if the issue was created using the bug report template and the "Agent Diagnostic" field is empty or contains only placeholder text +- **Action:** Add the `needs-agent-triage` label and post a comment redirecting to CONTRIBUTING.md +- This is a simple deterministic check — not an LLM-powered triage bot + +--- + +## Execution Order + +| Step | Deliverable | Dependencies | Parallelizable | +|------|-------------|--------------|----------------| +| 1 | AGENTS.md restructure + STYLEGUIDE.md split | None | -- | +| 2 | CONTRIBUTING.md overhaul | AGENTS.md (references it) | -- | +| 3 | README.md updates | CONTRIBUTING.md (references it) | -- | +| 4 | Issue templates | CONTRIBUTING.md (templates link to it) | Yes (with 5-9) | +| 5 | PR template | None | Yes (with 4, 6-9) | +| 6 | CODEOWNERS update | None | Yes (with 4-5, 7-9) | +| 7 | Label creation (via `gh label create`) | None | Yes (with 4-6, 8-9) | +| 8 | Skill consolidation (`.agents/`, `.claude/` cleanup) | None | Yes (with 4-7, 9) | +| 9 | `architecture/` skeleton | None | Yes (with 4-8) | +| 10 | Skill template conformance updates | Issue/PR templates (steps 4-5) | -- | + +Steps 1-3 are sequential. Steps 4-9 are independent of one another and can run in parallel, although step 4 should follow step 2 because the issue templates link to CONTRIBUTING.md. Step 10 depends on steps 4-5. + +--- + +## Out of Scope + +- **New skills** — the 7 existing skills are sufficient; this plan surfaces them +- **LLM-powered issue triage** — deliberate choice to keep triage deterministic +- **Vouch system** — defer until external contributor volume warrants it +- **CI/CD changes** — existing workflows are solid +- **Full architecture docs** — create skeleton only, populate incrementally +- **Dependabot / Renovate** — dependency management automation (separate concern) + +## Open Questions + +1. **STYLEGUIDE.md naming** — `STYLEGUIDE.md` vs `CODE_STYLE.md` vs `STYLE.md`? +2. 
**Issue template strictness** — Should agent diagnostic be required (gate) or optional (encouraged) on bug reports? OpenShell requires it. DataDesigner could start with required and relax if it creates too much friction. +3. **README tone** — How prominent should the agent-first messaging be? A line in the hero paragraph, or a dedicated section? +4. **CODEOWNERS granularity** — Keep catch-all or add file-specific ownership for docs/CI? +5. **Phase 2 timing** — Land foundation docs (Phase 1) first and iterate, or ship Phases 1-2 together? From e8b6bff2046c06b86391d5d4d879dadcd97b2ef3 Mon Sep 17 00:00:00 2001 From: Nabin Mulepati Date: Tue, 17 Mar 2026 15:04:11 -0600 Subject: [PATCH 2/6] Update plan --- plans/427/agent-first-development-plan.md | 457 +++++++++++++--------- 1 file changed, 266 insertions(+), 191 deletions(-) diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md index 99c933917..d0d0eaa25 100644 --- a/plans/427/agent-first-development-plan.md +++ b/plans/427/agent-first-development-plan.md @@ -1,114 +1,184 @@ --- -date: 2026-03-17 + +## date: 2026-03-17 authors: - nmulepati ---- -# Plan: Agent-First Development Ethos +# Plan: Agent-Assisted Development Principles ## Problem -DataDesigner was built entirely by humans — and the codebase reflects that with strong architecture, comprehensive tests, and thoughtful design. But we are increasingly moving to an agent-assisted planning and development workflow. The project already has 7 agent skills, an agent introspection CLI, deep MCP integration, and a plugin system. None of this is visible from the three entry documents (README, CONTRIBUTING, AGENTS.md). The infrastructure exists but the front door doesn't signal it. +DataDesigner was built entirely by humans, and the codebase reflects that with strong architecture, comprehensive tests, and thoughtful design. We are now increasingly moving toward an agent-assisted planning and development workflow. 
The project already has meaningful agent-oriented infrastructure: seven skills, an agent introspection CLI, and supporting tooling. But a new contributor reading `README.md`, `CONTRIBUTING.md`, or `AGENTS.md` would not immediately discover that these workflows exist. The repository supports agent-assisted work, yet the top-level documentation still presents the project mostly as a conventional human-only codebase. + +## Inspiration + +This proposal draws strong inspiration from [NVIDIA/OpenShell](https://github.com/NVIDIA/OpenShell), which makes agent workflows and contributor guidance highly visible from the repository root. The goal is to bring those ideas into DataDesigner in a way that makes the project more agent-friendly while fitting its role as a public open-source Python library for synthetic data generation. Because DataDesigner serves a different audience, the resulting workflow should remain lighter-weight and more flexible than OpenShell's. ## Current State -| Asset | Status | Agent-First Signal | -|-------|--------|--------------------| -| README.md | Product-focused, usage-first | Zero mention of agent-first development | -| CONTRIBUTING.md | Standard OSS contributing guide | Zero mention of skills or agent workflows | -| AGENTS.md | ~500 lines, mixes code style with architecture | No skills inventory, no workflow chains | -| CLAUDE.md | 1 line: `@AGENTS.md` | Minimal | -| Issue templates | 4 templates (bug, feature, dev task, config) | No agent investigation fields | -| PR template | Doesn't exist | -- | -| CODEOWNERS | Catch-all `* @NVIDIA-NeMo/data_designer_reviewers` | No agent infra ownership | -| `.claude/skills/` | 7 skills (~25KB) | Claude Code-locked, invisible from top-level docs | -| `.agents/skills/` | Doesn't exist (yet) | -- | -| `architecture/` | Doesn't exist | -- | -| STYLEGUIDE.md | Doesn't exist (inlined in AGENTS.md) | -- | + +| Asset | Status | Agent-First Signal | +| ----------------- | 
----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | +| README.md | Product-focused, usage-first | Zero mention of agent-first development | +| CONTRIBUTING.md | Standard OSS contributing guide | Zero mention of skills or agent workflows | +| AGENTS.md | ~500 lines, mixes architecture, code style, and engineering workflow guidance | No skills inventory, no workflow chains | +| CLAUDE.md | 1 line: `@AGENTS.md` | Minimal | +| Issue templates | 4 templates (bug, feature, dev task, config) | No agent investigation fields | +| PR template | Doesn't exist | -- | +| CODEOWNERS | Catch-all `* @NVIDIA-NeMo/data_designer_reviewers` | No agent infra ownership | +| `.claude/skills/` | Current skill location | Invisible from top-level docs | +| `.agents/skills/` | Doesn't exist yet | Planned future location, but not present or documented today | +| `architecture/` | Doesn't exist | -- | +| STYLEGUIDE.md | Doesn't exist (inlined in AGENTS.md) | -- | +| DEVELOPMENT.md | Doesn't exist | Setup, testing, and day-to-day workflow guidance are scattered across `AGENTS.md` and `CONTRIBUTING.md` | + ## Principles -1. **Agent-assisted, not agent-only.** We are a human-built project adopting agent workflows. Agents accelerate planning, development, and review — humans make design decisions and own quality. -2. **Designed, not vibed.** Humans architect systems; agents help implement. The distinction matters and should be visible. This is not vibe coding. -3. **Firm gates, clear paths.** Issue templates encourage agent investigation. But the paths to success are well-lit for both agent-assisted and human-only contributors. -4. **Skills are the API.** The `.agents/skills/` directory is the contract between the project and contributor agents. Treat it as a first-class interface. -5. 
**Front door tells the story.** README, CONTRIBUTING, and AGENTS.md should signal that agent-assisted workflows are available and encouraged. A new contributor's immediate next step is "clone and point your agent at the repo." +1. **Agents accelerate work; humans stay accountable.** Agents can speed up planning, implementation, and review, but people still make design decisions and own quality. +2. **Design intent should remain explicit.** The project should communicate that systems are deliberately engineered, with agents supporting the work rather than replacing architectural judgment. +3. **Encourage agent investigation without blocking real users.** Issue templates should normalize agent-assisted investigation, but contributors who cannot or did not use an agent still need a clear path to report bugs and propose features. +4. **Skills are part of the contributor surface area.** The future `.agents/skills/` directory should be treated as a maintained interface between the project and contributor agents. +5. **Top-level docs should advertise the workflow.** `README.md`, `CONTRIBUTING.md`, and `AGENTS.md` should make agent-assisted paths obvious to new contributors. + +--- + +## Phase 1: Skill & Agent Infrastructure + +This phase lands first so the repository has a stable, tool-agnostic home for shared agent assets before the documentation starts pointing contributors at it. + +### 1a. Consolidate `.agents/` and `.claude/` + +**Goal:** `.agents/` becomes the primary tool-agnostic location for shared agent infrastructure. `.claude/` remains a compatibility layer for Claude Code-specific runtime state. 
+ +**Current layout before consolidation:** + +``` +.claude/ + skills/ # 7 skills — Claude Code-specific location + agents/ # 2 sub-agent definitions (docs-searcher, github-searcher) + settings.json + settings.local.json +``` + +**Target layout:** + +``` +.agents/ + skills/ # 7 skills (moved from .claude/skills/) + agents/ # Sub-agent persona definitions (moved from .claude/agents/) + docs-searcher.md + github-searcher.md + +.claude/ + skills # Symlink → ../.agents/skills + agents # Symlink → ../.agents/agents (or keep Claude-specific if frontmatter differs) + agent-memory/ # Stays here (Claude Code-specific, not portable) + settings.json + settings.local.json + +.codex/ + skills # Symlink → ../.agents/skills +``` + +**Changes:** + +1. Create `.agents/skills/` and move `.claude/skills/` into it +2. Move `.claude/agents/*.md` to `.agents/agents/*.md` +3. Create symlinks from `.claude/` and `.codex/` if both harnesses resolve them correctly; otherwise keep mirrored directories and add a drift-check task +4. Add `.claude/README.md` explaining the structure + +### 1b. Skill Cross-Reference Cleanup + +- Verify all skill files reference `.agents/skills/` not `.claude/skills/` +- Verify sub-agent references point to `.agents/agents/` +- Ensure all cross-skill references use consistent naming --- -## Phase 1: Foundation Documents +## Phase 2: Foundation Documents -Three files define the front door. All three need to tell the same story. +This phase updates the contributor-facing docs after the agent-infrastructure paths and terminology are settled. -### 1a. README.md +### 2a. README.md **Current state:** Product-focused. Zero signal of agent-first development. -**Target state:** Retains the product pitch but frames DataDesigner as an agent-friendly project. A new developer's immediate next step is "clone and point your agent at the repo." +**Target state:** Retains the product pitch while making it obvious that DataDesigner supports agent-assisted development. 
A new developer should quickly understand that the repo contains workflows and guidance their agent can use. + + +| Section | Action | +| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Hero / intro paragraph | Add a line signaling agent-first development alongside the product pitch | +| New: "Get Oriented with an Agent" | After quickstart. Show contributors how to clone the repo, point an agent at it, and use the repo guidance to answer questions quickly | +| New: "How Agent Workflows Fit In" | After the product sections. Explain how agent-assisted workflows support development here, and link to `AGENTS.md` for the authoritative skills inventory and workflow guidance | +| Contributing | Expand from current brief mention to set expectations: agent-first contributions, link to CONTRIBUTING.md | -| Section | Action | -|---------|--------| -| Hero / intro paragraph | Add a line signaling agent-first development alongside the product pitch | -| New: "Explore with Your Agent" | After quickstart. Clone the repo, point your agent at it, let it load the skills and answer your questions | -| New: "Built With Agents" | After the product sections. Surfaces `.agents/skills/` infrastructure, the workflow chains. This is the "how we work" section | -| Contributing | Expand from current brief mention to set expectations: agent-first contributions, link to CONTRIBUTING.md | **Key language to establish:** -- "DataDesigner supports agent-assisted development. We provide skills that help agents plan, build, and review code." -- "Before opening an issue, try pointing your agent at the repo. It has skills to help." -### 1b. CONTRIBUTING.md +- "DataDesigner supports agent-assisted planning, implementation, and review." +- "Before opening an issue, consider asking your coding agent to inspect the repository first." 
+
+### 2b. CONTRIBUTING.md

 **Current state:** Standard OSS contributing guide. Welcoming tone, good fork→develop→PR workflow, but zero agent awareness.

 **Target state:** Agent-assisted contribution workflow is the recommended path. Human-only paths are fully supported but agent workflows are encouraged and well-documented.

-| Section | Action |
-|---------|--------|
-| New: Opening philosophy | 2-3 sentences. This project supports agent-assisted development. Your agent is a powerful collaborator — we provide skills to help it help you. |
-| New: "Before You Open an Issue" | **The gate.** Checklist: (1) Clone the repo, (2) Point your agent at it, (3) Load relevant skills, (4) Have your agent diagnose/investigate. If the agent can't solve it, open an issue with the diagnostics attached. |
-| New: "Agent Skills for Contributors" | Table of all skills grouped by workflow category |
-| New: "Workflow Chains" | Document the natural pipelines: investigation → development, and future spike → build |
-| Getting Started | Keep as-is (fork, clone, install) |
-| Development Guide | Keep as-is |
-| Pull Requests | Update to reference the new PR template and the `create-pr` skill |
-| New: "When to Open an Issue" | Clear guidance: real bugs your agent confirmed, feature proposals with design context, problems the `search-docs`/`search-github` skills couldn't resolve |
-| New: "When NOT to Open an Issue" | Questions about how things work (agent can answer), configuration problems (agent can diagnose), "how do I..." requests (agent has skills for this) |
-| Commit Messages / DCO | Keep as-is |
-
-**Skill groupings for the table:**
-
-| Category | Skills | Purpose |
-|----------|--------|---------|
-| Getting Started | `search-docs`, `search-github` | Find information, check for duplicates |
-| Data Generation | `new-sdg` | Design synthetic data generators interactively |
-| Development | `commit`, `create-pr`, `update-pr` | Standard development cycle |
-| Review | `review-code` | Multi-pass code review |
-
-### 1c. AGENTS.md
-
-**Current state:** ~500 lines. Mixes project overview, architecture, code style, type annotations, linting rules, design principles, and testing patterns into one file. No skills inventory, no workflow chains, no project identity statement.
-
-**Target state:** The comprehensive entrypoint for any agent working on this codebase. Every agent reads this on load. It should give an agent everything it needs to be effective — without the code style reference (which moves to STYLEGUIDE.md).
-
-| Section | Action |
-|---------|--------|
-| Opening | Keep the CONTRIBUTING.md reference. Add: "This file is the primary instruction surface for agents contributing to DataDesigner." |
-| New: "Project Identity" | 3-4 sentences: agent-assisted development, designed not vibed, the product generates synthetic datasets and increasingly uses agents for planning and development |
-| New: "Skills" | Note that skills live in `.agents/skills/`. Agent harnesses can discover and load them natively. |
-| New: "Workflow Chains" | Document the natural skill pipelines |
-| Architecture | Keep existing 3-layer overview, key files, registries. Add brief component map. |
-| New: "Issue and PR Conventions" | Reference the templates. When creating issues, use the template format. When creating PRs, use the PR template. Skills should produce output conforming to these templates. |
-| Development Workflow | Keep as-is (`uv`, `make`, test commands) |
-| Working Guidelines | Keep as-is (license headers, `__future__` imports, comments) |
-| Testing | Keep as-is, add brief guidance on when to run which tests |
-| New: "Security" | Don't commit secrets, don't run destructive operations without confirmation, scope changes to the issue at hand |
-| Pre-commit | Keep as-is |
-| Column/Model Configuration | Keep as-is (brief summaries) |
-| Registry System | Keep as-is |
-| **REMOVE** Code Style sections | Move to STYLEGUIDE.md (see 1d) |
-
-### 1d. STYLEGUIDE.md (new file)
+
+| Section | Action |
+| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| New: Opening philosophy | 2-3 sentences explaining that this project supports agent-assisted development and includes repo guidance and skills that make agents more effective contributors. |
+| New: "Before You Open an Issue" | Recommended path. Checklist: (1) Clone the repo, (2) Point your agent at it, (3) Load relevant skills, (4) Have your agent diagnose/investigate. If the agent can't solve it, include the diagnostics. If you couldn't use an agent, say why and include the troubleshooting you already tried. |
+| New: "Contributor Skill Map" | Short category summary plus a link to `AGENTS.md`, which remains the authoritative skill inventory |
+| New: "Common Agent Workflows" | Document typical paths such as investigation → development, and future spike → build |
+| Getting Started | Keep as-is (fork, clone, install) |
+| Development Guide | Keep the contributor-facing summary, but link to `DEVELOPMENT.md` for detailed setup, testing, and day-to-day engineering workflow |
+| Pull Requests | Update to reference the new PR template and the `create-pr` skill |
+| New: "When to Open an Issue" | Clear guidance: real bugs your agent confirmed or you reproduced yourself with enough detail, feature proposals with design context, problems the `search-docs`/`search-github` skills couldn't resolve |
+| New: "When NOT to Open an Issue" | Questions about how things work (agent can answer), configuration problems (agent can diagnose), "how do I..." requests (agent has skills for this) |
+| Commit Messages / DCO | Keep as-is |
+
+
+**Skill categories to summarize (keep `AGENTS.md` as the authoritative inventory):**
+
+
+| Category | Skills | Purpose |
+| --------------- | ---------------------------------- | ---------------------------------------------- |
+| Getting Started | `search-docs`, `search-github` | Find information, check for duplicates |
+| Data Generation | `new-sdg` | Design synthetic data generators interactively |
+| Development | `commit`, `create-pr`, `update-pr` | Standard development cycle |
+| Review | `review-code` | Multi-pass code review |
+
+
+### 2c. AGENTS.md
+
+**Current state:** ~500 lines. Mixes project overview, architecture, code style, development workflow, and testing guidance into one file. No skills inventory, no workflow chains, no project identity statement.
+
+**Target state:** The main onboarding document for agents working on this codebase. It should provide enough architectural and workflow context to make an agent effective, while moving code-authoring reference material to `STYLEGUIDE.md` and detailed development/testing workflow guidance to `DEVELOPMENT.md`.
+
+
+| Section | Action |
+| ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Opening | Keep the CONTRIBUTING.md reference. Add: "This file is the main onboarding surface for agents contributing to DataDesigner." |
+| New: "Project Identity" | 3-4 sentences: DataDesigner is a synthetic-data project built by humans that now supports agent-assisted planning and development, while keeping human ownership over design decisions |
+| New: "Skills" | Authoritative skill inventory. Note where skills live after consolidation, how harnesses discover them, and keep the full inventory here rather than duplicating it across README and CONTRIBUTING |
+| New: "Suggested Workflows" | Document the common skill sequences and when to use them |
+| Architecture | Keep existing 3-layer overview, key files, registries. Add brief component map. |
+| New: "Issue and PR Conventions" | Reference the templates. When creating issues, use the template format. When creating PRs, use the PR template. Skills should produce output conforming to these templates. |
+| Development Workflow | Move detailed setup and day-to-day engineering commands to `DEVELOPMENT.md`; keep only a short summary and link |
+| Working Guidelines | Split the section: code-authoring guidance (license headers, `__future__` imports, comments) moves to `STYLEGUIDE.md`, while operational safety guidance stays in `AGENTS.md` |
+| Testing | Move detailed test commands and "when to run what" guidance to `DEVELOPMENT.md`; keep only short expectations and link |
+| New: "Security" | Don't commit secrets, don't run destructive operations without confirmation, scope changes to the issue at hand |
+| Pre-commit | Move command details to `DEVELOPMENT.md`; keep only a brief mention in `AGENTS.md` if needed |
+| Column/Model Configuration | Keep as-is (brief summaries) |
+| Registry System | Keep as-is |
+| **REMOVE** Code Style sections | Move to STYLEGUIDE.md (see 2d) |
+| **MOVE** detailed workflow/testing reference sections | Move to `DEVELOPMENT.md` (see 2e) |
+
+
+### 2d. STYLEGUIDE.md (new file)

 Extract from AGENTS.md. Contains all code style reference material:

@@ -119,17 +189,34 @@ Extract from AGENTS.md. Contains all code style reference material:
 - Code organization (public/private ordering, class method order, section comments)
 - Design principles (DRY, KISS, YAGNI, SOLID — with examples)
 - Common pitfalls (all 5 with code examples)
+- Code-authoring guidance from Working Guidelines: license headers, `from __future__ import annotations`, and comment expectations
 - Active linter rules (ruff rule reference)

-**CLAUDE.md updated to:**
+**CLAUDE.md remains:**
+
 ```
 @AGENTS.md
-@STYLEGUIDE.md
 ```

-**Why:** AGENTS.md is loaded into every agent conversation. Code style is reference material — needed when writing code, not when triaging issues or creating spikes. Splitting reduces context cost and makes each file single-purpose.
+`AGENTS.md` should link to `STYLEGUIDE.md` and `DEVELOPMENT.md` when deeper reference material is needed.

-### 1e. Create `architecture/` Directory (Skeleton)
+**Why:** AGENTS.md is loaded into every agent conversation. Code style is reference material — needed when writing code, not when triaging issues or creating spikes. Splitting reduces context cost only if `STYLEGUIDE.md` is not loaded unconditionally.
+
+### 2e. DEVELOPMENT.md (new file)
+
+Collect from `AGENTS.md` and `CONTRIBUTING.md`. Contains development and testing reference material:
+
+- Local setup and install commands (`uv`, `make install-dev`, notebook/dev variants)
+- Day-to-day engineering workflow (branching, syncing with upstream, validation commands)
+- Testing commands and guidance on when to run which test suites
+- Pre-commit usage and expected local checks before opening a PR
+- Practical contributor workflow details that are too operational for `AGENTS.md` and too detailed for `CONTRIBUTING.md`
+
+`AGENTS.md` and `CONTRIBUTING.md` should link to `DEVELOPMENT.md` rather than duplicating these details.
+
+**Why:** Development workflow and testing guidance are operational reference material, not project identity or code style. Moving them out keeps `AGENTS.md` focused on onboarding and architecture, keeps `STYLEGUIDE.md` focused on how code should look, and keeps `CONTRIBUTING.md` concise.
+
+### 2f. Create `architecture/` Directory (Skeleton)

 Create stub files for each major subsystem. Each stub lists section headings but doesn't contain full content yet. Docs are populated incrementally as features are built.

@@ -148,6 +235,7 @@ architecture/
 ```

 Each stub follows this template:
+
 ```markdown
 #

@@ -173,31 +261,40 @@ Each stub follows this template:

 ---

-## Phase 2: GitHub Machinery
+## Phase 3: GitHub Machinery

-### 2a. Issue Templates
+### 3a. Issue Templates

 Update existing `.github/ISSUE_TEMPLATE/` templates:

 **bug-report.yml** updates:
-- Add **Agent Diagnostic** (textarea, required): "Paste the output from your agent's investigation of this bug. What skills did it use? What did it find? If you haven't had your agent investigate, please do that first — see CONTRIBUTING.md."
+
+- Add **Agent Diagnostic / Prior Investigation** (textarea, recommended): "If you used an agent, paste the output from its investigation. If you couldn't or didn't, briefly say why and include the troubleshooting you already tried."
 - Add **Checklist** (required):
-  - [ ] I pointed my agent at the repo and had it investigate this issue
-  - [ ] I loaded relevant skills (e.g., `search-docs`, `search-github`)
-  - [ ] My agent could not resolve this — the diagnostics above explain why
+  - I reproduced this issue or provided a minimal example
+  - I searched the docs/issues myself, or had my agent do so
+  - If I used an agent, I included its diagnostics above
 - Keep existing fields (priority, description, reproduction, expected behavior)

 **feature-request.yml** updates:
+
 - Keep existing fields (problem, solution, alternatives)
 - Add **Agent Investigation** (textarea, optional): "If your agent explored the codebase to assess feasibility (e.g., using the `search-docs` skill), paste its findings here."
 - Add **Checklist**:
-  - [ ] I've reviewed existing issues and the documentation
-  - [ ] This is a design proposal, not a "please build this" request
+  - I've reviewed existing issues and the documentation
+  - This is a design proposal, not a "please build this" request
+
+**development-task.yml** updates:
+
+- Clarify that it is for tracked internal work, refactors, and infra changes rather than end-user support requests
+- Add **Investigation / Context** (textarea, optional): relevant issue links, notes, or architecture context
+- Add **Agent Plan / Findings** (textarea, optional): what the agent found, proposed, or could not resolve

 **config.yml** updates:
-- Update contact link to: "Have a question? Point your agent at the repo. It has skills for searching docs, finding issues, and more. See CONTRIBUTING.md for the full list."

-### 2b. PR Template
+- Keep the Discussions link, but update the copy to: "Have a question? Try pointing your agent at the repo first. It has skills for searching docs, finding issues, and more. See CONTRIBUTING.md for the workflow and AGENTS.md for the full skill inventory."
+
+### 3b. PR Template

 Create `.github/PULL_REQUEST_TEMPLATE.md`:

@@ -225,141 +322,119 @@ Create `.github/PULL_REQUEST_TEMPLATE.md`:

 Intentionally lean. The `create-pr` and `review-code` skills already produce well-structured descriptions; the template provides guardrails without fighting the skills.

-### 2c. CODEOWNERS
+### 3c. CODEOWNERS

-Update `.github/CODEOWNERS` to add agent infrastructure ownership:
+Update `.github/CODEOWNERS` to explicitly call out agent-infrastructure ownership paths:

 ```
 # Broad ownership — core team reviews everything
 * @NVIDIA-NeMo/data_designer_reviewers

-# Agent infrastructure — tighter review
+# Agent infrastructure — explicit path callouts for visibility
 .agents/ @NVIDIA-NeMo/data_designer_reviewers
 AGENTS.md @NVIDIA-NeMo/data_designer_reviewers
 STYLEGUIDE.md @NVIDIA-NeMo/data_designer_reviewers
 ```

-### 2d. Label Taxonomy
+### 3d. Label Taxonomy

 Create labels for workflow state:

-| Label | Purpose | Used by |
-|-------|---------|---------|
-| `agent-ready` | Human-approved, agent can build | `build-from-issue` (future) |
-| `review-ready` | Agent has posted a plan, needs human review | `build-from-issue`, `create-spike` (future) |
-| `in-progress` | Agent is actively building | `build-from-issue` (future) |
-| `pr-opened` | Implementation complete, PR submitted | `build-from-issue` (future) |
-| `spike` | Needs deeper investigation | `create-spike` (future) |
-| `needs-agent-triage` | Opened without agent diagnostics — redirect | Triage automation (future) |
-| `good-first-issue` | Suitable for new contributors (with agents) | Manual |
-
----
-
-## Phase 3: Skill & Agent Infrastructure
-
-### 3a. Consolidate `.agents/` and `.claude/`
-
-**Goal:** `.agents/` is the canonical home for all agent infrastructure. `.claude/` becomes a thin shim for Claude Code-specific runtime state.
-
-**Current layout:**
-```
-.claude/
-  skills/ # 7 skills — Claude Code-specific location
-  agents/ # 2 sub-agent definitions (docs-searcher, github-searcher)
-  settings.json
-  settings.local.json
-```
-
-**Target layout:**
-```
-.agents/
-  skills/ # 7 skills (moved from .claude/skills/)
-  agents/ # Sub-agent persona definitions (moved from .claude/agents/)
-    docs-searcher.md
-    github-searcher.md
-
-.claude/
-  skills # Symlink → ../.agents/skills
-  agents # Symlink → ../.agents/agents (or keep Claude-specific if frontmatter differs)
-  agent-memory/ # Stays here (Claude Code-specific, not portable)
-  settings.json
-  settings.local.json
-.codex/
-  skills # Symlink → ../.agents/skills
-```
+| Label | Purpose | Used by |
+| -------------------- | --------------------------------------------------------------------------------- | ------------------------------------------- |
+| `agent-ready` | Human-approved, agent can build | `build-from-issue` (future) |
+| `review-ready` | Agent has posted a plan, needs human review | `build-from-issue`, `create-spike` (future) |
+| `in-progress` | Agent is actively building | `build-from-issue` (future) |
+| `pr-opened` | Implementation complete, PR submitted | `build-from-issue` (future) |
+| `spike` | Needs deeper investigation | `create-spike` (future) |
+| `needs-more-context` | Issue is missing useful reproduction or investigation context and needs follow-up | Triage automation (future) |
+| `good-first-issue` | Suitable for new contributors (with agents) | Manual |

-**Changes:**
-1. Move `.claude/skills/` to `.agents/skills/` (done in prototype)
-2. Move `.claude/agents/*.md` to `.agents/agents/*.md`
-3. Create symlinks from `.claude/` and `.codex/`
-4. Add `.claude/README.md` explaining the structure

-### 3b. Update Skills to Conform to Templates
+### 3e. Update Skills to Conform to Templates

 Skills that create GitHub artifacts should produce output matching the new templates:

-**`create-pr`:**
+`**create-pr`:**
+
 - Produce PR descriptions matching the PR template structure (Summary / Related Issue / Changes / Testing / Checklist)
 - Include the testing checklist populated based on what was actually run

-**`review-code`:**
-- When reviewing PRs created from templates, check that the template sections are properly filled
-
-### 3c. Skill Cross-Reference Cleanup
+`**review-code`:**

-- Verify all skill files reference `.agents/skills/` not `.claude/skills/`
-- Verify sub-agent references point to `.agents/agents/`
-- Ensure all cross-skill references use consistent naming
+- When reviewing PRs created from templates, check that the template sections are properly filled

 ---

-## Phase 4: Future Work (Separate PRs)
+## Phase 4: Future Work (Requires Separate Planning)

 ### 4a. New Skills

-| Skill | Purpose | Depends on |
-|-------|---------|------------|
-| `build-from-issue` | Stateful plan → review → build → PR pipeline | Labels, `principal-engineer-reviewer` |
-| `create-spike` | Investigate problem, create structured issue | `principal-engineer-reviewer` |
-| `debug-sdg` | Debug failing data generation pipelines | Architecture docs |
-| `generate-column-config` | Generate column configs from natural language | Agent introspection API |
-| `watch-github-actions` | Monitor CI workflow runs | None |
-| `sync-agent-infra` | Detect drift across agent files | Skills inventory |
+
+| Skill | Purpose | Depends on |
+| ------------------------ | --------------------------------------------- | ------------------------------------- |
+| `build-from-issue` | Stateful plan → review → build → PR pipeline | Labels, `principal-engineer-reviewer` |
+| `create-spike` | Investigate problem, create structured issue | `principal-engineer-reviewer` |
+| `debug-sdg` | Debug failing data generation pipelines | Architecture docs |
+| `generate-column-config` | Generate column configs from natural language | Agent introspection API |
+| `watch-github-actions` | Monitor CI workflow runs | None |
+| `sync-agent-infra` | Detect drift across agent files | Skills inventory |
+

 ### 4b. Sub-Agent Personas

-| Agent | Purpose | Scope |
-|-------|---------|-------|
-| `principal-engineer-reviewer` | Analyze code, generate plans, review architecture | Read-only |
-| `arch-doc-writer` | Update `architecture/` docs after features land | Write to `architecture/` only |
+
+| Agent | Purpose | Scope |
+| ----------------------------- | ------------------------------------------------- | ----------------------------- |
+| `principal-engineer-reviewer` | Analyze code, generate plans, review architecture | Read-only |
+| `arch-doc-writer` | Update `architecture/` docs after features land | Write to `architecture/` only |
+

 ### 4c. Issue Triage Workflow

 Create `.github/workflows/issue-triage.yml`:
+
 - **Trigger:** `issues.opened`
-- **Logic:** Check if the issue was created using the bug report template and the "Agent Diagnostic" field is empty or contains only placeholder text
-- **Action:** Add the `needs-agent-triage` label and post a comment redirecting to CONTRIBUTING.md
+- **Logic:** Check if the issue was created using the bug report template and the investigation/context fields are empty or contain only placeholder text
+- **Action:** Add the `needs-more-context` label and post a comment asking for more reproduction or investigation detail; recommend agent output as a fast path, but do not require it
 - This is a simple deterministic check — not an LLM-powered triage bot

 ---

+## Delivery Strategy
+
+Land this work as a sequence of incremental PRs rather than a single large rollout:
+
+1. **Phase 1 PR(s)** — agent infrastructure consolidation and path cleanup.
+2. **Phase 2 PR(s)** — documentation restructuring (`AGENTS.md`, `STYLEGUIDE.md`, `DEVELOPMENT.md`, `CONTRIBUTING.md`, `README.md`, and optional `architecture/` skeleton).
+3. **Phase 3 PR(s)** — GitHub machinery such as templates, labels, and skill output conformance.
+4. **Phase 4** — do not start implementation directly from this plan. Treat it as follow-on work that requires another planning pass, design review, and then its own incremental PRs.
+
+The default PR boundary should be the phase boundary. If a phase is still too large, split it into a small sequence of focused PRs, but keep the phases ordered and avoid mixing deliverables from different phases in one PR.
+
+---
+
 ## Execution Order

-| Step | Deliverable | Dependencies | Parallelizable |
-|------|-------------|--------------|----------------|
-| 1 | AGENTS.md restructure + STYLEGUIDE.md split | None | -- |
-| 2 | CONTRIBUTING.md overhaul | AGENTS.md (references it) | -- |
-| 3 | README.md updates | CONTRIBUTING.md (references it) | -- |
-| 4 | Issue templates | CONTRIBUTING.md (templates link to it) | Yes (with 5-8) |
-| 5 | PR template | None | Yes (with 4, 6-8) |
-| 6 | CODEOWNERS update | None | Yes (with 4-5, 7-8) |
-| 7 | Label creation (via `gh label create`) | None | Yes (with 4-6, 8) |
-| 8 | Skill consolidation (`.agents/`, `.claude/` cleanup) | None | Yes (with 4-7) |
-| 9 | `architecture/` skeleton | None | Yes (with 4-8) |
-| 10 | Skill template conformance updates | Issue/PR templates (steps 4-5) | -- |
-
-Steps 1-3 are sequential. Steps 4-9 are independent and can be parallelized. Step 10 depends on earlier steps.
+
+| Step | Deliverable | Dependencies | Parallelizable |
+| ---- | ---------------------------------------------------------- | -------------------------------------- | ------------------- |
+| 1 | Skill consolidation (`.agents/`, `.claude/` cleanup) | None | -- |
+| 2 | AGENTS.md restructure + STYLEGUIDE.md/DEVELOPMENT.md split | Step 1 (canonical paths settled) | -- |
+| 3 | CONTRIBUTING.md overhaul | AGENTS.md (references it) | -- |
+| 4 | README.md updates | CONTRIBUTING.md (references it) | -- |
+| 5 | Issue templates | CONTRIBUTING.md (templates link to it) | Yes (with 6-9) |
+| 6 | PR template | None | Yes (with 5, 7-9) |
+| 7 | CODEOWNERS update | None | Yes (with 5-6, 8-9) |
+| 8 | Label creation (via `gh label create`) | None | Yes (with 5-7, 9) |
+| 9 | `architecture/` skeleton | None | Yes (with 5-8) |
+| 10 | Skill template conformance updates | Issue/PR templates (steps 5-6) | -- |
+
+
+Step 1 lands first so the docs can reference canonical paths truthfully. Steps 2-4 are sequential. Steps 5-9 are independent and can be parallelized. Step 10 depends on earlier steps.
+
+In practice, Steps 1-4 map cleanly to Phase 1 and Phase 2 PRs, while Steps 5-10 map to one or more Phase 3 PRs. Phase 4 should be planned separately before any implementation PRs are opened.

 ---

@@ -374,8 +449,8 @@ Steps 1-3 are sequential. Steps 4-9 are independent and can be parallelized. Ste

 ## Open Questions

-1. **STYLEGUIDE.md naming** — `STYLEGUIDE.md` vs `CODE_STYLE.md` vs `STYLE.md`?
-2. **Issue template strictness** — Should agent diagnostic be required (gate) or optional (encouraged) on bug reports? OpenShell requires it. DataDesigner could start with required and relax if it creates too much friction.
-3. **README tone** — How prominent should the agent-first messaging be? A line in the hero paragraph, or a dedicated section?
-4. **CODEOWNERS granularity** — Keep catch-all or add file-specific ownership for docs/CI?
-5. **Phase 2 timing** — Land foundation docs (Phase 1) first and iterate, or ship Phases 1-2 together?
+1. **Reference doc naming** — `STYLEGUIDE.md` vs `CODE_STYLE.md` vs `STYLE.md`, and `DEVELOPMENT.md` vs `DEVELOPMENT_GUIDE.md`?
+2. **README tone** — How prominent should the agent-first messaging be? A line in the hero paragraph, or a dedicated section?
+3. **CODEOWNERS granularity** — Keep the explicit path callouts only, or introduce a distinct owner group for agent infra later?
+4. `**development-task.yml` audience** — Keep it public for all contributors, or narrow it to maintainers/internal work?
+5. **Symlink compatibility** — Do Claude/Codex harnesses handle symlinked skill directories reliably enough, or should we prefer mirrored directories plus drift checks?

From f30f9b006de2ea5d39944c5dfa9bb3daa6ce3706 Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Tue, 17 Mar 2026 15:18:26 -0600
Subject: [PATCH 3/6] Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---
 plans/427/agent-first-development-plan.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md
index d0d0eaa25..8c791a447 100644
--- a/plans/427/agent-first-development-plan.md
+++ b/plans/427/agent-first-development-plan.md
@@ -355,9 +355,7 @@ Create labels for workflow state:
 ### 3e. Update Skills to Conform to Templates

 Skills that create GitHub artifacts should produce output matching the new templates:
-
-`**create-pr`:**
-
+**`create-pr`:**
 - Produce PR descriptions matching the PR template structure (Summary / Related Issue / Changes / Testing / Checklist)
 - Include the testing checklist populated based on what was actually run

From 878331a041ee9428fba40c118a88f4602b8e359b Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Tue, 17 Mar 2026 15:18:33 -0600
Subject: [PATCH 4/6] Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---
 plans/427/agent-first-development-plan.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md
index 8c791a447..a412c5816 100644
--- a/plans/427/agent-first-development-plan.md
+++ b/plans/427/agent-first-development-plan.md
@@ -1,8 +1,10 @@
 ---
-
-## date: 2026-03-17
+date: 2026-03-17
 authors:
   - nmulepati
+---
+
+# Plan: Agent-Assisted Development Principles

 # Plan: Agent-Assisted Development Principles

From 68b2b245147f1c01351ac532dfad3b5fe3d05036 Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Tue, 17 Mar 2026 15:21:05 -0600
Subject: [PATCH 5/6] fix double heading

---
 plans/427/agent-first-development-plan.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md
index a412c5816..5b3bccf09 100644
--- a/plans/427/agent-first-development-plan.md
+++ b/plans/427/agent-first-development-plan.md
@@ -6,8 +6,6 @@ authors:

 # Plan: Agent-Assisted Development Principles

-# Plan: Agent-Assisted Development Principles
-
 ## Problem

 DataDesigner was built entirely by humans, and the codebase reflects that with strong architecture, comprehensive tests, and thoughtful design. We are now increasingly moving toward an agent-assisted planning and development workflow. The project already has meaningful agent-oriented infrastructure: seven skills, an agent introspection CLI, and supporting tooling. But a new contributor reading `README.md`, `CONTRIBUTING.md`, or `AGENTS.md` would not immediately discover that these workflows exist. The repository supports agent-assisted work, yet the top-level documentation still presents the project mostly as a conventional human-only codebase.

From b8cf7dce06860c8a4a182ead2fd73e7f01ce72e2 Mon Sep 17 00:00:00 2001
From: Nabin Mulepati
Date: Wed, 18 Mar 2026 15:55:41 -0600
Subject: [PATCH 6/6] docs: address review feedback on agent-first development plan

Incorporate johnnygreco's review comments from PR #428:
- Distinguish development tooling vs usage tooling throughout
- Promote AGENTS.md restructure to Phase 0 (~50 lines target)
- Remove skills inventory, workflows, and conventions from AGENTS.md scope
- Remove new-sdg from skill categories (repo skills = development only)
- Overhaul CONTRIBUTING.md toward plan-submission-via-issues workflow
- Tone down README agent-first messaging to 1-2 sentences
- Simplify CODEOWNERS to single maintainer group
- Resolve 4 of 5 open questions per reviewer answers
- Fix malformed markdown and Out of Scope contradiction
- Add AGENTS.md redirect for dataset-building agents
- Tag skills as development-scoped in metadata

Made-with: Cursor
---
 plans/427/agent-first-development-plan.md | 216 +++++++++++-----------
 1 file changed, 111 insertions(+), 105 deletions(-)

diff --git a/plans/427/agent-first-development-plan.md b/plans/427/agent-first-development-plan.md
index 5b3bccf09..92c7e3a9d 100644
--- a/plans/427/agent-first-development-plan.md
+++ b/plans/427/agent-first-development-plan.md
@@ -10,6 +10,13 @@ authors:

 DataDesigner was built entirely by humans, and the codebase reflects that with strong architecture, comprehensive tests, and thoughtful design. We are now increasingly moving toward an agent-assisted planning and development workflow. The project already has meaningful agent-oriented infrastructure: seven skills, an agent introspection CLI, and supporting tooling. But a new contributor reading `README.md`, `CONTRIBUTING.md`, or `AGENTS.md` would not immediately discover that these workflows exist. The repository supports agent-assisted work, yet the top-level documentation still presents the project mostly as a conventional human-only codebase.

+This plan distinguishes two surfaces for agent tooling:
+
+- **Usage tooling** — skills and workflows that help end users build synthetic datasets with DataDesigner (e.g., the forthcoming official "build a dataset" skill, which runs outside the repo).
+- **Development tooling** — skills and workflows that help contributors plan, implement, test, and review changes to DataDesigner itself (e.g., `search-docs`, `review-code`, `create-pr`).
+
+The implementation work in this plan focuses on the **development** surface, but the plan must also ensure agents can clearly distinguish between the two. An agent working inside the repo should understand it is contributing to DataDesigner, not using it to build datasets. An agent helping a user build a dataset should not be confused by development-oriented skills and guidance.
+
 ## Inspiration

 This proposal draws strong inspiration from [NVIDIA/OpenShell](https://github.com/NVIDIA/OpenShell), which makes agent workflows and contributor guidance highly visible from the repository root. The goal is to bring those ideas into DataDesigner in a way that makes the project more agent-friendly while fitting its role as a public open-source Python library for synthetic data generation. Because DataDesigner serves a different audience, the resulting workflow should remain lighter-weight and more flexible than OpenShell's.
@@ -21,7 +28,7 @@ This proposal draws strong inspiration from [NVIDIA/OpenShell](https://github.co
 | ----------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
 | README.md | Product-focused, usage-first | Zero mention of agent-first development |
 | CONTRIBUTING.md | Standard OSS contributing guide | Zero mention of skills or agent workflows |
-| AGENTS.md | ~500 lines, mixes architecture, code style, and engineering workflow guidance | No skills inventory, no workflow chains |
+| AGENTS.md | ~500 lines, mixes architecture, code style, and engineering workflow guidance | Bloated; mixes reference material with architectural invariants |
 | CLAUDE.md | 1 line: `@AGENTS.md` | Minimal |
 | Issue templates | 4 templates (bug, feature, dev task, config) | No agent investigation fields |
 | PR template | Doesn't exist | -- |
@@ -38,14 +45,55 @@ This proposal draws strong inspiration from [NVIDIA/OpenShell](https://github.co
 1. **Agents accelerate work; humans stay accountable.** Agents can speed up planning, implementation, and review, but people still make design decisions and own quality.
 2. **Design intent should remain explicit.** The project should communicate that systems are deliberately engineered, with agents supporting the work rather than replacing architectural judgment.
 3. **Encourage agent investigation without blocking real users.** Issue templates should normalize agent-assisted investigation, but contributors who cannot or did not use an agent still need a clear path to report bugs and propose features.
-4. **Skills are part of the contributor surface area.** The future `.agents/skills/` directory should be treated as a maintained interface between the project and contributor agents.
-5. **Top-level docs should advertise the workflow.** `README.md`, `CONTRIBUTING.md`, and `AGENTS.md` should make agent-assisted paths obvious to new contributors.
+4. **Development and usage are separate surfaces.** An agent working inside the repo is a contributor; an agent helping a user build a dataset is a consumer. The repo's agent infrastructure (skills, `AGENTS.md`, `CONTRIBUTING.md`) serves the development surface. Usage tooling lives outside the repo. Both docs and skill metadata should make the boundary unambiguous so agents don't confuse the two contexts.
+5. **README is for users; CONTRIBUTING.md is for contributors.** Agent-assisted development messaging belongs in `CONTRIBUTING.md` and `AGENTS.md`, not prominently in `README.md`.
+
+---
+
+## Phase 0: AGENTS.md
+
+AGENTS.md is injected into every agent prompt. It must land first because every subsequent phase references the architectural invariants it establishes. It should also be the most stable file in the repo — if it changes often, something is wrong.
+
+**Current state:** ~500 lines. Mixes project overview, architecture, code style, development workflow, and testing guidance into one file.
+
+**Target state:** ~50 lines. Only high-level design decisions that are consequential for development. Content that changes when features ship (key files, registries, column types) or that duplicates tooling enforcement (code style, linter rules) does not belong here.
+
+**Target sections:**
+
+| Section | Content |
+| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Identity | 3-4 sentences: what DataDesigner is, the "declare, don't orchestrate" contract, and the implication for every change. Must state that this file is for agents *developing* DataDesigner. If you are an agent helping a user *build a dataset*, see the product documentation and tutorials instead — not this file. |
+| The Layering Is Structural | Three packages (config → engine → interface), what each owns, and the PEP 420 namespace package detail |
+| Core Concepts | One-liner definitions: columns, samplers, seed datasets, processors, models, plugins |
+| Core Design Principles | Declarative config vs. imperative engine, registries connect types to behavior, errors normalize at boundaries |
+| Structural Invariants | Import direction, fast imports, no relative imports, typed code, follow established patterns, no untested code paths |
+| Development | Four `make` targets only: `check-all-fix`, `test`, `update-license-headers`, `perf-import` |
+
+**What moves out:**
+
+- Code style, type annotations, import patterns, lazy loading, `TYPE_CHECKING`, naming conventions, common pitfalls → `STYLEGUIDE.md`
+- Development workflow, testing commands, pre-commit, setup → `DEVELOPMENT.md`
+- Key files list, column/model configuration details, registry system → removed (agents discover these via code search; they change too often to maintain in a static doc)
+
+**What is NOT added:**
+
+- Skills inventory — agent harnesses have built-in skill discovery and loading; duplicating the inventory in AGENTS.md creates a maintenance burden with no benefit
+- Suggested workflows / skill sequences — these belong in skill files themselves or in `CONTRIBUTING.md`
+- Issue and PR conventions — these belong in the templates and in `CONTRIBUTING.md`
+
+**CLAUDE.md remains:**
+
+```
+@AGENTS.md
+```
+
+This preserves Claude Code's native include mechanism and the ability to compose multiple files in the future (e.g., `@AGENTS.md` + `@STYLEGUIDE.md`).

 ---

 ## Phase 1: Skill & Agent Infrastructure

-This phase lands first so the repository has a stable, tool-agnostic home for shared agent assets before the documentation starts pointing contributors at it.
+This phase lands after AGENTS.md so the repository has a stable, tool-agnostic home for shared agent assets before the remaining documentation starts pointing contributors at it.
 
 ### 1a. Consolidate `.agents/` and `.claude/`
 
@@ -93,6 +141,7 @@ This phase lands first so the repository has a stable, tool-agnostic home for sh
 - Verify all skill files reference `.agents/skills/` not `.claude/skills/`
 - Verify sub-agent references point to `.agents/agents/`
 - Ensure all cross-skill references use consistent naming
+- Each skill's description or frontmatter should identify it as a **development** skill (e.g., "for contributors developing DataDesigner") so that agent harnesses with skill discovery can distinguish repo skills from usage skills
 
 ---
 
@@ -102,83 +151,48 @@ This phase updates the contributor-facing docs after the agent-infrastructure pa
 
 ### 2a. README.md
 
-**Current state:** Product-focused. Zero signal of agent-first development.
-
-**Target state:** Retains the product pitch while making it obvious that DataDesigner supports agent-assisted development. A new developer should quickly understand that the repo contains workflows and guidance their agent can use.
-
+**Current state:** Product-focused, usage-first. Designed for humans. Zero signal of agent-assisted development.
 
-| Section | Action |
-| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Hero / intro paragraph | Add a line signaling agent-first development alongside the product pitch |
-| New: "Get Oriented with an Agent" | After quickstart. Show contributors how to clone the repo, point an agent at it, and use the repo guidance to answer questions quickly |
-| New: "How Agent Workflows Fit In" | After the product sections. Explain how agent-assisted workflows support development here, and link to `AGENTS.md` for the authoritative skills inventory and workflow guidance |
-| Contributing | Expand from current brief mention to set expectations: agent-first contributions, link to CONTRIBUTING.md |
+**Target state:** Remains a product-focused document for users. Agent-assisted development gets a brief mention — a sentence or two near the development installation section — not a prominent hero section. The README is often the only documentation a DataDesigner *user* reads; it should not be dominated by contributor workflow details. The README should also help agents distinguish the two surfaces: users who want to *build datasets* should be directed toward usage documentation and the official usage skill (once available), not toward the in-repo development skills.
 
+| Section | Action |
+| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Quickstart / usage | Keep product-focused. When usage tooling (e.g., the official "build a dataset" skill) ships, link to it here so user-facing agents find it naturally. |
+| Development install | Add 1-2 sentences noting that the repo supports agent-assisted development and linking to `CONTRIBUTING.md` for the contributor workflow. |
+| Contributing | Brief mention linking to `CONTRIBUTING.md`. No dedicated agent workflow sections in README itself. |
 
-**Key language to establish:**
-
-- "DataDesigner supports agent-assisted planning, implementation, and review."
-- "Before opening an issue, consider asking your coding agent to inspect the repository first."
+**Key language:** Keep it minimal. Something like: "This repository supports agent-assisted development — see [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended workflow."
 
 ### 2b. CONTRIBUTING.md
 
 **Current state:** Standard OSS contributing guide. Welcoming tone, good fork→develop→PR workflow, but zero agent awareness.
 
-**Target state:** Agent-assisted contribution workflow is the recommended path. Human-only paths are fully supported but agent workflows are encouraged and well-documented.
-
-
-| Section | Action |
-| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| New: Opening philosophy | 2-3 sentences explaining that this project supports agent-assisted development and includes repo guidance and skills that make agents more effective contributors. |
-| New: "Before You Open an Issue" | Recommended path. Checklist: (1) Clone the repo, (2) Point your agent at it, (3) Load relevant skills, (4) Have your agent diagnose/investigate. If the agent can't solve it, include the diagnostics. If you couldn't use an agent, say why and include the troubleshooting you already tried. |
-| New: "Contributor Skill Map" | Short category summary plus a link to `AGENTS.md`, which remains the authoritative skill inventory |
-| New: "Common Agent Workflows" | Document typical paths such as investigation → development, and future spike → build |
-| Getting Started | Keep as-is (fork, clone, install) |
-| Development Guide | Keep the contributor-facing summary, but link to `DEVELOPMENT.md` for detailed setup, testing, and day-to-day engineering workflow |
-| Pull Requests | Update to reference the new PR template and the `create-pr` skill |
-| New: "When to Open an Issue" | Clear guidance: real bugs your agent confirmed or you reproduced yourself with enough detail, feature proposals with design context, problems the `search-docs`/`search-github` skills couldn't resolve |
-| New: "When NOT to Open an Issue" | Questions about how things work (agent can answer), configuration problems (agent can diagnose), "how do I..." requests (agent has skills for this) |
-| Commit Messages / DCO | Keep as-is |
-
-
-**Skill categories to summarize (keep `AGENTS.md` as the authoritative inventory):**
-
-
-| Category | Skills | Purpose |
-| --------------- | ---------------------------------- | ---------------------------------------------- |
-| Getting Started | `search-docs`, `search-github` | Find information, check for duplicates |
-| Data Generation | `new-sdg` | Design synthetic data generators interactively |
-| Development | `commit`, `create-pr`, `update-pr` | Standard development cycle |
-| Review | `review-code` | Multi-pass code review |
+**Target state:** A complete overhaul that reflects how contributions actually happen now. The traditional fork→develop→PR workflow is being replaced by agent-assisted planning and development. CONTRIBUTING.md should guide contributors toward submitting plans via issues, using agents for investigation and implementation, and treating PRs as the output of an agent-assisted workflow rather than a purely manual one.
 
+| Section | Action |
+| -------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Opening philosophy | 2-3 sentences: this project uses agent-assisted development. Contributors are expected to use agents for investigation, planning, and implementation. The repo includes skills and guidance that make agents effective. |
+| "How to Contribute" | Primary path: (1) Open an issue using the appropriate template, (2) Include agent investigation output, (3) For non-trivial changes, submit a plan in the issue for review before building, (4) Once approved, use agent-assisted development to implement. Link to `DEVELOPMENT.md` for setup. |
+| "Before You Open an Issue" | Checklist: clone the repo, point your agent at it, have it search docs/issues. If the agent can't resolve it, include the diagnostics. If you didn't use an agent, include the troubleshooting you tried. |
+| "When to Open an Issue" | Real bugs (reproduced or agent-confirmed), feature proposals with design context, problems that `search-docs`/`search-github` couldn't resolve. |
+| "When NOT to Open an Issue" | Questions about how things work (agent can answer), configuration problems (agent can diagnose), "how do I..." requests. |
+| Pull Requests | Reference the PR template and the `create-pr` skill. PRs should link to the issue they address. |
+| Commit Messages / DCO | Keep as-is. |
 
-### 2c. AGENTS.md
 
-**Current state:** ~500 lines. Mixes project overview, architecture, code style, development workflow, and testing guidance into one file. No skills inventory, no workflow chains, no project identity statement.
+CONTRIBUTING.md should open by clarifying the boundary: "The skills and workflows in this repository are for *developing* DataDesigner. If you're looking to *use* DataDesigner to build datasets, see the product documentation and the official usage skill (once available)."
 
-**Target state:** The main onboarding document for agents working on this codebase. It should provide enough architectural and workflow context to make an agent effective, while moving code-authoring reference material to `STYLEGUIDE.md` and detailed development/testing workflow guidance to `DEVELOPMENT.md`.
+**Repo skill categories (development only):**
 
 
-| Section | Action |
-| ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| Opening | Keep the CONTRIBUTING.md reference. Add: "This file is the main onboarding surface for agents contributing to DataDesigner." |
-| New: "Project Identity" | 3-4 sentences: DataDesigner is a synthetic-data project built by humans that now supports agent-assisted planning and development, while keeping human ownership over design decisions |
-| New: "Skills" | Authoritative skill inventory. Note where skills live after consolidation, how harnesses discover them, and keep the full inventory here rather than duplicating it across README and CONTRIBUTING |
-| New: "Suggested Workflows" | Document the common skill sequences and when to use them |
-| Architecture | Keep existing 3-layer overview, key files, registries. Add brief component map. |
-| New: "Issue and PR Conventions" | Reference the templates. When creating issues, use the template format. When creating PRs, use the PR template. Skills should produce output conforming to these templates. |
-| Development Workflow | Move detailed setup and day-to-day engineering commands to `DEVELOPMENT.md`; keep only a short summary and link |
-| Working Guidelines | Split the section: code-authoring guidance (license headers, `__future__` imports, comments) moves to `STYLEGUIDE.md`, while operational safety guidance stays in `AGENTS.md` |
-| Testing | Move detailed test commands and "when to run what" guidance to `DEVELOPMENT.md`; keep only short expectations and link |
-| New: "Security" | Don't commit secrets, don't run destructive operations without confirmation, scope changes to the issue at hand |
-| Pre-commit | Move command details to `DEVELOPMENT.md`; keep only a brief mention in `AGENTS.md` if needed |
-| Column/Model Configuration | Keep as-is (brief summaries) |
-| Registry System | Keep as-is |
-| **REMOVE** Code Style sections | Move to STYLEGUIDE.md (see 2d) |
-| **MOVE** detailed workflow/testing reference sections | Move to `DEVELOPMENT.md` (see 2e) |
+| Category | Skills | Purpose |
+| ------------- | ---------------------------------- | -------------------------------------- |
+| Investigation | `search-docs`, `search-github` | Find information, check for duplicates |
+| Development | `commit`, `create-pr`, `update-pr` | Standard development cycle |
+| Review | `review-code` | Multi-pass code review |
 
 
-### 2d. STYLEGUIDE.md (new file)
+### 2c. STYLEGUIDE.md (new file)
 
 Extract from AGENTS.md. Contains all code style reference material:
 
@@ -192,17 +206,9 @@ Extract from AGENTS.md. Contains all code style reference material:
 - Code-authoring guidance from Working Guidelines: license headers, `from __future__ import annotations`, and comment expectations
 - Active linter rules (ruff rule reference)
 
-**CLAUDE.md remains:**
-
-```
-@AGENTS.md
-```
-
-`AGENTS.md` should link to `STYLEGUIDE.md` and `DEVELOPMENT.md` when deeper reference material is needed.
-
 **Why:** AGENTS.md is loaded into every agent conversation. Code style is reference material — needed when writing code, not when triaging issues or creating spikes. Splitting reduces context cost only if `STYLEGUIDE.md` is not loaded unconditionally.
 
-### 2e. DEVELOPMENT.md (new file)
+### 2d. DEVELOPMENT.md (new file)
 
 Collect from `AGENTS.md` and `CONTRIBUTING.md`. Contains development and testing reference material:
 
@@ -216,7 +222,7 @@ Collect from `AGENTS.md` and `CONTRIBUTING.md`. Contains development and testing
 
 **Why:** Development workflow and testing guidance are operational reference material, not project identity or code style. Moving them out keeps `AGENTS.md` focused on onboarding and architecture, keeps `STYLEGUIDE.md` focused on how code should look, and keeps `CONTRIBUTING.md` concise.
 
-### 2f. Create `architecture/` Directory (Skeleton)
+### 2e. Create `architecture/` Directory (Skeleton)
 
 Create stub files for each major subsystem. Each stub lists section headings but doesn't contain full content yet. Docs are populated incrementally as features are built.
 
@@ -292,7 +298,7 @@ Update existing `.github/ISSUE_TEMPLATE/` templates:
 
 **config.yml** updates:
 
-- Keep the Discussions link, but update the copy to: "Have a question? Try pointing your agent at the repo first. It has skills for searching docs, finding issues, and more. See CONTRIBUTING.md for the workflow and AGENTS.md for the full skill inventory."
+- Keep the Discussions link, but update the copy to: "Have a question? Try pointing your agent at the repo first — it can search docs, find issues, and more. See CONTRIBUTING.md for the recommended workflow."
 
 ### 3b. PR Template
 
@@ -324,16 +330,10 @@ Intentionally lean. The `create-pr` and `review-code` skills already produce wel
 
 ### 3c. CODEOWNERS
 
-Update `.github/CODEOWNERS` to explicitly call out agent-infrastructure ownership paths:
+Keep the existing single-group ownership for now. All paths — including agent infrastructure — are owned by `@NVIDIA-NeMo/data_designer_reviewers`. Introduce a distinct agent infra owner group only if the need becomes clear later.
 
 ```
-# Broad ownership — core team reviews everything
 * @NVIDIA-NeMo/data_designer_reviewers
-
-# Agent infrastructure — explicit path callouts for visibility
-.agents/ @NVIDIA-NeMo/data_designer_reviewers
-AGENTS.md @NVIDIA-NeMo/data_designer_reviewers
-STYLEGUIDE.md @NVIDIA-NeMo/data_designer_reviewers
 ```
 
 ### 3d. Label Taxonomy
@@ -359,7 +359,7 @@ Skills that create GitHub artifacts should produce output matching the new templ
 - Produce PR descriptions matching the PR template structure (Summary / Related Issue / Changes / Testing / Checklist)
 - Include the testing checklist populated based on what was actually run
 
-`**review-code`:**
+**`review-code`:**
 
 - When reviewing PRs created from templates, check that the template sections are properly filled
 
@@ -404,8 +404,9 @@ Create `.github/workflows/issue-triage.yml`:
 
 Land this work as a sequence of incremental PRs rather than a single large rollout:
 
+0. **Phase 0 PR** — `AGENTS.md` restructure (~50 lines). Lands first because it is injected into every agent prompt and every subsequent phase references it.
 1. **Phase 1 PR(s)** — agent infrastructure consolidation and path cleanup.
-2. **Phase 2 PR(s)** — documentation restructuring (`AGENTS.md`, `STYLEGUIDE.md`, `DEVELOPMENT.md`, `CONTRIBUTING.md`, `README.md`, and optional `architecture/` skeleton).
+2. **Phase 2 PR(s)** — remaining documentation (`STYLEGUIDE.md`, `DEVELOPMENT.md`, `CONTRIBUTING.md`, `README.md`, and optional `architecture/` skeleton).
 3. **Phase 3 PR(s)** — GitHub machinery such as templates, labels, and skill output conformance.
 4. **Phase 4** — do not start implementation directly from this plan. Treat it as follow-on work that requires another planning pass, design review, and then its own incremental PRs.
 
@@ -416,39 +417,44 @@ The default PR boundary should be the phase boundary. If a phase is still too la
 ## Execution Order
 
 
-| Step | Deliverable | Dependencies | Parallelizable |
-| ---- | ---------------------------------------------------------- | -------------------------------------- | ------------------- |
-| 1 | Skill consolidation (`.agents/`, `.claude/` cleanup) | None | -- |
-| 2 | AGENTS.md restructure + STYLEGUIDE.md/DEVELOPMENT.md split | Step 1 (canonical paths settled) | -- |
-| 3 | CONTRIBUTING.md overhaul | AGENTS.md (references it) | -- |
-| 4 | README.md updates | CONTRIBUTING.md (references it) | -- |
-| 5 | Issue templates | CONTRIBUTING.md (templates link to it) | Yes (with 6-9) |
-| 6 | PR template | None | Yes (with 5, 7-9) |
-| 7 | CODEOWNERS update | None | Yes (with 5-6, 8-9) |
-| 8 | Label creation (via `gh label create`) | None | Yes (with 5-7, 9) |
-| 9 | `architecture/` skeleton | None | Yes (with 5-8) |
-| 10 | Skill template conformance updates | Issue/PR templates (steps 5-6) | -- |
+| Step | Deliverable | Dependencies | Parallelizable |
+| ---- | ---------------------------------------------------------- | -------------------------------------- | -------------------- |
+| 0 | AGENTS.md restructure (~50 lines) | None | -- |
+| 1 | Skill consolidation (`.agents/`, `.claude/` cleanup) | Step 0 (AGENTS.md settled) | -- |
+| 2 | STYLEGUIDE.md + DEVELOPMENT.md (extracted from old AGENTS.md) | Step 0 | -- |
+| 3 | CONTRIBUTING.md overhaul | Step 0 (references it) | -- |
+| 4 | README.md updates | CONTRIBUTING.md (references it) | -- |
+| 5 | Issue templates | CONTRIBUTING.md (templates link to it) | Yes (with 6-9) |
+| 6 | PR template | None | Yes (with 5, 7-9) |
+| 7 | CODEOWNERS update | None | Yes (with 5-6, 8-9) |
+| 8 | Label creation (via `gh label create`) | None | Yes (with 5-7, 9) |
+| 9 | `architecture/` skeleton | None | Yes (with 5-8) |
+| 10 | Skill template conformance updates | Issue/PR templates (steps 5-6) | -- |
 
 
-Step 1 lands first so the docs can reference canonical paths truthfully. Steps 2-4 are sequential. Steps 5-9 are independent and can be parallelized. Step 10 depends on earlier steps.
+Step 0 lands first because AGENTS.md is injected into every prompt and establishes the architectural vocabulary. Steps 1-4 are sequential. Steps 5-9 are independent and can be parallelized. Step 10 depends on earlier steps.
 
-In practice, Steps 1-4 map cleanly to Phase 1 and Phase 2 PRs, while Steps 5-10 map to one or more Phase 3 PRs. Phase 4 should be planned separately before any implementation PRs are opened.
+In practice, Step 0 is Phase 0, Steps 1-4 map to Phase 1 and Phase 2 PRs, and Steps 5-10 map to one or more Phase 3 PRs. Phase 4 should be planned separately before any implementation PRs are opened.
 
 ---
 
 ## Out of Scope
 
-- **New skills** — the 7 existing skills are sufficient; this plan surfaces them
+- **New skills in Phases 0–3** — the 7 existing development skills are sufficient for the work in this plan; Phase 4 captures future skill additions that require a separate planning pass
 - **LLM-powered issue triage** — deliberate choice to keep triage deterministic
 - **Vouch system** — defer until external contributor volume warrants it
 - **CI/CD changes** — existing workflows are solid
 - **Full architecture docs** — create skeleton only, populate incrementally
 - **Dependabot / Renovate** — dependency management automation (separate concern)
 
+## Resolved Questions
+
+1. **Reference doc naming** — `STYLEGUIDE.md` and `DEVELOPMENT.md` (no suffix).
+2. **README tone** — Minimal. A sentence or two near the development installation section. The README is for users, not contributors.
+3. **CODEOWNERS granularity** — Keep it simple: all data designer maintainers for now. No separate agent infra group until the need is clear.
+4. **Symlink compatibility** — Prefer symlinks. They've been used in popular repos and are cleaner than mirrored directories. If a harness doesn't resolve them, fall back to mirrored directories with a drift-check task.
+
 ## Open Questions
 
-1. **Reference doc naming** — `STYLEGUIDE.md` vs `CODE_STYLE.md` vs `STYLE.md`, and `DEVELOPMENT.md` vs `DEVELOPMENT_GUIDE.md`?
-2. **README tone** — How prominent should the agent-first messaging be? A line in the hero paragraph, or a dedicated section?
-3. **CODEOWNERS granularity** — Keep the explicit path callouts only, or introduce a distinct owner group for agent infra later?
-4. `**development-task.yml` audience** — Keep it public for all contributors, or narrow it to maintainers/internal work?
-5. **Symlink compatibility** — Do Claude/Codex harnesses handle symlinked skill directories reliably enough, or should we prefer mirrored directories plus drift checks?
+1. **`development-task.yml` audience** — Keep it public for all contributors, or narrow it to maintainers/internal work?
+2. **Development vs. dataset-building agents** — How should the repo handle an agent that is both developing DataDesigner and using it to build datasets in the same session (e.g., running a tutorial notebook to verify a change)? Should there be an explicit context-switching mechanism, or is the AGENTS.md redirect plus skill metadata sufficient?
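
Reviewer note on Resolved Question 4: the plan mentions a drift-check task as the fallback if a harness does not resolve symlinks, but does not describe it. The following is only a minimal sketch of what such a task might look like. It assumes `.agents/skills/` is the canonical tree and that the mirror would live at `.claude/skills/`; the function name `find_drift` and the message strings are hypothetical, not part of the plan.

```python
from __future__ import annotations

import sys
from pathlib import Path


def find_drift(canonical: Path, mirror: Path) -> list[str]:
    """Return sorted descriptions of every file that differs between two trees."""

    def files(root: Path) -> set[str]:
        # Relative POSIX paths keep the comparison independent of the root location.
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    left, right = files(canonical), files(mirror)
    drift = [f"missing from mirror: {name}" for name in left - right]
    drift += [f"only in mirror: {name}" for name in right - left]
    drift += [
        f"content differs: {name}"
        for name in left & right
        # Byte-for-byte comparison; skill files are small, so reading them whole is fine.
        if (canonical / name).read_bytes() != (mirror / name).read_bytes()
    ]
    return sorted(drift)


if __name__ == "__main__":
    # Assumed layout: canonical skills in .agents/skills, mirrored copy in .claude/skills.
    canonical_dir, mirror_dir = Path(".agents/skills"), Path(".claude/skills")
    if not (canonical_dir.is_dir() and mirror_dir.is_dir()):
        print("expected both .agents/skills and .claude/skills to exist")
        sys.exit(2)
    problems = find_drift(canonical_dir, mirror_dir)
    for line in problems:
        print(line)
    sys.exit(1 if problems else 0)
```

A check like this could run from a `make` target or a CI job so that a non-zero exit blocks a PR that edits one tree but not the other. Whether that belongs in this plan or in the Phase 1 PR is a judgment call for the authors.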