From d3cf11054c84f097d366609f1d9ea1b70f8d8882 Mon Sep 17 00:00:00 2001
From: dgenio
Date: Mon, 9 Mar 2026 10:44:00 +0000
Subject: [PATCH 1/2] docs: rewrite agent documentation system

Replace monolithic AGENTS.md and copilot-instructions.md with a layered
documentation architecture.

Canonical shared layer (docs/agent-context/):
- architecture.md: module boundaries, design traps, planned modules
- workflows.md: dev commands, CI, code style, testing, PR conventions
- invariants.md: 12 hard rules, forbidden patterns, safe/unsafe table
- lessons-learned.md: 5 recurring mistake patterns, promotion criteria
- review-checklist.md: definition-of-done checklist by category

Projection layers:
- .github/copilot-instructions.md: thin review-oriented wrapper
- .github/instructions/chainweaver.instructions.md: scoped design traps
- .github/instructions/tests.instructions.md: scoped test conventions
- .claude/CLAUDE.md: Claude operational entrypoint and router

AGENTS.md rewritten as primary entrypoint with 12 invariants, vocabulary
table, executor/flow semantics, documentation map, and update policy.
--- .claude/CLAUDE.md | 68 ++++++ .github/copilot-instructions.md | 140 ++++------- .../instructions/chainweaver.instructions.md | 15 ++ .github/instructions/tests.instructions.md | 27 ++ AGENTS.md | 230 ++++++++++++------ docs/agent-context/architecture.md | 95 ++++++++ docs/agent-context/invariants.md | 80 ++++++ docs/agent-context/lessons-learned.md | 103 ++++++++ docs/agent-context/review-checklist.md | 92 +++++++ docs/agent-context/workflows.md | 162 ++++++++++++ 10 files changed, 837 insertions(+), 175 deletions(-) create mode 100644 .claude/CLAUDE.md create mode 100644 .github/instructions/chainweaver.instructions.md create mode 100644 .github/instructions/tests.instructions.md create mode 100644 docs/agent-context/architecture.md create mode 100644 docs/agent-context/invariants.md create mode 100644 docs/agent-context/lessons-learned.md create mode 100644 docs/agent-context/review-checklist.md create mode 100644 docs/agent-context/workflows.md diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md new file mode 100644 index 0000000..b5401af --- /dev/null +++ b/.claude/CLAUDE.md @@ -0,0 +1,68 @@ +# ChainWeaver — Claude Instructions + +Canonical source of truth: [AGENTS.md](/AGENTS.md) and +[docs/agent-context/](/docs/agent-context/). + +Read AGENTS.md before starting any task. It contains the repo map, +invariants, entry points, common tasks, validation commands, and +documentation map that routes to deeper guidance. + +--- + +## Explore before acting + +- Read the canonical docs for the topic area before writing code. +- Inspect the files you plan to change. Do not assume structure from memory. +- Check [architecture.md](/docs/agent-context/architecture.md) for design + traps and reserved module names before creating or renaming files. +- Do not infer repo-wide rules from a single local example. + +## Implement safely + +- Preserve invariants. The three executor rules (no LLM, no network I/O, + no randomness in `executor.py`) are non-negotiable. 
See + [invariants.md](/docs/agent-context/invariants.md). +- Use authoritative commands exactly as listed in + [AGENTS.md § Validation commands](/AGENTS.md#7-validation-commands). + Do not substitute alternative flags, paths, or invocations. +- Follow the conventions in canonical docs. Do not invent new patterns. +- Do not "clean up" or "simplify" code that looks unusual without first + checking [architecture.md § Design traps](/docs/agent-context/architecture.md#design-traps). + +## Validate before completing + +- Run all four validation commands and confirm they pass. +- Check whether your change triggers a doc update. Consult the governance + triggers in [workflows.md](/docs/agent-context/workflows.md#documentation-governance-triggers). +- Walk [review-checklist.md](/docs/agent-context/review-checklist.md) before + marking work done. +- Verify that docstrings match actual behavior, not intended behavior. + +## Handle contradictions + +- If canonical docs contradict each other, flag the conflict explicitly. + Do not silently pick one side. +- If code contradicts canonical docs, trust the docs for conventions and + the code for runtime behavior. Flag the gap. +- If an older or duplicate document disagrees with AGENTS.md or + `docs/agent-context/`, prefer AGENTS.md. +- Fix small contradictions in the same PR. Open an issue for large ones. + +## Capture lessons + +- If you discover a recurring failure pattern during work, note it as a + candidate lesson. +- A candidate lesson is provisional. Do not promote it into durable docs + based on a single observation. +- A lesson is promotable when it is reusable, decision-shaping, and durable + — not just a one-off incident. +- Promotion order: canonical docs first (`lessons-learned.md`), then + projections. See the criteria in + [lessons-learned.md](/docs/agent-context/lessons-learned.md#promotion-criteria). + +## Update order + +1. Update canonical shared docs (`AGENTS.md`, `docs/agent-context/`) first. +2. 
Update tool-specific projections (this file, `.github/copilot-instructions.md`) second. +3. If a Claude-specific rule starts to look shared and durable, promote it + into canonical docs and simplify it here. diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 67a5ccf..c451645 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -1,98 +1,46 @@ # Copilot Instructions — ChainWeaver -These instructions apply to all Copilot interactions in this repository. -For full architecture context and decision rationale, see [AGENTS.md](/AGENTS.md). - -## Language & runtime - -- Python 3.10+ (target version for all code) -- `from __future__ import annotations` at the top of every module -- Type annotations on all function signatures (this is a `py.typed` package) - -## Code style - -- Formatter: `ruff format` (line length 99, double quotes, trailing commas) -- Linter: `ruff check` with rule sets: E, W, F, I, UP, B, SIM, RUF -- Import order: `isort`-compatible via Ruff's `I` rules (known first-party: `chainweaver`) -- Naming: snake_case for functions/variables, PascalCase for classes -- Docstrings: Google style (Args/Returns/Raises sections) - -## Architecture rules - -- All data models use `pydantic.BaseModel` (pydantic v2 API) -- All exceptions inherit from `ChainWeaverError` (in `chainweaver/exceptions.py`) -- All public symbols must be listed in `chainweaver/__init__.py` `__all__` -- `executor.py` is deterministic — no LLM calls, no network I/O, no randomness -- Tool functions: `fn(validated_input: BaseModel) -> dict[str, Any]` - -## Project layout - -``` -chainweaver/ → Package source (all modules use `from __future__ import annotations`) - __init__.py → Public API surface; all exports listed in __all__ - tools.py → Tool class: named callable with Pydantic input/output schemas - flow.py → FlowStep + Flow: ordered step definitions (Pydantic models) - registry.py → FlowRegistry: in-memory catalogue of named flows 
- executor.py → FlowExecutor: sequential, LLM-free runner (main entry point) - exceptions.py → Typed exception hierarchy (all inherit ChainWeaverError) - log_utils.py → Structured per-step logging utilities -pyproject.toml → Ruff, mypy, pytest config (source of truth for tool settings) -tests/ → pytest test suite - conftest.py → Shared fixtures (tools, flows, executors) - helpers.py → Shared Pydantic schemas and tool functions -examples/ → Runnable usage examples -.github/workflows/ → CI (ci.yml) and publish (publish.yml) pipelines -``` - -## Testing - -- Framework: `pytest` (no unittest) -- Test files: `tests/test_*.py` -- Use `@pytest.fixture()` for shared objects (tools, flows, executors) -- Shared schemas and helper functions live in `tests/helpers.py` -- Test both success and error paths -- Assertions: use plain `assert` (pytest rewrites them), not `self.assertEqual` -- No mocking of internal ChainWeaver classes unless testing integration boundaries - -## Validation commands (run before every commit/PR) - -```bash -# Install with dev dependencies -pip install -e ".[dev]" - -# Lint -ruff check chainweaver/ tests/ examples/ - -# Check formatting -ruff format --check chainweaver/ tests/ examples/ - -# Type check -python -m mypy chainweaver/ - -# Run tests -python -m pytest tests/ -v -``` - -Always run all four checks. CI runs lint + format + mypy on Python 3.10 only; -tests run across Python 3.10, 3.11, 3.12, 3.13. 
- -## PR conventions - -- One logical change per PR -- PR title: imperative mood (e.g., "Add retry logic to executor") -- If you change architecture (add/remove/rename modules), update AGENTS.md and the project layout in this file in the same PR -- If you change coding conventions, update this file in the same PR - -## Anti-patterns (never generate these) - -- Do NOT add LLM/AI client calls to `executor.py` -- Do NOT use `unittest.TestCase` — use plain pytest functions/classes -- Do NOT import from `chainweaver` internals using relative paths outside the package -- Do NOT add dependencies without updating `pyproject.toml` `[project.dependencies]` -- Do NOT commit secrets, API keys, or credentials - -## Trust these instructions - -These instructions are tested and aligned with CI. Only search for additional -context if the information here is incomplete or found to be in error. -For architecture decisions and rationale, see [AGENTS.md](/AGENTS.md). +> Thin review-oriented layer. Canonical source of truth: [AGENTS.md](/AGENTS.md) +> and [docs/agent-context/](/docs/agent-context/). + +--- + +## Review-critical rules + +- Review code and agent-facing docs together. If a PR changes behavior, + invariants, architecture, or workflows, the corresponding docs must be + updated in the same PR. +- Invariants take priority over cleanup, simplification, or local refactors. + See [AGENTS.md § Core invariants](/AGENTS.md#4-core-invariants). +- Do not invent conventions. All coding style, naming, workflow, and testing + rules are grounded in [AGENTS.md](/AGENTS.md) and + [docs/agent-context/](/docs/agent-context/). If guidance is missing, surface + the gap — do not guess. +- Use authoritative commands exactly as written in + [AGENTS.md § Validation commands](/AGENTS.md#7-validation-commands). Do not + substitute alternative flags, paths, or invocations. +- If you find a contradiction or stale content in any doc, flag it explicitly. + Do not silently work around it. 
+ +## Executor guardrails + +`executor.py` has three hard invariants — no LLM calls, no network I/O, no +randomness. These are non-negotiable. See +[invariants.md](/docs/agent-context/invariants.md#hard-executor-invariants). + +## Vocabulary + +| Use | Never use | +|-----|-----------| +| **flow** | chain, pipeline | +| **tool** | function, action (when referring to a `Tool` instance) | + +## Where to find guidance + +| Topic | Canonical file | +|-------|----------------| +| Architecture, boundaries, design traps | [architecture.md](/docs/agent-context/architecture.md) | +| Commands, CI, code style, testing, PR rules | [workflows.md](/docs/agent-context/workflows.md) | +| Hard rules, forbidden patterns | [invariants.md](/docs/agent-context/invariants.md) | +| Recurring mistake patterns | [lessons-learned.md](/docs/agent-context/lessons-learned.md) | +| Definition-of-done, review gates | [review-checklist.md](/docs/agent-context/review-checklist.md) | diff --git a/.github/instructions/chainweaver.instructions.md b/.github/instructions/chainweaver.instructions.md new file mode 100644 index 0000000..202b5c3 --- /dev/null +++ b/.github/instructions/chainweaver.instructions.md @@ -0,0 +1,15 @@ +--- +applyTo: "chainweaver/**" +--- +# ChainWeaver package — design traps + +Do not "fix" these without a solution for the underlying constraint. +See [architecture.md § Design traps](/docs/agent-context/architecture.md#design-traps) +for full context. + +- `StepRecord` and `ExecutionResult` are `dataclass`, not Pydantic. They carry + `Exception` instances. Do not convert them. +- `log_utils.py` was renamed from `logging.py` to avoid stdlib shadowing. Do + not rename it back. +- Weaver Stack: do not add agent-kernel or weaver-spec imports to `executor.py`. + `KernelBackedExecutor` goes in a separate class. 
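The first design trap above (records stay `dataclass` because they carry `Exception` instances) can be sketched with the standard library alone. The field names follow the `StepRecord` table in AGENTS.md; the class name `StepRecordSketch` and the helper `record_failure` are hypothetical and are not taken from ChainWeaver's source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class StepRecordSketch:
    """Hypothetical stand-in for StepRecord; not ChainWeaver's actual class."""

    step_index: int
    tool_name: str
    inputs: dict[str, Any]
    outputs: dict[str, Any] | None
    error: Exception | None  # a live exception object, kept as-is
    success: bool


def record_failure(
    step_index: int, tool_name: str, inputs: dict[str, Any], exc: Exception
) -> StepRecordSketch:
    """Build a failed-step record that keeps the original exception instance."""
    return StepRecordSketch(
        step_index=step_index,
        tool_name=tool_name,
        inputs=inputs,
        outputs=None,
        error=exc,
        success=False,
    )


try:
    raise ValueError("Tool 'double' is not registered.")
except ValueError as exc:
    failed = record_failure(0, "double", {"number": 3}, exc)
```

A plain dataclass stores the exception object without any validation or serialization step, which is the property the design trap is protecting.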
diff --git a/.github/instructions/tests.instructions.md b/.github/instructions/tests.instructions.md new file mode 100644 index 0000000..0809e2f --- /dev/null +++ b/.github/instructions/tests.instructions.md @@ -0,0 +1,27 @@ +--- +applyTo: "tests/**" +--- +# Tests + +## File boundary + +- `tests/helpers.py` — shared Pydantic schemas and tool functions. +- `tests/conftest.py` — pytest fixtures that compose objects from `helpers.py`. + +Do not merge these files. Do not put schemas in `conftest.py` or fixtures in +`helpers.py`. + +## Framework rules + +- pytest only. No `unittest.TestCase`, no `self.assertEqual`. +- Plain `assert` statements (pytest rewrites them). +- No mocking of internal ChainWeaver classes unless testing integration boundaries. +- Test both success and failure/error paths. + +## Organization + +- Unit tests grouped by module (`test_{module}.py`). +- Integration tests grouped by scenario. +- Test classes grouped by scenario (e.g., `TestSuccessfulExecution`). + +See [workflows.md § Testing conventions](/docs/agent-context/workflows.md#testing-conventions). diff --git a/AGENTS.md b/AGENTS.md index 6260704..a7e4f21 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,126 +1,198 @@ # ChainWeaver — Agent Instructions -> Recognized by GitHub Copilot (nearest-in-tree) and Claude Code as the -> authoritative project guidance file. +> Single source of truth for all coding agents working on this repository. +> For tool-specific wrappers, see the documentation map at the end of this file. --- -## 1. Project overview +## 1. Project identity -ChainWeaver is a deterministic orchestration layer for MCP-based agents. It -compiles multi-tool chains into executable flows that run without any LLM -involvement between steps. Python 3.10+, single runtime dependency -(`pydantic>=2.0`). +ChainWeaver is a deterministic orchestration layer for MCP-based agents. +It compiles multi-tool flows into executable sequences that run without any +LLM involvement between steps. 
+ +- Python 3.10+; `from __future__ import annotations` in every module. +- Single runtime dependency: `pydantic>=2.0`. +- Core philosophy: **compiled, not interpreted** — the executor is a graph + runner, not a reasoning engine. --- -## 2. Architecture map +## 2. Domain vocabulary -``` +Use these terms consistently in code, docs, comments, and PR descriptions. + +| Canonical term | Never use | Meaning | +|----------------|-----------|---------| +| **flow** | chain, pipeline | A named, ordered sequence of tool invocations (`Flow`) | +| **tool** | function, action | A named callable with Pydantic input/output schemas (`Tool`) | + +--- + +## 3. Repository map + +```text chainweaver/ -├── __init__.py → Public API surface; all exports listed in __all__ -├── tools.py → Tool class: named callable with Pydantic input/output schemas -├── flow.py → FlowStep + Flow: ordered step definitions (Pydantic models) -├── registry.py → FlowRegistry: in-memory catalogue of named flows -├── executor.py → FlowExecutor: sequential, LLM-free runner (main entry point) -├── exceptions.py → Typed exception hierarchy (all inherit ChainWeaverError) -├── log_utils.py → Structured per-step logging utilities -└── py.typed → PEP 561 marker for typed package +├── __init__.py Public API surface; all exports in __all__ +├── tools.py Tool class: named callable with Pydantic I/O schemas +├── flow.py FlowStep + Flow: ordered step definitions (Pydantic models) +├── registry.py FlowRegistry: in-memory catalogue of named flows +├── executor.py FlowExecutor: sequential, LLM-free runner (main entry point) +├── exceptions.py Typed exception hierarchy (all inherit ChainWeaverError) +├── log_utils.py Structured per-step logging utilities +└── py.typed PEP 561 marker +tests/ +├── conftest.py Pytest fixtures (import schemas/functions from helpers.py) +├── helpers.py Shared Pydantic schemas and tool functions +├── test_*.py Test files +examples/ +└── simple_linear_flow.py Runnable standalone usage example 
+pyproject.toml Ruff, mypy, pytest config (source of truth for tooling) +.github/workflows/ CI (ci.yml) and publish (publish.yml) pipelines ``` +### Key entry points + +- `FlowExecutor.execute_flow(flow_name, initial_input)` → `ExecutionResult` +- `FlowRegistry.register_flow(flow, *, overwrite=False)` → register a flow +- `FlowExecutor.register_tool(tool)` → register a tool for use in flows + --- -## 3. Key entry points +## 4. Core invariants -- `FlowExecutor.execute_flow(flow_name, initial_input)` — main orchestration - entry point; returns `ExecutionResult` -- `FlowRegistry.register_flow(flow)` — register a flow for execution -- `FlowExecutor.register_tool(tool)` — register a tool for use in flows +Three hard executor invariants and nine package-wide invariants govern all +changes. The executor is deterministic by design. ---- +**Executor — never add to `executor.py`:** +1. No LLM or AI client calls. +2. No network I/O. +3. No randomness. -## 4. Decision context +**Package-wide:** +4. All exceptions inherit from `ChainWeaverError` with relevant context + attributes (`tool_name`, `step_index`, `detail` where applicable). +5. All public symbols exported in `chainweaver/__init__.py` `__all__`. +6. Tool function signature: `fn(validated_input: BaseModel) -> dict[str, Any]`. +7. `from __future__ import annotations` at the top of every module. +8. Type annotations on all function signatures (package ships `py.typed`). +9. Pydantic `BaseModel` for all data schemas (`Flow`, `FlowStep`, I/O contracts). +10. No secrets, credentials, or PII in code, logs, or tests. +11. All new code must pass: `ruff check`, `ruff format --check`, `mypy`, `pytest`. +12. One logical change per PR; all tests must pass before merge. -| Decision | Rationale | -|---|---| -| **Sequential-only execution** | Phase 1 MVP. DAG execution is planned for v0.2 (see Roadmap in README). | -| **Pydantic for all schemas** | Schema validation ensures deterministic I/O contracts between steps. 
Every tool input/output is validated. | -| **No LLM calls in executor** | Core design principle — "compiled, not interpreted." The executor is a graph runner, not a reasoning engine. | -| **`from __future__ import annotations`** | Every module uses it for forward-reference support and cleaner type hints. | +For the full prohibited-actions list and anti-patterns, see +[invariants.md](docs/agent-context/invariants.md). --- -## 5. Top invariants +## 5. Executor and flow semantics + +### `FlowStep.input_mapping` + +| Value type | Behavior | +|------------|----------| +| `str` | Looked up as a key in the accumulated execution context. | +| Non-string (`int`, `float`, `bool`, …) | Used as a literal constant. | +| Empty `{}` (default) | The tool receives the full current context. | + +### `ExecutionResult` (dataclass) + +| Field | Type | Meaning | +|-------|------|---------| +| `flow_name` | `str` | Name of the executed flow. | +| `success` | `bool` | `True` when all steps completed without error. | +| `final_output` | `dict \| None` | Merged execution context, or `None` on failure. | +| `execution_log` | `list[StepRecord]` | Ordered per-step records. | + +### `StepRecord` (dataclass) -1. **No LLM calls in `executor.py`** — the executor is deterministic by design. -2. All exceptions inherit from `ChainWeaverError` and carry relevant context attributes (e.g. `tool_name`, `step_index`, `detail` where applicable). -3. All public API symbols must be exported in `chainweaver/__init__.py` `__all__`. -4. Every tool function signature: `fn(validated_input: BaseModel) -> dict[str, Any]`. -5. `from __future__ import annotations` at the top of every module. -6. Type annotations on all function signatures (the package ships `py.typed`). -7. Pydantic `BaseModel` for all data schemas (`Flow`, `FlowStep`, input/output contracts). -8. No secrets, credentials, or PII in code, logs, or tests. -9. All new code must pass: `pytest`, `ruff check`, `ruff format --check`. -10. 
One logical change per PR; all tests must pass before merge. +| Field | Type | Meaning | +|-------|------|---------| +| `step_index` | `int` | Zero-based position (`-1` = flow-input validation, `len(steps)` = flow-output validation). | +| `tool_name` | `str` | Tool invoked (or flow name for validation records). | +| `inputs` | `dict` | Validated inputs passed to the tool. | +| `outputs` | `dict \| None` | Validated outputs, or `None` on failure. | +| `error` | `Exception \| None` | Exception raised, or `None` on success. | +| `success` | `bool` | `True` when the step completed without error. | + +> **Design note:** `StepRecord` and `ExecutionResult` are intentionally +> `dataclass`, not `BaseModel`. They carry `Exception` instances that Pydantic +> cannot serialize. See [architecture.md § Design traps](docs/agent-context/architecture.md#design-traps). --- -## 6. Development workflow +## 6. Common tasks -```bash -# Install with dev dependencies -pip install -e ".[dev]" +| Task | Where to look | What to update | +|------|---------------|----------------| +| Add a new tool | `tools.py` | Integration tests in `test_flow_execution.py` | +| Add a new exception | `exceptions.py` | `__init__.py` + `__all__` + README error table — **same PR** | +| Modify flow execution | `executor.py` | Keep `StepRecord` + `ExecutionResult` consistent | +| Add a new Flow field | `flow.py` | Serialization tests if `model_dump()` changes | +| Change logging format | `log_utils.py` | Update tests (no re-export needed) | +| Add a new module | See [new-module checklist](docs/agent-context/workflows.md#new-module-checklist) | -# Run tests -pytest +### Exception message style -# Lint -ruff check chainweaver/ tests/ examples/ +Use f-string sentences with single-quoted identifiers, ending with a period: -# Check formatting -ruff format --check chainweaver/ tests/ examples/ +```python +f"Tool '{tool_name}' is not registered." 
+``` + +--- -# Run the example -python examples/simple_linear_flow.py +## 7. Validation commands + +Run all four before every commit and PR: + +```bash +ruff check chainweaver/ tests/ examples/ +ruff format --check chainweaver/ tests/ examples/ +python -m mypy chainweaver/ +python -m pytest tests/ -v ``` -See [README > Development](README.md#development) for extended context. +CI runs lint + format + mypy on Python 3.10 only; tests run across 3.10–3.13. -> **Note:** Once [#39](https://github.com/dgenio/ChainWeaver/issues/39) (mypy) -> lands, add `python -m mypy chainweaver/` to the validation sequence. +For full CI, PR, branch, and commit conventions, see +[workflows.md](docs/agent-context/workflows.md). --- -## 7. Common tasks +## 8. Definition of done -| Task | Where to look | What to update | -|---|---|---| -| Add a new Tool | `chainweaver/tools.py` | Integration tests in `tests/test_flow_execution.py` | -| Add a new exception | `chainweaver/exceptions.py` | Re-export in `chainweaver/__init__.py`, update `__all__` | -| Modify flow execution | `chainweaver/executor.py` | Ensure `StepRecord` and `ExecutionResult` stay consistent | -| Add a new Flow field | `chainweaver/flow.py` | Update serialization tests if `model_dump()` changes | -| Change logging format | `chainweaver/log_utils.py` | No re-export needed; update tests | +Before marking a PR ready for review: + +- [ ] All four validation commands pass locally. +- [ ] Both success and error paths are tested. +- [ ] `__init__.py` `__all__` is updated if public symbols were added. +- [ ] No new contradictions introduced between docs. +- [ ] AGENTS.md updated if architecture changed. -> **Ownership rule:** If you change the architecture, update this file in the -> same PR. +Full checklist: [review-checklist.md](docs/agent-context/review-checklist.md). --- -## 8. Testing conventions +## 9. 
Documentation map -- Test files: `tests/test_*.py` -- Test classes grouped by scenario (e.g., `TestSuccessfulExecution`, `TestMissingTool`) -- Use `@pytest.fixture()` for shared objects (tools, flows, executors) -- Shared fixtures and schemas live in `tests/conftest.py` -- Test both success and failure paths -- See [README > Development](README.md#development) for commands +| File | Purpose | Consult when… | +|------|---------|---------------| +| [architecture.md](docs/agent-context/architecture.md) | Boundaries, decisions, design traps, planned modules | Scoping changes, understanding why something is built a certain way, choosing file placement | +| [workflows.md](docs/agent-context/workflows.md) | Commands, CI, code style, testing, PR/git conventions | Writing code, creating branches/PRs, adding modules, running CI | +| [invariants.md](docs/agent-context/invariants.md) | Hard rules, forbidden patterns | Modifying core modules, adding deps, touching executor | +| [lessons-learned.md](docs/agent-context/lessons-learned.md) | Recurring mistake patterns | Before proposing changes to avoid known pitfalls | +| [review-checklist.md](docs/agent-context/review-checklist.md) | Definition-of-done, review gates | Before submitting a PR, during code review | --- -## 9. CI pipeline +## 10. Update policy -- `.github/workflows/ci.yml`: runs on push/PR to `main` - - Ruff lint + format check (Python 3.10 only) - - `pytest` across Python 3.10, 3.11, 3.12, 3.13 -- `.github/workflows/publish.yml`: triggered by `v*` tags → - test → build → PyPI publish → GitHub Release +- **Every PR:** check whether AGENTS.md or any `docs/agent-context/` file is + stale with respect to the change. Update in the same PR if so. +- **Architecture changes** (add/remove/rename modules): update AGENTS.md repo + map and architecture.md in the same PR. +- **Ownership rule:** if you change the architecture, you own the doc update. 
+- **Contradictions:** if you find a contradiction between docs, fix it in the + same PR if small, or open an issue if large. diff --git a/docs/agent-context/architecture.md b/docs/agent-context/architecture.md new file mode 100644 index 0000000..156a81c --- /dev/null +++ b/docs/agent-context/architecture.md @@ -0,0 +1,95 @@ +# Architecture + +> Canonical reference for ChainWeaver's architectural intent, major boundaries, +> and design decisions. Consult this before scoping changes or choosing where +> new code belongs. + +--- + +## Architectural intent + +ChainWeaver is a **deterministic graph runner**. It compiles ordered sequences +of tool invocations into flows and executes them with strict schema validation +at every boundary. No LLM, no network I/O, no randomness enters the executor. + +The entire value proposition rests on this determinism: given the same input +and tools, the same flow produces the same output every time. + +--- + +## Module boundaries + +| Module | Responsibility | Key constraint | +|--------|---------------|----------------| +| `tools.py` | Define `Tool`: name + callable + Pydantic I/O schemas | Tool functions must be `fn(BaseModel) -> dict[str, Any]` | +| `flow.py` | Define `FlowStep` and `Flow` as Pydantic models | Pure data definitions; no execution logic | +| `registry.py` | Store and retrieve flows by name | In-memory; intentionally simple for later wrapping | +| `executor.py` | Run flows step-by-step, validate I/O, merge context | **No LLM, no network I/O, no randomness** | +| `exceptions.py` | Typed exception hierarchy | All inherit `ChainWeaverError`; carry context attrs | +| `log_utils.py` | Per-step structured logging | Library-safe (NullHandler only); no handler config | +| `__init__.py` | Public API surface | Every public symbol must be in `__all__` | + +--- + +## Decision context + +| Decision | Rationale | +|----------|-----------| +| Sequential-only execution | Phase 1 MVP. DAG execution is planned for v0.2. 
| +| Pydantic for all schemas | Deterministic I/O contracts between steps. | +| No LLM calls in executor | "Compiled, not interpreted." | +| `from __future__ import annotations` | Forward-reference support; cleaner type hints. | +| `dataclass` for `StepRecord`/`ExecutionResult` | They carry `Exception` instances; Pydantic cannot serialize these. | + +--- + +## Design traps + +Things that look wrong but are intentional. Do not "fix" these without a +solution for the underlying constraint. + +### `StepRecord` and `ExecutionResult` are dataclasses, not Pydantic + +The `error` field holds an `Exception` instance. Pydantic's serialization +cannot handle arbitrary exception objects. These may migrate to Pydantic if a +serialization solution is found, but until then agents must not convert them. + +### `log_utils.py`, not `logging.py` + +Renamed from `logging.py` (commit ccfe7f8) to avoid shadowing Python's `logging` +stdlib module. Do not rename it back. + +### `tests/helpers.py` is separate from `tests/conftest.py` + +Extracted intentionally (commit 7ef3245). Boundary: +- `helpers.py` → shared Pydantic schemas and tool functions (importable by any test) +- `conftest.py` → pytest fixtures that compose objects from `helpers.py` + +Do not merge them back together. + +--- + +## Planned modules + +The following module names are reserved for planned features. 
Do not create +files that conflict with these names: + +| Reserved name | Issue | Purpose | +|---------------|-------|---------| +| `compiler.py` | #71 | Compile-time schema chain validation | +| `analyzer.py` | #77 | Offline chain analyzer | +| `observer.py` | #78 | Runtime chain observer | +| `compat.py` | #48 | Schema fingerprinting | +| `viz.py` | #79 | Flow visualization | +| `cli.py` | #44 | CLI interface | +| `mcp/` | #70, #72 | MCP adapter + flow server | +| `integrations/` | #82 | LangChain/LlamaIndex bridge adapters | +| `export/` | #25 | Flow export formats | +| `governance.py` | #13 | Governance policies | + +### Weaver Stack guardrail + +Issues #89–#91 introduce a kernel-backed executor (`KernelBackedExecutor`) that +delegates step execution to an agent-kernel. This is a **separate class** — +do not add agent-kernel or weaver-spec imports to `executor.py`. The core +`FlowExecutor` stays deterministic and standalone. diff --git a/docs/agent-context/invariants.md b/docs/agent-context/invariants.md new file mode 100644 index 0000000..711f440 --- /dev/null +++ b/docs/agent-context/invariants.md @@ -0,0 +1,80 @@ +# Invariants + +> The strongest "do not break these assumptions" reference. Consult this when +> modifying core modules, adding dependencies, or touching the executor. + +--- + +## Hard executor invariants + +These three rules are foundational to ChainWeaver's value proposition. +They are non-negotiable. + +| # | Rule | Why | +|---|------|-----| +| 1 | **No LLM or AI client calls** in `executor.py` | The executor is deterministic. Same input + same tools = same output. | +| 2 | **No network I/O** in `executor.py` | Network I/O belongs in tool functions, not the orchestrator. | +| 3 | **No randomness** in `executor.py` | Random routing or jitter would break the "compiled, not interpreted" guarantee. | + +Network I/O and randomness are allowed in **tool functions** — the executor +only manages the data flow between tools. 
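The split between executor and tool functions described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: `run_steps`, `double`, and `add_ten` are hypothetical names, and plain dictionaries stand in for the Pydantic-validated inputs that real ChainWeaver tools receive.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical alias: a tool takes the current context and returns new keys.
ToolFn = Callable[[dict[str, Any]], dict[str, Any]]


def double(context: dict[str, Any]) -> dict[str, Any]:
    """Pure tool: doubles the 'number' value."""
    return {"doubled": context["number"] * 2}


def add_ten(context: dict[str, Any]) -> dict[str, Any]:
    """Pure tool: adds ten to the 'doubled' value."""
    return {"result": context["doubled"] + 10}


def run_steps(tools: list[ToolFn], initial_input: dict[str, Any]) -> dict[str, Any]:
    """Run tools in order and merge each output into the shared context.

    The runner itself makes no LLM call, performs no network I/O, and uses
    no randomness. Any such behavior would belong inside a tool function.
    """
    context = dict(initial_input)  # copy so the caller's input is untouched
    for tool in tools:
        context.update(tool(context))
    return context


first = run_steps([double, add_ten], {"number": 5})
second = run_steps([double, add_ten], {"number": 5})
```

Because the runner only sequences calls and merges dictionaries, two runs with the same input and the same tools yield equal results, which is the guarantee the three hard invariants exist to protect.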
+
+---
+
+## Package-wide invariants
+
+| # | Rule |
+|---|------|
+| 4 | All exceptions inherit from `ChainWeaverError` with context attrs. |
+| 5 | All public symbols in `__init__.py` `__all__`. |
+| 6 | Tool signature: `fn(validated_input: BaseModel) -> dict[str, Any]`. |
+| 7 | `from __future__ import annotations` in every module. |
+| 8 | Type annotations on all function signatures (`py.typed` package). |
+| 9 | Pydantic `BaseModel` for all data schemas. |
+| 10 | No secrets, credentials, or PII in code, logs, or tests. |
+| 11 | All new code must pass: `ruff check`, `ruff format --check`, `mypy`, `pytest`. |
+| 12 | One logical change per PR; all tests must pass before merge. |
+
+---
+
+## Forbidden patterns
+
+Never generate these in ChainWeaver code:
+
+| Pattern | Why |
+|---------|-----|
+| LLM/AI client calls in `executor.py` | Violates invariant 1 |
+| `unittest.TestCase` | Use plain pytest functions/classes |
+| Relative imports from `chainweaver` internals outside the package | Breaks package boundaries |
+| Adding deps without updating `pyproject.toml` `[project.dependencies]` | Invisible dependency |
+| Secrets, API keys, or credentials in code | Security invariant |
+| Converting `StepRecord`/`ExecutionResult` to Pydantic `BaseModel` | They carry `Exception`; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Renaming `log_utils.py` back to `logging.py` | Stdlib shadowing; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Merging `tests/helpers.py` into `conftest.py` | Intentional split; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Adding agent-kernel or weaver-spec imports to `executor.py` | Weaver Stack goes in `KernelBackedExecutor`; see [architecture.md § Weaver Stack](architecture.md#weaver-stack-guardrail) |
+| Adding deps to `executor.py` that conflict with kernel delegation | Future `KernelBackedExecutor` requires a clean executor |
+
+---
+
+## Safe vs. unsafe simplifications
+
+| Change | Safe? | Notes |
+|--------|-------|-------|
+| Extract a helper function within a module | ✅ Yes | Keep it private (`_name`) unless it's a public API |
+| Refactor tests to use shared fixtures | ✅ Yes | Put new schemas in `helpers.py`, fixtures in `conftest.py` |
+| Remove an unused import | ✅ Yes | Ruff already flags these |
+| Inline a private helper | ✅ Yes | If it reduces complexity |
+| Convert `StepRecord`/`ExecutionResult` to Pydantic | ❌ No | See forbidden patterns |
+| Add a new field to `Flow` or `FlowStep` | ⚠️ Careful | Check `model_dump()` serialization; update tests |
+| Change exception hierarchy | ⚠️ Careful | May break downstream `except` clauses |
+| Add network I/O to executor.py | ❌ No | Hard invariant |
+
+---
+
+## Update triggers
+
+Update this file when:
+- A new hard invariant is established.
+- A new forbidden pattern is discovered.
+- An invariant is relaxed or removed (document why).
+- A new "safe vs. unsafe" category is identified.
diff --git a/docs/agent-context/lessons-learned.md b/docs/agent-context/lessons-learned.md
new file mode 100644
index 0000000..cd79ee6
--- /dev/null
+++ b/docs/agent-context/lessons-learned.md
@@ -0,0 +1,103 @@
+# Lessons Learned
+
+> Reusable patterns from past mistakes. Not an incident archive — only
+> generalized, durable lessons belong here.
+
+---
+
+## Failure-capture workflow
+
+When a PR review or CI failure reveals a recurring mistake pattern:
+
+1. **Identify the generalized lesson.** Strip project-specific details.
+   Ask: "Would a different agent make the same mistake on a different task?"
+2. **Check whether a lesson already exists** in this file. If so, refine it
+   rather than duplicating.
+3. **Write a new entry** if the pattern is genuinely new. Use the format below.
+4. **Consider promoting to invariants.md** if the lesson represents a rule that
+   should never be violated (rather than a common mistake to watch for).
+
+### What belongs here
+
+- Recurring mistakes agents make, generalized into actionable guidance.
+- Patterns observed across multiple PRs or multiple agents.
+
+### What does NOT belong here
+
+- One-off bugs or typos.
+- Incident narratives or timelines.
+- Guidance already captured as an invariant or forbidden pattern in
+  [invariants.md](invariants.md).
+
+---
+
+## Recurring mistake patterns
+
+### 1. Docstrings that don't match actual behavior
+
+**Pattern:** Agent writes or updates a docstring that describes intended
+behavior rather than actual behavior (e.g., claiming a field is immutable
+when the dataclass isn't frozen, or documenting exceptions as raised when
+they are actually caught and returned via `ExecutionResult`).
+
+**Prevention:** After writing or modifying a docstring, verify each claim
+against the implementation. Check: return types, raised vs. caught exceptions,
+mutability, field semantics.
+
+---
+
+### 2. Referencing files or configs that don't exist
+
+**Pattern:** Agent mentions a file, config key, or test module in docs or
+code that doesn't exist in the repository (e.g., `tests/test_tools.py`,
+an isort config before it was added).
+
+**Prevention:** Before referencing any file or config in prose, verify it
+exists. Use the repository map in AGENTS.md or check the file system directly.
+
+---
+
+### 3. Commands that don't match CI exactly
+
+**Pattern:** Agent includes shell commands in docs or scripts that differ
+from CI (e.g., `ruff check .` instead of `ruff check chainweaver/ tests/ examples/`,
+omitting `python -m` prefix, using different flags).
+
+**Prevention:** Copy commands from the authoritative sequence in
+[workflows.md § Validation commands](workflows.md#validation-commands).
+Never improvise command variations.
+
+---
+
+### 4. Markdown formatting errors in agent-generated docs
+
+**Pattern:** Agent produces invalid Markdown syntax — `|>` instead of `>`
+for blockquotes, `||` creating phantom table columns, broken link syntax.
+
+**Prevention:** Review generated Markdown for syntax correctness before
+committing. Validate tables have consistent column counts.
+
+---
+
+### 5. Overclaiming capabilities or properties
+
+**Pattern:** Agent asserts a property that is aspirational rather than actual
+(e.g., claiming immutability without `frozen=True`, claiming all exceptions
+carry `step_index` when some only carry `name`).
+
+**Prevention:** Verify assertions against the code. If a property isn't
+enforced at the language level, don't claim it.
+
+---
+
+## Promotion criteria
+
+A lesson should be **promoted to invariants.md** when:
+- It represents a hard rule, not just a common mistake.
+- Violating it would cause CI failure, runtime error, or architectural damage.
+- It applies unconditionally, not just "in most cases."
+
+A lesson should be **removed** when:
+- The underlying cause has been eliminated (e.g., a tool fix makes the
+  mistake impossible).
+- It has been superseded by a more specific or more general lesson.
diff --git a/docs/agent-context/review-checklist.md b/docs/agent-context/review-checklist.md
new file mode 100644
index 0000000..0bf3987
--- /dev/null
+++ b/docs/agent-context/review-checklist.md
@@ -0,0 +1,92 @@
+# Review Checklist
+
+> Definition-of-done checks for agent self-review and maintainer review.
+> Use this before marking a PR ready.
+
+---
+
+## CI and validation
+
+- [ ] `ruff check chainweaver/ tests/ examples/` passes.
+- [ ] `ruff format --check chainweaver/ tests/ examples/` passes.
+- [ ] `python -m mypy chainweaver/` passes.
+- [ ] `python -m pytest tests/ -v` passes.
+- [ ] Commands match the authoritative sequence exactly (see [workflows.md](workflows.md#validation-commands)).
+
+---
+
+## Code correctness
+
+- [ ] New code has type annotations on all function signatures.
+- [ ] New modules start with `from __future__ import annotations`.
+- [ ] Tool functions follow the signature: `fn(validated_input: BaseModel) -> dict[str, Any]`.
+- [ ] Exception messages use f-string style with single-quoted identifiers, ending with a period.
+- [ ] No `unittest.TestCase` — plain pytest functions/classes only.
+- [ ] No relative imports from `chainweaver` internals outside the package.
+
+---
+
+## Testing
+
+- [ ] Both success and error/failure paths are tested.
+- [ ] New schemas added to `tests/helpers.py` (not `conftest.py`).
+- [ ] New fixtures added to `tests/conftest.py` (not `helpers.py`).
+- [ ] Assertions use plain `assert`, not `self.assertEqual`.
+- [ ] No mocking of internal ChainWeaver classes (unless at integration boundary).
+
+---
+
+## Public API
+
+- [ ] New public symbols added to `chainweaver/__init__.py` `__all__`.
+- [ ] New exceptions: `__init__.py` + `__all__` + README error table — all updated.
+- [ ] `StepRecord` / `ExecutionResult` remain as dataclasses (not converted to Pydantic).
+
+---
+
+## Architecture
+
+- [ ] No LLM calls, network I/O, or randomness added to `executor.py`.
+- [ ] No new dependencies added without updating `pyproject.toml`.
+- [ ] New module name does not conflict with [reserved names](architecture.md#planned-modules).
+- [ ] No agent-kernel or weaver-spec imports in `executor.py`.
+
+---
+
+## Documentation consistency
+
+- [ ] AGENTS.md repo map updated if modules were added/removed/renamed.
+- [ ] `architecture.md` module boundaries updated if architecture changed.
+- [ ] `workflows.md` updated if commands, CI, or conventions changed.
+- [ ] README error table updated if exceptions were added.
+- [ ] No docstrings that claim behavior the code doesn't implement.
+  (See [lessons-learned.md § pattern 1](lessons-learned.md#1-docstrings-that-dont-match-actual-behavior).)
+- [ ] No references to files or configs that don't exist.
+  (See [lessons-learned.md § pattern 2](lessons-learned.md#2-referencing-files-or-configs-that-dont-exist).)
+
+---
+
+## Domain vocabulary
+
+- [ ] Uses "flow" (never "chain" or "pipeline").
+- [ ] Uses "tool" (never "function" or "action" for Tool instances).
+
+---
+
+## PR hygiene
+
+- [ ] One logical change per PR.
+- [ ] PR title in imperative mood.
+- [ ] Branch follows `{type}/{issue_number}-{short-description}` convention.
+- [ ] Commits follow Conventional Commits format.
+- [ ] No secrets, API keys, or credentials.
+
+---
+
+## Update triggers
+
+Update this checklist when:
+- New review gates are established (e.g., new invariants).
+- Existing checks are found to be insufficient or redundant.
+- New recurring mistakes are added to `lessons-learned.md` that warrant
+  a corresponding checklist item.
diff --git a/docs/agent-context/workflows.md b/docs/agent-context/workflows.md
new file mode 100644
index 0000000..6c71ab9
--- /dev/null
+++ b/docs/agent-context/workflows.md
@@ -0,0 +1,162 @@
+# Workflows
+
+> Canonical reference for development commands, CI, code style, testing
+> conventions, PR/git rules, and documentation governance triggers.
+
+---
+
+## Validation commands
+
+Run all four before every commit and PR. This is the authoritative sequence:
+
+```bash
+# 1. Lint
+ruff check chainweaver/ tests/ examples/
+
+# 2. Format check
+ruff format --check chainweaver/ tests/ examples/
+
+# 3. Type check
+python -m mypy chainweaver/
+
+# 4. Tests
+python -m pytest tests/ -v
+```
+
+**Command-selection rules:**
+- Always scope to `chainweaver/ tests/ examples/` — never use bare `.` or `src/`.
+- Always use `python -m pytest`, not bare `pytest`, for consistent module resolution.
+- Always use `python -m mypy`, not bare `mypy`.
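The command-selection rules above could be checked mechanically. The sketch below is an illustration under stated assumptions, not a tool that exists in the repository: `check_command`, `REQUIRED_PATHS`, and the violation messages are hypothetical names, and the path-scoping rule is interpreted here as applying to the `ruff` commands only.

```python
from __future__ import annotations

import shlex

# Paths every ruff command is expected to be scoped to, per the rules above.
REQUIRED_PATHS = ["chainweaver/", "tests/", "examples/"]


def check_command(command: str) -> list[str]:
    """Return rule violations for one validation command (hypothetical helper)."""
    tokens = shlex.split(command)
    if not tokens:
        return ["Command is empty."]
    problems: list[str] = []
    # Rule: pytest and mypy must be invoked through 'python -m'.
    if tokens[0] in {"pytest", "mypy"}:
        problems.append(f"Use 'python -m {tokens[0]}' instead of bare '{tokens[0]}'.")
    if tokens[0] == "ruff":
        # Rule: never target the whole tree or a non-existent src/ layout.
        if "." in tokens or "src/" in tokens:
            problems.append("Never use bare '.' or 'src/' as the target.")
        missing = [path for path in REQUIRED_PATHS if path not in tokens]
        if missing:
            problems.append(f"Missing scoped paths: {', '.join(missing)}.")
    return problems
```

A check like this would flag `ruff check .`, the drift described in lessons-learned pattern 3, while accepting the four authoritative commands unchanged.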
+
+---
+
+## CI pipeline
+
+| Workflow | Trigger | Steps |
+|----------|---------|-------|
+| `ci.yml` | Push/PR to `main` | Ruff lint + format + mypy (Python 3.10 only); pytest across 3.10, 3.11, 3.12, 3.13 |
+| `publish.yml` | `v*` tags | Test → build → PyPI publish → GitHub Release |
+
+---
+
+## Code style
+
+- **Formatter:** `ruff format` — line length 99, double quotes, trailing commas.
+- **Linter:** `ruff check` — rule sets: E, W, F, I, UP, B, SIM, RUF.
+- **Import order:** isort-compatible via Ruff's `I` rules (known first-party: `chainweaver`).
+- **Naming:** `snake_case` for functions/variables, `PascalCase` for classes.
+- **Docstrings:** Google style (Args/Returns/Raises sections).
+- **Exception messages:** f-string sentences, single-quoted identifiers, end with a period.
+  ```python
+  f"Tool '{tool_name}' is not registered."
+  ```
+
+---
+
+## Testing conventions
+
+- **Framework:** pytest only. No `unittest.TestCase`.
+- **Test files:** `tests/test_*.py`.
+- **Shared artifacts boundary:**
+  - `tests/helpers.py` — Pydantic schemas and tool functions.
+  - `tests/conftest.py` — pytest fixtures that compose objects from `helpers.py`.
+- **Organization:** hybrid — unit tests grouped by module, integration tests grouped by scenario.
+- **Test classes:** grouped by scenario (e.g., `TestSuccessfulExecution`, `TestMissingTool`).
+- **Assertions:** plain `assert` (pytest rewrites them). Not `self.assertEqual`.
+- **Mocking:** no mocking of internal ChainWeaver classes unless testing integration boundaries.
+- **Coverage:** test both success and failure/error paths.
+
+---
+
+## PR conventions
+
+- One logical change per PR.
+- PR title: imperative mood (e.g., "Add retry logic to executor").
+- Architecture changes → update AGENTS.md repo map + `architecture.md` in the same PR.
+- Coding convention changes → update this file in the same PR.
+
+---
+
+## Branch naming
+
+```
+{type}/{issue_number}-{short-description}
+```
+
+Types: `feat`, `fix`, `docs`, `test`, `refactor`.
+
+Example: `feat/43-tool-timeout-guardrails`
+
+---
+
+## Commit messages
+
+Conventional Commits format:
+
+```
+feat: add timeout guardrails to tool execution
+fix: correct input mapping for literal constants
+docs: update architecture map after log_utils rename
+test: add edge case for empty input mapping
+refactor: extract helper schemas to tests/helpers.py
+```
+
+---
+
+## Examples
+
+All files in `examples/` must be runnable standalone:
+
+```bash
+python examples/simple_linear_flow.py
+```
+
+No test-framework dependency. No imports from `tests/`.
+
+---
+
+## Dependencies
+
+Pragmatic approach: adding well-known, well-maintained runtime dependencies
+is acceptable when the use case warrants it. Always update
+`pyproject.toml` `[project.dependencies]`.
+
+---
+
+## Out-of-scope discoveries
+
+If you find a bug or stale content while working on a different task:
+- **Small fix:** include it in the same PR.
+- **Large fix:** open a separate issue.
+
+---
+
+## New-module checklist
+
+When adding a new module to `chainweaver/`:
+
+1. Check the reserved-name list in [architecture.md § Planned modules](architecture.md#planned-modules).
+2. Add `from __future__ import annotations` as the first code line.
+3. Add type annotations to all function signatures.
+4. Export public symbols in `chainweaver/__init__.py` `__all__`.
+5. Add the module to the AGENTS.md repository map.
+6. Add the module to the `architecture.md` module-boundaries table.
+7. Create tests in `tests/test_{module}.py`.
+8. Verify all four validation commands pass.
+9. Update `pyproject.toml` if new dependencies are needed.
+10. Update the common-tasks table in AGENTS.md if the module introduces
+    a new recurring task pattern.
+
+---
+
+## Documentation governance triggers
+
+| Trigger | Required action |
+|---------|-----------------|
+| Add/remove/rename module | Update AGENTS.md repo map + architecture.md boundaries |
+| Change coding conventions | Update workflows.md code style section |
+| Change CI pipeline | Update workflows.md CI section |
+| Add a new exception | Update AGENTS.md common tasks + README error table |
+| Discover a recurring agent mistake | Record in lessons-learned.md |
+| Change review expectations | Update review-checklist.md |
+| Find a contradiction between docs | Fix in same PR if small; open issue if large |
From dbedc4966a3d6d3e70cbf07e9baf341619cbcef5 Mon Sep 17 00:00:00 2001
From: dgenio
Date: Mon, 9 Mar 2026 13:46:51 +0000
Subject: [PATCH 2/2] docs: fix vocabulary in planned modules table (chain flow)

---
 docs/agent-context/architecture.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/agent-context/architecture.md b/docs/agent-context/architecture.md
index 156a81c..108ee39 100644
--- a/docs/agent-context/architecture.md
+++ b/docs/agent-context/architecture.md
@@ -76,9 +76,9 @@ files that conflict with these names:
 
 | Reserved name | Issue | Purpose |
 |---------------|-------|---------|
-| `compiler.py` | #71 | Compile-time schema chain validation |
-| `analyzer.py` | #77 | Offline chain analyzer |
-| `observer.py` | #78 | Runtime chain observer |
+| `compiler.py` | #71 | Compile-time schema flow validation |
+| `analyzer.py` | #77 | Offline flow analyzer |
+| `observer.py` | #78 | Runtime flow observer |
 | `compat.py` | #48 | Schema fingerprinting |
 | `viz.py` | #79 | Flow visualization |
 | `cli.py` | #44 | CLI interface |