From d3cf11054c84f097d366609f1d9ea1b70f8d8882 Mon Sep 17 00:00:00 2001
From: dgenio <diogo.ansantos@nos.pt>
Date: Mon, 9 Mar 2026 10:44:00 +0000
Subject: [PATCH 1/2] docs: rewrite agent documentation system

Replace monolithic AGENTS.md and copilot-instructions.md with a layered
documentation architecture.

Canonical shared layer (docs/agent-context/):
- architecture.md: module boundaries, design traps, planned modules
- workflows.md: dev commands, CI, code style, testing, PR conventions
- invariants.md: 12 hard rules, forbidden patterns, safe/unsafe table
- lessons-learned.md: 5 recurring mistake patterns, promotion criteria
- review-checklist.md: definition-of-done checklist by category

Projection layers:
- .github/copilot-instructions.md: thin review-oriented wrapper
- .github/instructions/chainweaver.instructions.md: scoped design traps
- .github/instructions/tests.instructions.md: scoped test conventions
- .claude/CLAUDE.md: Claude operational entrypoint and router

AGENTS.md rewritten as primary entrypoint with 12 invariants, vocabulary
table, executor/flow semantics, documentation map, and update policy.
---
 .claude/CLAUDE.md                             |  68 ++++++
 .github/copilot-instructions.md               | 140 ++++-------
 .../instructions/chainweaver.instructions.md  |  15 ++
 .github/instructions/tests.instructions.md    |  27 ++
 AGENTS.md                                     | 230 ++++++++++++------
 docs/agent-context/architecture.md            |  95 ++++++++
 docs/agent-context/invariants.md              |  80 ++++++
 docs/agent-context/lessons-learned.md         | 103 ++++++++
 docs/agent-context/review-checklist.md        |  92 +++++++
 docs/agent-context/workflows.md               | 162 ++++++++++++
 10 files changed, 837 insertions(+), 175 deletions(-)
 create mode 100644 .claude/CLAUDE.md
 create mode 100644 .github/instructions/chainweaver.instructions.md
 create mode 100644 .github/instructions/tests.instructions.md
 create mode 100644 docs/agent-context/architecture.md
 create mode 100644 docs/agent-context/invariants.md
 create mode 100644 docs/agent-context/lessons-learned.md
 create mode 100644 docs/agent-context/review-checklist.md
 create mode 100644 docs/agent-context/workflows.md

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
new file mode 100644
index 0000000..b5401af
--- /dev/null
+++ b/.claude/CLAUDE.md
@@ -0,0 +1,68 @@
+# ChainWeaver — Claude Instructions
+
+Canonical source of truth: [AGENTS.md](/AGENTS.md) and
+[docs/agent-context/](/docs/agent-context/).
+
+Read AGENTS.md before starting any task. It contains the repo map,
+invariants, entry points, common tasks, validation commands, and
+documentation map that routes to deeper guidance.
+
+---
+
+## Explore before acting
+
+- Read the canonical docs for the topic area before writing code.
+- Inspect the files you plan to change. Do not assume structure from memory.
+- Check [architecture.md](/docs/agent-context/architecture.md) for design
+  traps and reserved module names before creating or renaming files.
+- Do not infer repo-wide rules from a single local example.
+
+## Implement safely
+
+- Preserve invariants. The three executor rules (no LLM, no network I/O,
+  no randomness in `executor.py`) are non-negotiable. See
+  [invariants.md](/docs/agent-context/invariants.md).
+- Use authoritative commands exactly as listed in
+  [AGENTS.md § Validation commands](/AGENTS.md#7-validation-commands).
+  Do not substitute alternative flags, paths, or invocations.
+- Follow the conventions in canonical docs. Do not invent new patterns.
+- Do not "clean up" or "simplify" code that looks unusual without first
+  checking [architecture.md § Design traps](/docs/agent-context/architecture.md#design-traps).
+
+## Validate before completing
+
+- Run all four validation commands and confirm they pass.
+- Check whether your change triggers a doc update. Consult the governance
+  triggers in [workflows.md](/docs/agent-context/workflows.md#documentation-governance-triggers).
+- Walk [review-checklist.md](/docs/agent-context/review-checklist.md) before
+  marking work done.
+- Verify that docstrings match actual behavior, not intended behavior.
+
+## Handle contradictions
+
+- If canonical docs contradict each other, flag the conflict explicitly.
+  Do not silently pick one side.
+- If code contradicts canonical docs, trust the docs for conventions and
+  the code for runtime behavior. Flag the gap.
+- If an older or duplicate document disagrees with AGENTS.md or
+  `docs/agent-context/`, prefer the canonical docs.
+- Fix small contradictions in the same PR. Open an issue for large ones.
+
+## Capture lessons
+
+- If you discover a recurring failure pattern during work, note it as a
+  candidate lesson.
+- A candidate lesson is provisional. Do not promote it into durable docs
+  based on a single observation.
+- A lesson is promotable when it is reusable, decision-shaping, and durable
+  — not just a one-off incident.
+- Promotion order: canonical docs first (`lessons-learned.md`), then
+  projections. See the criteria in
+  [lessons-learned.md](/docs/agent-context/lessons-learned.md#promotion-criteria).
+
+## Update order
+
+1. Update canonical shared docs (`AGENTS.md`, `docs/agent-context/`) first.
+2. Update tool-specific projections (this file, `.github/copilot-instructions.md`) second.
+3. If a Claude-specific rule starts to look shared and durable, promote it
+   into canonical docs and simplify it here.
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 67a5ccf..c451645 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,98 +1,46 @@
 # Copilot Instructions — ChainWeaver
 
-These instructions apply to all Copilot interactions in this repository.
-For full architecture context and decision rationale, see [AGENTS.md](/AGENTS.md).
-
-## Language & runtime
-
-- Python 3.10+ (target version for all code)
-- `from __future__ import annotations` at the top of every module
-- Type annotations on all function signatures (this is a `py.typed` package)
-
-## Code style
-
-- Formatter: `ruff format` (line length 99, double quotes, trailing commas)
-- Linter: `ruff check` with rule sets: E, W, F, I, UP, B, SIM, RUF
-- Import order: `isort`-compatible via Ruff's `I` rules (known first-party: `chainweaver`)
-- Naming: snake_case for functions/variables, PascalCase for classes
-- Docstrings: Google style (Args/Returns/Raises sections)
-
-## Architecture rules
-
-- All data models use `pydantic.BaseModel` (pydantic v2 API)
-- All exceptions inherit from `ChainWeaverError` (in `chainweaver/exceptions.py`)
-- All public symbols must be listed in `chainweaver/__init__.py` `__all__`
-- `executor.py` is deterministic — no LLM calls, no network I/O, no randomness
-- Tool functions: `fn(validated_input: BaseModel) -> dict[str, Any]`
-
-## Project layout
-
-```
-chainweaver/          → Package source (all modules use `from __future__ import annotations`)
-  __init__.py         → Public API surface; all exports listed in __all__
-  tools.py            → Tool class: named callable with Pydantic input/output schemas
-  flow.py             → FlowStep + Flow: ordered step definitions (Pydantic models)
-  registry.py         → FlowRegistry: in-memory catalogue of named flows
-  executor.py         → FlowExecutor: sequential, LLM-free runner (main entry point)
-  exceptions.py       → Typed exception hierarchy (all inherit ChainWeaverError)
-  log_utils.py        → Structured per-step logging utilities
-pyproject.toml        → Ruff, mypy, pytest config (source of truth for tool settings)
-tests/                → pytest test suite
-  conftest.py         → Shared fixtures (tools, flows, executors)
-  helpers.py          → Shared Pydantic schemas and tool functions
-examples/             → Runnable usage examples
-.github/workflows/    → CI (ci.yml) and publish (publish.yml) pipelines
-```
-
-## Testing
-
-- Framework: `pytest` (no unittest)
-- Test files: `tests/test_*.py`
-- Use `@pytest.fixture()` for shared objects (tools, flows, executors)
-- Shared schemas and helper functions live in `tests/helpers.py`
-- Test both success and error paths
-- Assertions: use plain `assert` (pytest rewrites them), not `self.assertEqual`
-- No mocking of internal ChainWeaver classes unless testing integration boundaries
-
-## Validation commands (run before every commit/PR)
-
-```bash
-# Install with dev dependencies
-pip install -e ".[dev]"
-
-# Lint
-ruff check chainweaver/ tests/ examples/
-
-# Check formatting
-ruff format --check chainweaver/ tests/ examples/
-
-# Type check
-python -m mypy chainweaver/
-
-# Run tests
-python -m pytest tests/ -v
-```
-
-Always run all four checks. CI runs lint + format + mypy on Python 3.10 only;
-tests run across Python 3.10, 3.11, 3.12, 3.13.
-
-## PR conventions
-
-- One logical change per PR
-- PR title: imperative mood (e.g., "Add retry logic to executor")
-- If you change architecture (add/remove/rename modules), update AGENTS.md and the project layout in this file in the same PR
-- If you change coding conventions, update this file in the same PR
-
-## Anti-patterns (never generate these)
-
-- Do NOT add LLM/AI client calls to `executor.py`
-- Do NOT use `unittest.TestCase` — use plain pytest functions/classes
-- Do NOT import from `chainweaver` internals using relative paths outside the package
-- Do NOT add dependencies without updating `pyproject.toml` `[project.dependencies]`
-- Do NOT commit secrets, API keys, or credentials
-
-## Trust these instructions
-
-These instructions are tested and aligned with CI. Only search for additional
-context if the information here is incomplete or found to be in error.
-For architecture decisions and rationale, see [AGENTS.md](/AGENTS.md).
+> Thin review-oriented layer. Canonical source of truth: [AGENTS.md](/AGENTS.md)
+> and [docs/agent-context/](/docs/agent-context/).
+
+---
+
+## Review-critical rules
+
+- Review code and agent-facing docs together. If a PR changes behavior,
+  invariants, architecture, or workflows, the corresponding docs must be
+  updated in the same PR.
+- Invariants take priority over cleanup, simplification, or local refactors.
+  See [AGENTS.md § Core invariants](/AGENTS.md#4-core-invariants).
+- Do not invent conventions. All coding style, naming, workflow, and testing
+  rules are grounded in [AGENTS.md](/AGENTS.md) and
+  [docs/agent-context/](/docs/agent-context/). If guidance is missing, surface
+  the gap — do not guess.
+- Use authoritative commands exactly as written in
+  [AGENTS.md § Validation commands](/AGENTS.md#7-validation-commands). Do not
+  substitute alternative flags, paths, or invocations.
+- If you find a contradiction or stale content in any doc, flag it explicitly.
+  Do not silently work around it.
+
+## Executor guardrails
+
+`executor.py` has three hard invariants — no LLM calls, no network I/O, no
+randomness. These are non-negotiable. See
+[invariants.md](/docs/agent-context/invariants.md#hard-executor-invariants).
+
+## Vocabulary
+
+| Use | Never use |
+|-----|-----------|
+| **flow** | chain, pipeline |
+| **tool** | function, action (when referring to a `Tool` instance) |
+
+## Where to find guidance
+
+| Topic | Canonical file |
+|-------|----------------|
+| Architecture, boundaries, design traps | [architecture.md](/docs/agent-context/architecture.md) |
+| Commands, CI, code style, testing, PR rules | [workflows.md](/docs/agent-context/workflows.md) |
+| Hard rules, forbidden patterns | [invariants.md](/docs/agent-context/invariants.md) |
+| Recurring mistake patterns | [lessons-learned.md](/docs/agent-context/lessons-learned.md) |
+| Definition-of-done, review gates | [review-checklist.md](/docs/agent-context/review-checklist.md) |
diff --git a/.github/instructions/chainweaver.instructions.md b/.github/instructions/chainweaver.instructions.md
new file mode 100644
index 0000000..202b5c3
--- /dev/null
+++ b/.github/instructions/chainweaver.instructions.md
@@ -0,0 +1,15 @@
+---
+applyTo: "chainweaver/**"
+---
+# ChainWeaver package — design traps
+
+Do not "fix" these without a solution for the underlying constraint.
+See [architecture.md § Design traps](/docs/agent-context/architecture.md#design-traps)
+for full context.
+
+- `StepRecord` and `ExecutionResult` are dataclasses, not Pydantic models.
+  They carry `Exception` instances. Do not convert them.
+- `log_utils.py` was renamed from `logging.py` to avoid stdlib shadowing. Do
+  not rename it back.
+- Weaver Stack: do not add agent-kernel or weaver-spec imports to `executor.py`.
+  Kernel delegation belongs in the separate `KernelBackedExecutor` class.
diff --git a/.github/instructions/tests.instructions.md b/.github/instructions/tests.instructions.md
new file mode 100644
index 0000000..0809e2f
--- /dev/null
+++ b/.github/instructions/tests.instructions.md
@@ -0,0 +1,27 @@
+---
+applyTo: "tests/**"
+---
+# Tests
+
+## File boundary
+
+- `tests/helpers.py` — shared Pydantic schemas and tool functions.
+- `tests/conftest.py` — pytest fixtures that compose objects from `helpers.py`.
+
+Do not merge these files. Do not put schemas in `conftest.py` or fixtures in
+`helpers.py`.
+
+## Framework rules
+
+- pytest only. No `unittest.TestCase`, no `self.assertEqual`.
+- Plain `assert` statements (pytest rewrites them).
+- No mocking of internal ChainWeaver classes unless testing integration boundaries.
+- Test both success and failure/error paths.
+
+## Organization
+
+- Unit tests grouped by module (`test_{module}.py`).
+- Integration tests grouped by scenario.
+- Test classes grouped by scenario (e.g., `TestSuccessfulExecution`).
+
+See [workflows.md § Testing conventions](/docs/agent-context/workflows.md#testing-conventions).
diff --git a/AGENTS.md b/AGENTS.md
index 6260704..a7e4f21 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,126 +1,198 @@
 # ChainWeaver — Agent Instructions
 
-> Recognized by GitHub Copilot (nearest-in-tree) and Claude Code as the
-> authoritative project guidance file.
+> Single source of truth for all coding agents working on this repository.
+> For tool-specific wrappers, see the documentation map at the end of this file.
 
 ---
 
-## 1. Project overview
+## 1. Project identity
 
-ChainWeaver is a deterministic orchestration layer for MCP-based agents. It
-compiles multi-tool chains into executable flows that run without any LLM
-involvement between steps. Python 3.10+, single runtime dependency
-(`pydantic>=2.0`).
+ChainWeaver is a deterministic orchestration layer for MCP-based agents.
+It compiles multi-tool flows into executable sequences that run without any
+LLM involvement between steps.
+
+- Python 3.10+; `from __future__ import annotations` in every module.
+- Single runtime dependency: `pydantic>=2.0`.
+- Core philosophy: **compiled, not interpreted** — the executor is a graph
+  runner, not a reasoning engine.
 
 ---
 
-## 2. Architecture map
+## 2. Domain vocabulary
 
-```
+Use these terms consistently in code, docs, comments, and PR descriptions.
+
+| Canonical term | Never use | Meaning |
+|----------------|-----------|---------|
+| **flow** | chain, pipeline | A named, ordered sequence of tool invocations (`Flow`) |
+| **tool** | function, action | A named callable with Pydantic input/output schemas (`Tool`) |
+
+---
+
+## 3. Repository map
+
+```text
 chainweaver/
-├── __init__.py       → Public API surface; all exports listed in __all__
-├── tools.py          → Tool class: named callable with Pydantic input/output schemas
-├── flow.py           → FlowStep + Flow: ordered step definitions (Pydantic models)
-├── registry.py       → FlowRegistry: in-memory catalogue of named flows
-├── executor.py       → FlowExecutor: sequential, LLM-free runner (main entry point)
-├── exceptions.py     → Typed exception hierarchy (all inherit ChainWeaverError)
-├── log_utils.py      → Structured per-step logging utilities
-└── py.typed          → PEP 561 marker for typed package
+├── __init__.py        Public API surface; all exports in __all__
+├── tools.py           Tool class: named callable with Pydantic I/O schemas
+├── flow.py            FlowStep + Flow: ordered step definitions (Pydantic models)
+├── registry.py        FlowRegistry: in-memory catalogue of named flows
+├── executor.py        FlowExecutor: sequential, LLM-free runner (main entry point)
+├── exceptions.py      Typed exception hierarchy (all inherit ChainWeaverError)
+├── log_utils.py       Structured per-step logging utilities
+└── py.typed           PEP 561 marker
+tests/
+├── conftest.py        Pytest fixtures (import schemas/functions from helpers.py)
+├── helpers.py         Shared Pydantic schemas and tool functions
+└── test_*.py          Test files
+examples/
+└── simple_linear_flow.py   Runnable standalone usage example
+pyproject.toml             Ruff, mypy, pytest config (source of truth for tooling)
+.github/workflows/         CI (ci.yml) and publish (publish.yml) pipelines
 ```
 
+### Key entry points
+
+- `FlowExecutor.execute_flow(flow_name, initial_input)` → `ExecutionResult`
+- `FlowRegistry.register_flow(flow, *, overwrite=False)` → register a flow
+- `FlowExecutor.register_tool(tool)` → register a tool for use in flows
+
 ---
 
-## 3. Key entry points
+## 4. Core invariants
 
-- `FlowExecutor.execute_flow(flow_name, initial_input)` — main orchestration
-  entry point; returns `ExecutionResult`
-- `FlowRegistry.register_flow(flow)` — register a flow for execution
-- `FlowExecutor.register_tool(tool)` — register a tool for use in flows
+Three hard executor invariants and nine package-wide invariants govern all
+changes. The executor is deterministic by design.
 
----
+**Executor — never add to `executor.py`:**
+1. No LLM or AI client calls.
+2. No network I/O.
+3. No randomness.
 
-## 4. Decision context
+**Package-wide:**
+4. All exceptions inherit from `ChainWeaverError` with relevant context
+   attributes (`tool_name`, `step_index`, `detail` where applicable).
+5. All public symbols exported in `chainweaver/__init__.py` `__all__`.
+6. Tool function signature: `fn(validated_input: BaseModel) -> dict[str, Any]`.
+7. `from __future__ import annotations` at the top of every module.
+8. Type annotations on all function signatures (package ships `py.typed`).
+9. Pydantic `BaseModel` for all data schemas (`Flow`, `FlowStep`, I/O contracts).
+10. No secrets, credentials, or PII in code, logs, or tests.
+11. All new code must pass: `ruff check`, `ruff format --check`, `mypy`, `pytest`.
+12. One logical change per PR; all tests must pass before merge.
 
-| Decision | Rationale |
-|---|---|
-| **Sequential-only execution** | Phase 1 MVP. DAG execution is planned for v0.2 (see Roadmap in README). |
-| **Pydantic for all schemas** | Schema validation ensures deterministic I/O contracts between steps. Every tool input/output is validated. |
-| **No LLM calls in executor** | Core design principle — "compiled, not interpreted." The executor is a graph runner, not a reasoning engine. |
-| **`from __future__ import annotations`** | Every module uses it for forward-reference support and cleaner type hints. |
+For the full prohibited-actions list and anti-patterns, see
+[invariants.md](docs/agent-context/invariants.md).
 
 ---
 
-## 5. Top invariants
+## 5. Executor and flow semantics
+
+### `FlowStep.input_mapping`
+
+| Value type | Behavior |
+|------------|----------|
+| `str` | Looked up as a key in the accumulated execution context. |
+| Non-string (`int`, `float`, `bool`, …) | Used as a literal constant. |
+| Empty `{}` (default) | The tool receives the full current context. |
+
+### `ExecutionResult` (dataclass)
+
+| Field | Type | Meaning |
+|-------|------|---------|
+| `flow_name` | `str` | Name of the executed flow. |
+| `success` | `bool` | `True` when all steps completed without error. |
+| `final_output` | `dict \| None` | Merged execution context, or `None` on failure. |
+| `execution_log` | `list[StepRecord]` | Ordered per-step records. |
+
+### `StepRecord` (dataclass)
 
-1. **No LLM calls in `executor.py`** — the executor is deterministic by design.
-2. All exceptions inherit from `ChainWeaverError` and carry relevant context attributes (e.g. `tool_name`, `step_index`, `detail` where applicable).
-3. All public API symbols must be exported in `chainweaver/__init__.py` `__all__`.
-4. Every tool function signature: `fn(validated_input: BaseModel) -> dict[str, Any]`.
-5. `from __future__ import annotations` at the top of every module.
-6. Type annotations on all function signatures (the package ships `py.typed`).
-7. Pydantic `BaseModel` for all data schemas (`Flow`, `FlowStep`, input/output contracts).
-8. No secrets, credentials, or PII in code, logs, or tests.
-9. All new code must pass: `pytest`, `ruff check`, `ruff format --check`.
-10. One logical change per PR; all tests must pass before merge.
+| Field | Type | Meaning |
+|-------|------|---------|
+| `step_index` | `int` | Zero-based position (`-1` = flow-input validation, `len(steps)` = flow-output validation). |
+| `tool_name` | `str` | Tool invoked (or flow name for validation records). |
+| `inputs` | `dict` | Validated inputs passed to the tool. |
+| `outputs` | `dict \| None` | Validated outputs, or `None` on failure. |
+| `error` | `Exception \| None` | Exception raised, or `None` on success. |
+| `success` | `bool` | `True` when the step completed without error. |
+
+> **Design note:** `StepRecord` and `ExecutionResult` are intentionally
+> `dataclass`, not `BaseModel`. They carry `Exception` instances that Pydantic
+> cannot serialize. See [architecture.md § Design traps](docs/agent-context/architecture.md#design-traps).
 
 ---
 
-## 6. Development workflow
+## 6. Common tasks
 
-```bash
-# Install with dev dependencies
-pip install -e ".[dev]"
+| Task | Where to look | What to update |
+|------|---------------|----------------|
+| Add a new tool | `tools.py` | Integration tests in `test_flow_execution.py` |
+| Add a new exception | `exceptions.py` | `__init__.py` + `__all__` + README error table — **same PR** |
+| Modify flow execution | `executor.py` | Keep `StepRecord` + `ExecutionResult` consistent |
+| Add a new Flow field | `flow.py` | Serialization tests if `model_dump()` changes |
+| Change logging format | `log_utils.py` | Update tests (no re-export needed) |
+| Add a new module | [new-module checklist](docs/agent-context/workflows.md#new-module-checklist) | Items listed in the checklist |
 
-# Run tests
-pytest
+### Exception message style
 
-# Lint
-ruff check chainweaver/ tests/ examples/
+Use f-string sentences with single-quoted identifiers, ending with a period:
 
-# Check formatting
-ruff format --check chainweaver/ tests/ examples/
+```python
+f"Tool '{tool_name}' is not registered."
+```
+
+---
 
-# Run the example
-python examples/simple_linear_flow.py
+## 7. Validation commands
+
+Run all four before every commit and PR:
+
+```bash
+ruff check chainweaver/ tests/ examples/
+ruff format --check chainweaver/ tests/ examples/
+python -m mypy chainweaver/
+python -m pytest tests/ -v
 ```
 
-See [README > Development](README.md#development) for extended context.
+CI runs lint + format + mypy on Python 3.10 only; tests run across 3.10–3.13.
 
-> **Note:** Once [#39](https://github.com/dgenio/ChainWeaver/issues/39) (mypy)
-> lands, add `python -m mypy chainweaver/` to the validation sequence.
+For full CI, PR, branch, and commit conventions, see
+[workflows.md](docs/agent-context/workflows.md).
 
 ---
 
-## 7. Common tasks
+## 8. Definition of done
 
-| Task | Where to look | What to update |
-|---|---|---|
-| Add a new Tool | `chainweaver/tools.py` | Integration tests in `tests/test_flow_execution.py` |
-| Add a new exception | `chainweaver/exceptions.py` | Re-export in `chainweaver/__init__.py`, update `__all__` |
-| Modify flow execution | `chainweaver/executor.py` | Ensure `StepRecord` and `ExecutionResult` stay consistent |
-| Add a new Flow field | `chainweaver/flow.py` | Update serialization tests if `model_dump()` changes |
-| Change logging format | `chainweaver/log_utils.py` | No re-export needed; update tests |
+Before marking a PR ready for review:
+
+- [ ] All four validation commands pass locally.
+- [ ] Both success and error paths are tested.
+- [ ] `__init__.py` `__all__` is updated if public symbols were added.
+- [ ] No new contradictions introduced between docs.
+- [ ] AGENTS.md updated if architecture changed.
 
-> **Ownership rule:** If you change the architecture, update this file in the
-> same PR.
+Full checklist: [review-checklist.md](docs/agent-context/review-checklist.md).
 
 ---
 
-## 8. Testing conventions
+## 9. Documentation map
 
-- Test files: `tests/test_*.py`
-- Test classes grouped by scenario (e.g., `TestSuccessfulExecution`, `TestMissingTool`)
-- Use `@pytest.fixture()` for shared objects (tools, flows, executors)
-- Shared fixtures and schemas live in `tests/conftest.py`
-- Test both success and failure paths
-- See [README > Development](README.md#development) for commands
+| File | Purpose | Consult when… |
+|------|---------|---------------|
+| [architecture.md](docs/agent-context/architecture.md) | Boundaries, decisions, design traps, planned modules | Scoping changes, understanding why something is built a certain way, choosing file placement |
+| [workflows.md](docs/agent-context/workflows.md) | Commands, CI, code style, testing, PR/git conventions | Writing code, creating branches/PRs, adding modules, running CI |
+| [invariants.md](docs/agent-context/invariants.md) | Hard rules, forbidden patterns | Modifying core modules, adding deps, touching executor |
+| [lessons-learned.md](docs/agent-context/lessons-learned.md) | Recurring mistake patterns | Before proposing changes to avoid known pitfalls |
+| [review-checklist.md](docs/agent-context/review-checklist.md) | Definition-of-done, review gates | Before submitting a PR, during code review |
 
 ---
 
-## 9. CI pipeline
+## 10. Update policy
 
-- `.github/workflows/ci.yml`: runs on push/PR to `main`
-  - Ruff lint + format check (Python 3.10 only)
-  - `pytest` across Python 3.10, 3.11, 3.12, 3.13
-- `.github/workflows/publish.yml`: triggered by `v*` tags →
-  test → build → PyPI publish → GitHub Release
+- **Every PR:** check whether AGENTS.md or any `docs/agent-context/` file is
+  stale with respect to the change. Update in the same PR if so.
+- **Architecture changes** (add/remove/rename modules): update AGENTS.md repo
+  map and architecture.md in the same PR.
+- **Ownership rule:** if you change the architecture, you own the doc update.
+- **Contradictions:** if you find a contradiction between docs, fix it in the
+  same PR if small, or open an issue if large.
diff --git a/docs/agent-context/architecture.md b/docs/agent-context/architecture.md
new file mode 100644
index 0000000..156a81c
--- /dev/null
+++ b/docs/agent-context/architecture.md
@@ -0,0 +1,95 @@
+# Architecture
+
+> Canonical reference for ChainWeaver's architectural intent, major boundaries,
+> and design decisions. Consult this before scoping changes or choosing where
+> new code belongs.
+
+---
+
+## Architectural intent
+
+ChainWeaver is a **deterministic graph runner**. It compiles ordered sequences
+of tool invocations into flows and executes them with strict schema validation
+at every boundary. No LLM, no network I/O, no randomness enters the executor.
+
+The entire value proposition rests on this determinism: given the same input
+and tools, the same flow produces the same output every time.
+
+---
+
+## Module boundaries
+
+| Module | Responsibility | Key constraint |
+|--------|---------------|----------------|
+| `tools.py` | Define `Tool`: name + callable + Pydantic I/O schemas | Tool functions must be `fn(BaseModel) -> dict[str, Any]` |
+| `flow.py` | Define `FlowStep` and `Flow` as Pydantic models | Pure data definitions; no execution logic |
+| `registry.py` | Store and retrieve flows by name | In-memory; intentionally simple for later wrapping |
+| `executor.py` | Run flows step-by-step, validate I/O, merge context | **No LLM, no network I/O, no randomness** |
+| `exceptions.py` | Typed exception hierarchy | All inherit `ChainWeaverError`; carry context attrs |
+| `log_utils.py` | Per-step structured logging | Library-safe (NullHandler only); no handler config |
+| `__init__.py` | Public API surface | Every public symbol must be in `__all__` |
+
+---
+
+## Decision context
+
+| Decision | Rationale |
+|----------|-----------|
+| Sequential-only execution | Phase 1 MVP. DAG execution is planned for v0.2. |
+| Pydantic for all schemas | Deterministic I/O contracts between steps. |
+| No LLM calls in executor | "Compiled, not interpreted." |
+| `from __future__ import annotations` | Forward-reference support; cleaner type hints. |
+| `dataclass` for `StepRecord`/`ExecutionResult` | They carry `Exception` instances; Pydantic cannot serialize these. |
+
+---
+
+## Design traps
+
+Things that look wrong but are intentional. Do not "fix" these without a
+solution for the underlying constraint.
+
+### `StepRecord` and `ExecutionResult` are dataclasses, not Pydantic
+
+The `error` field holds an `Exception` instance. Pydantic's serialization
+cannot handle arbitrary exception objects. These may migrate to Pydantic if a
+serialization solution is found, but until then agents must not convert them.
+
+### `log_utils.py`, not `logging.py`
+
+Renamed from `logging.py` (commit ccfe7f8) to avoid shadowing Python's `logging`
+stdlib module. Do not rename it back.
+
+### `tests/helpers.py` is separate from `tests/conftest.py`
+
+Extracted intentionally (commit 7ef3245). Boundary:
+- `helpers.py` → shared Pydantic schemas and tool functions (importable by any test)
+- `conftest.py` → pytest fixtures that compose objects from `helpers.py`
+
+Do not merge them back together.
+
+---
+
+## Planned modules
+
+The following module names are reserved for planned features. Do not create
+files that conflict with these names:
+
+| Reserved name | Issue | Purpose |
+|---------------|-------|---------|
+| `compiler.py` | #71 | Compile-time schema chain validation |
+| `analyzer.py` | #77 | Offline chain analyzer |
+| `observer.py` | #78 | Runtime chain observer |
+| `compat.py` | #48 | Schema fingerprinting |
+| `viz.py` | #79 | Flow visualization |
+| `cli.py` | #44 | CLI interface |
+| `mcp/` | #70, #72 | MCP adapter + flow server |
+| `integrations/` | #82 | LangChain/LlamaIndex bridge adapters |
+| `export/` | #25 | Flow export formats |
+| `governance.py` | #13 | Governance policies |
+
+### Weaver Stack guardrail
+
+Issues #89–#91 introduce a kernel-backed executor (`KernelBackedExecutor`) that
+delegates step execution to an agent-kernel. This is a **separate class** —
+do not add agent-kernel or weaver-spec imports to `executor.py`. The core
+`FlowExecutor` stays deterministic and standalone.
diff --git a/docs/agent-context/invariants.md b/docs/agent-context/invariants.md
new file mode 100644
index 0000000..711f440
--- /dev/null
+++ b/docs/agent-context/invariants.md
@@ -0,0 +1,80 @@
+# Invariants
+
+> The strongest "do not break these assumptions" reference. Consult this when
+> modifying core modules, adding dependencies, or touching the executor.
+
+---
+
+## Hard executor invariants
+
+These three rules are foundational to ChainWeaver's value proposition.
+They are non-negotiable.
+
+| # | Rule | Why |
+|---|------|-----|
+| 1 | **No LLM or AI client calls** in `executor.py` | The executor is deterministic. Same input + same tools = same output. |
+| 2 | **No network I/O** in `executor.py` | Network I/O belongs in tool functions, not the orchestrator. |
+| 3 | **No randomness** in `executor.py` | Random routing or jitter would break the "compiled, not interpreted" guarantee. |
+
+Network I/O and randomness are allowed in **tool functions** — the executor
+only manages the data flow between tools.
+
+---
+
+## Package-wide invariants
+
+| # | Rule |
+|---|------|
+| 4 | All exceptions inherit from `ChainWeaverError` with context attrs. |
+| 5 | All public symbols in `__init__.py` `__all__`. |
+| 6 | Tool signature: `fn(validated_input: BaseModel) -> dict[str, Any]`. |
+| 7 | `from __future__ import annotations` in every module. |
+| 8 | Type annotations on all function signatures (`py.typed` package). |
+| 9 | Pydantic `BaseModel` for all data schemas. |
+| 10 | No secrets, credentials, or PII in code, logs, or tests. |
+| 11 | All new code must pass: `ruff check`, `ruff format --check`, `mypy`, `pytest`. |
+| 12 | One logical change per PR; all tests must pass before merge. |
+
+---
+
+## Forbidden patterns
+
+Never generate these in ChainWeaver code:
+
+| Pattern | Why |
+|---------|-----|
+| LLM/AI client calls in `executor.py` | Violates invariant 1 |
+| `unittest.TestCase` | Use plain pytest functions/classes |
+| Relative imports from `chainweaver` internals outside the package | Breaks package boundaries |
+| Adding deps without updating `pyproject.toml` `[project.dependencies]` | Invisible dependency |
+| Secrets, API keys, or credentials in code | Security invariant |
+| Converting `StepRecord`/`ExecutionResult` to Pydantic `BaseModel` | They carry `Exception`; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Renaming `log_utils.py` back to `logging.py` | Stdlib shadowing; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Merging `tests/helpers.py` into `conftest.py` | Intentional split; see [architecture.md § Design traps](architecture.md#design-traps) |
+| Adding agent-kernel or weaver-spec imports to `executor.py` | Weaver Stack goes in `KernelBackedExecutor`; see [architecture.md § Weaver Stack](architecture.md#weaver-stack-guardrail) |
+| Adding deps to `executor.py` that conflict with kernel delegation | Future `KernelBackedExecutor` requires a clean executor |
+
+---
+
+## Safe vs. unsafe simplifications
+
+| Change | Safe? | Notes |
+|--------|-------|-------|
+| Extract a helper function within a module | ✅ Yes | Keep it private (`_name`) unless it's a public API |
+| Refactor tests to use shared fixtures | ✅ Yes | Put new schemas in `helpers.py`, fixtures in `conftest.py` |
+| Remove an unused import | ✅ Yes | Ruff already flags these |
+| Inline a private helper | ✅ Yes | If it reduces complexity |
+| Convert `StepRecord`/`ExecutionResult` to Pydantic | ❌ No | See forbidden patterns |
+| Add a new field to `Flow` or `FlowStep` | ⚠️ Careful | Check `model_dump()` serialization; update tests |
+| Change exception hierarchy | ⚠️ Careful | May break downstream `except` clauses |
+| Add network I/O to executor.py | ❌ No | Hard invariant |
+
+---
+
+## Update triggers
+
+Update this file when:
+- A new hard invariant is established.
+- A new forbidden pattern is discovered.
+- An invariant is relaxed or removed (document why).
+- A new "safe vs. unsafe" category is identified.
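
Part of the executor rules above (invariants 1 to 3 and the agent-kernel import ban) can be approximated with a static import check. This is a hedged sketch using only the standard-library `ast` module. The deny-list is illustrative rather than the project's own, and the names `agent_kernel` and `weaver_spec` are guesses at how those packages would be imported.

```python
from __future__ import annotations

import ast

# Illustrative deny-list (an assumption, not a list taken from the project):
# top-level modules whose import in executor.py would suggest a violation.
FORBIDDEN_ROOTS = frozenset({
    "random", "secrets",                                   # randomness
    "socket", "ssl", "http", "urllib", "requests", "httpx",  # network I/O
    "openai", "anthropic",                                 # LLM/AI clients
    "agent_kernel", "weaver_spec",                         # hypothetical names
})


def forbidden_imports(source: str) -> list[str]:
    """Return the sorted forbidden top-level modules imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 keeps absolute imports only; relative imports stay
            # inside the package and are not matched against the deny-list.
            roots = [node.module.split(".")[0]]
        else:
            continue
        found.update(root for root in roots if root in FORBIDDEN_ROOTS)
    return sorted(found)
```

An import scan is only a heuristic: it does not catch dynamic imports, nondeterminism that needs no import, or forbidden calls reached through another module, so it would complement review rather than replace it.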
diff --git a/docs/agent-context/lessons-learned.md b/docs/agent-context/lessons-learned.md
new file mode 100644
index 0000000..cd79ee6
--- /dev/null
+++ b/docs/agent-context/lessons-learned.md
@@ -0,0 +1,103 @@
+# Lessons Learned
+
+> Reusable patterns from past mistakes. Not an incident archive — only
+> generalized, durable lessons belong here.
+
+---
+
+## Failure-capture workflow
+
+When a PR review or CI failure reveals a recurring mistake pattern:
+
+1. **Identify the generalized lesson.** Strip task-specific details.
+   Ask: "Would a different agent make the same mistake on a different task?"
+2. **Check whether a lesson already exists** in this file. If so, refine it
+   rather than duplicating.
+3. **Write a new entry** if the pattern is genuinely new. Use the format below.
+4. **Consider promoting to invariants.md** if the lesson represents a rule that
+   should never be violated (rather than a common mistake to watch for).
+
+### What belongs here
+
+- Recurring mistakes agents make, generalized into actionable guidance.
+- Patterns observed across multiple PRs or multiple agents.
+
+### What does NOT belong here
+
+- One-off bugs or typos.
+- Incident narratives or timelines.
+- Guidance already captured as an invariant or forbidden pattern in
+  [invariants.md](invariants.md).
+
+---
+
+## Recurring mistake patterns
+
+### 1. Docstrings that don't match actual behavior
+
+**Pattern:** Agent writes or updates a docstring that describes intended
+behavior rather than actual behavior (e.g., claiming a field is immutable
+when the dataclass isn't frozen, or documenting exceptions as raised when
+they are actually caught and returned via `ExecutionResult`).
+
+**Prevention:** After writing or modifying a docstring, verify each claim
+against the implementation. Check: return types, raised vs. caught exceptions,
+mutability, field semantics.
+
+---
+
+### 2. Referencing files or configs that don't exist
+
+**Pattern:** Agent mentions a file, config key, or test module in docs or
+code that doesn't exist in the repository (e.g., `tests/test_tools.py`,
+an isort config before it was added).
+
+**Prevention:** Before referencing any file or config in prose, verify it
+exists. Use the repository map in AGENTS.md or check the file system directly.
+
+---
+
+### 3. Commands that don't match CI exactly
+
+**Pattern:** Agent includes shell commands in docs or scripts that differ
+from CI (e.g., `ruff check .` instead of `ruff check chainweaver/ tests/ examples/`,
+omitting `python -m` prefix, using different flags).
+
+**Prevention:** Copy commands from the authoritative sequence in
+[workflows.md § Validation commands](workflows.md#validation-commands).
+Never improvise command variations.
+
+---
+
+### 4. Markdown formatting errors in agent-generated docs
+
+**Pattern:** Agent produces invalid Markdown syntax — `|>` instead of `>`
+for blockquotes, `||` creating phantom table columns, broken link syntax.
+
+**Prevention:** Review generated Markdown for syntax correctness before
+committing. Validate tables have consistent column counts.
+
+---
+
+### 5. Overclaiming capabilities or properties
+
+**Pattern:** Agent asserts a property that is aspirational rather than actual
+(e.g., claiming immutability without `frozen=True`, claiming all exceptions
+carry `step_index` when some only carry `name`).
+
+**Prevention:** Verify assertions against the code. If a property isn't
+enforced at the language level, don't claim it.
+
+---
+
+## Promotion criteria
+
+A lesson should be **promoted to invariants.md** when:
+- It represents a hard rule, not just a common mistake.
+- Violating it would cause CI failure, runtime error, or architectural damage.
+- It applies unconditionally, not just "in most cases."
+
+A lesson should be **removed** when:
+- The underlying cause has been eliminated (e.g., a tool fix makes the
+  mistake impossible).
+- It has been superseded by a more specific or more general lesson.
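
The prevention step for pattern 4 ("validate tables have consistent column counts") lends itself to a small script. The sketch below is one possible approach, with hypothetical function names; it handles only pipe tables whose rows start with `|`, and it does not skip fenced code blocks.

```python
from __future__ import annotations

import re


def count_cells(row: str) -> int:
    """Count the cells in one pipe-table row."""
    text = re.sub(r"`[^`]*`", "", row.strip())  # inline code may contain "|"
    text = text.replace("\\|", "")              # escaped pipes are not separators
    # Remove exactly one outer pipe per side so that a stray "||" still
    # shows up as an extra (phantom) cell.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return text.count("|") + 1


def inconsistent_rows(markdown: str) -> list[int]:
    """Return 1-based line numbers of rows whose cell count differs from
    the first row of the table they belong to."""
    bad: list[int] = []
    expected: int | None = None  # cell count of the current table's header
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.lstrip().startswith("|"):
            expected = None      # any non-table line ends the current table
            continue
        cells = count_cells(line)
        if expected is None:
            expected = cells
        elif cells != expected:
            bad.append(number)
    return bad
```

For example, a row written as `| 3 || 4 |` under a two-column header is reported, because the doubled pipe adds a third, empty cell.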
diff --git a/docs/agent-context/review-checklist.md b/docs/agent-context/review-checklist.md
new file mode 100644
index 0000000..0bf3987
--- /dev/null
+++ b/docs/agent-context/review-checklist.md
@@ -0,0 +1,92 @@
+# Review Checklist
+
+> Definition-of-done checks for agent self-review and maintainer review.
+> Use this before marking a PR ready.
+
+---
+
+## CI and validation
+
+- [ ] `ruff check chainweaver/ tests/ examples/` passes.
+- [ ] `ruff format --check chainweaver/ tests/ examples/` passes.
+- [ ] `python -m mypy chainweaver/` passes.
+- [ ] `python -m pytest tests/ -v` passes.
+- [ ] Commands match the authoritative sequence exactly (see [workflows.md](workflows.md#validation-commands)).
+
+---
+
+## Code correctness
+
+- [ ] New code has type annotations on all function signatures.
+- [ ] New modules start with `from __future__ import annotations`.
+- [ ] Tool functions follow the signature: `fn(validated_input: BaseModel) -> dict[str, Any]`.
+- [ ] Exception messages use f-string style with single-quoted identifiers, ending with a period.
+- [ ] No `unittest.TestCase` — plain pytest functions/classes only.
+- [ ] No relative imports from `chainweaver` internals outside the package.
+
+---
+
+## Testing
+
+- [ ] Both success and error/failure paths are tested.
+- [ ] New schemas added to `tests/helpers.py` (not `conftest.py`).
+- [ ] New fixtures added to `tests/conftest.py` (not `helpers.py`).
+- [ ] Assertions use plain `assert`, not `self.assertEqual`.
+- [ ] No mocking of internal ChainWeaver classes (unless at integration boundary).
+
+---
+
+## Public API
+
+- [ ] New public symbols added to `chainweaver/__init__.py` `__all__`.
+- [ ] New exceptions: `__init__.py` + `__all__` + README error table — all updated.
+- [ ] `StepRecord` / `ExecutionResult` remain as dataclasses (not converted to Pydantic).
+
+---
+
+## Architecture
+
+- [ ] No LLM calls, network I/O, or randomness added to `executor.py`.
+- [ ] No new dependencies added without updating `pyproject.toml`.
+- [ ] New module name does not conflict with [reserved names](architecture.md#planned-modules).
+- [ ] No agent-kernel or weaver-spec imports in `executor.py`.
+
+---
+
+## Documentation consistency
+
+- [ ] AGENTS.md repo map updated if modules were added/removed/renamed.
+- [ ] `architecture.md` module boundaries updated if architecture changed.
+- [ ] `workflows.md` updated if commands, CI, or conventions changed.
+- [ ] README error table updated if exceptions were added.
+- [ ] No docstrings that claim behavior the code doesn't implement.
+  (See [lessons-learned.md § pattern 1](lessons-learned.md#1-docstrings-that-dont-match-actual-behavior).)
+- [ ] No references to files or configs that don't exist.
+  (See [lessons-learned.md § pattern 2](lessons-learned.md#2-referencing-files-or-configs-that-dont-exist).)
+
+---
+
+## Domain vocabulary
+
+- [ ] Uses "flow" (never "chain" or "pipeline").
+- [ ] Uses "tool" (never "function" or "action" for Tool instances).
+
+---
+
+## PR hygiene
+
+- [ ] One logical change per PR.
+- [ ] PR title in imperative mood.
+- [ ] Branch follows `{type}/{issue_number}-{short-description}` convention.
+- [ ] Commits follow Conventional Commits format.
+- [ ] No secrets, API keys, or credentials.
+
+---
+
+## Update triggers
+
+Update this checklist when:
+- New review gates are established (e.g., new invariants).
+- Existing checks are found to be insufficient or redundant.
+- New recurring mistakes are added to `lessons-learned.md` that warrant
+  a corresponding checklist item.
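
The "flow, never chain or pipeline" item in the domain-vocabulary section is simple enough to scan for. The following is a minimal sketch with hypothetical names; it covers only that one item, since the "tool" vs. "function" distinction depends on context that a regular expression cannot see.

```python
from __future__ import annotations

import re

# Terms ruled out by the domain-vocabulary section, mapped to the preferred term.
DISCOURAGED = {"chain": "flow", "pipeline": "flow"}

# Whole words only (with an optional plural "s"), so the project name
# "ChainWeaver" and the package name "chainweaver" are never matched.
_PATTERN = re.compile(r"\b(" + "|".join(DISCOURAGED) + r")s?\b", re.IGNORECASE)


def vocabulary_hits(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, term found, suggested term) for each hit."""
    hits: list[tuple[int, str, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            suggestion = DISCOURAGED[match.group(1).lower()]
            hits.append((number, match.group(0), suggestion))
    return hits
```

Hits are prompts for review rather than errors: the "CI pipeline" heading in `workflows.md`, for instance, would be flagged even though it does not refer to a ChainWeaver flow.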
diff --git a/docs/agent-context/workflows.md b/docs/agent-context/workflows.md
new file mode 100644
index 0000000..6c71ab9
--- /dev/null
+++ b/docs/agent-context/workflows.md
@@ -0,0 +1,162 @@
+# Workflows
+
+> Canonical reference for development commands, CI, code style, testing
+> conventions, PR/git rules, and documentation governance triggers.
+
+---
+
+## Validation commands
+
+Run all four before every commit and PR. This is the authoritative sequence:
+
+```bash
+# 1. Lint
+ruff check chainweaver/ tests/ examples/
+
+# 2. Format check
+ruff format --check chainweaver/ tests/ examples/
+
+# 3. Type check
+python -m mypy chainweaver/
+
+# 4. Tests
+python -m pytest tests/ -v
+```
+
+**Command-selection rules:**
+- Always scope Ruff to `chainweaver/ tests/ examples/`; never use bare `.` or `src/`.
+- Always use `python -m pytest`, not bare `pytest`, for consistent module resolution.
+- Always use `python -m mypy`, not bare `mypy`.
+
+---
+
+## CI pipeline
+
+| Workflow | Trigger | Steps |
+|----------|---------|-------|
+| `ci.yml` | Push/PR to `main` | Ruff lint + format + mypy (Python 3.10 only); pytest across 3.10, 3.11, 3.12, 3.13 |
+| `publish.yml` | `v*` tags | Test → build → PyPI publish → GitHub Release |
+
+---
+
+## Code style
+
+- **Formatter:** `ruff format` — line length 99, double quotes, trailing commas.
+- **Linter:** `ruff check` — rule sets: E, W, F, I, UP, B, SIM, RUF.
+- **Import order:** isort-compatible via Ruff's `I` rules (known first-party: `chainweaver`).
+- **Naming:** `snake_case` for functions/variables, `PascalCase` for classes.
+- **Docstrings:** Google style (Args/Returns/Raises sections).
+- **Exception messages:** f-string sentences, single-quoted identifiers, end with a period.
+  ```python
+  f"Tool '{tool_name}' is not registered."
+  ```
+
+---
+
+## Testing conventions
+
+- **Framework:** pytest only. No `unittest.TestCase`.
+- **Test files:** `tests/test_*.py`.
+- **Shared artifacts boundary:**
+  - `tests/helpers.py` — Pydantic schemas and tool functions.
+  - `tests/conftest.py` — pytest fixtures that compose objects from `helpers.py`.
+- **Organization:** hybrid — unit tests grouped by module, integration tests grouped by scenario.
+- **Test classes:** grouped by scenario (e.g., `TestSuccessfulExecution`, `TestMissingTool`).
+- **Assertions:** plain `assert` (pytest rewrites them). Not `self.assertEqual`.
+- **Mocking:** no mocking of internal ChainWeaver classes unless testing integration boundaries.
+- **Coverage:** test both success and failure/error paths.
+
+---
+
+## PR conventions
+
+- One logical change per PR.
+- PR title: imperative mood (e.g., "Add retry logic to executor").
+- Architecture changes → update AGENTS.md repo map + `architecture.md` in the same PR.
+- Coding convention changes → update this file in the same PR.
+
+---
+
+## Branch naming
+
+```
+{type}/{issue_number}-{short-description}
+```
+
+Types: `feat`, `fix`, `docs`, `test`, `refactor`.
+
+Example: `feat/43-tool-timeout-guardrails`
+
+---
+
+## Commit messages
+
+Conventional Commits format:
+
+```
+feat: add timeout guardrails to tool execution
+fix: correct input mapping for literal constants
+docs: update architecture map after log_utils rename
+test: add edge case for empty input mapping
+refactor: extract helper schemas to tests/helpers.py
+```
+
+---
+
+## Examples
+
+All files in `examples/` must be runnable standalone:
+
+```bash
+python examples/simple_linear_flow.py
+```
+
+No test-framework dependency. No imports from `tests/`.
+
+---
+
+## Dependencies
+
+Pragmatic approach: adding well-known, well-maintained runtime dependencies
+is acceptable when the use case warrants it. Always update
+`pyproject.toml` `[project.dependencies]`.
+
+---
+
+## Out-of-scope discoveries
+
+If you find a bug or stale content while working on a different task:
+- **Small fix:** include it in the same PR.
+- **Large fix:** open a separate issue.
+
+---
+
+## New-module checklist
+
+When adding a new module to `chainweaver/`:
+
+1. Check the reserved-name list in [architecture.md § Planned modules](architecture.md#planned-modules).
+2. Add `from __future__ import annotations` as the first code line.
+3. Add type annotations to all function signatures.
+4. Export public symbols in `chainweaver/__init__.py` `__all__`.
+5. Add the module to the AGENTS.md repository map.
+6. Add the module to the `architecture.md` module-boundaries table.
+7. Create tests in `tests/test_{module}.py`.
+8. Verify all four validation commands pass.
+9. Update `pyproject.toml` if new dependencies are needed.
+10. Update the common-tasks table in AGENTS.md if the module introduces
+    a new recurring task pattern.
+
+---
+
+## Documentation governance triggers
+
+| Trigger | Required action |
+|---------|-----------------|
+| Add/remove/rename module | Update AGENTS.md repo map + architecture.md boundaries |
+| Change coding conventions | Update workflows.md code style section |
+| Change CI pipeline | Update workflows.md CI section |
+| Add a new exception | Update AGENTS.md common tasks + README error table |
+| Discover a recurring agent mistake | Record in lessons-learned.md |
+| Change review expectations | Update review-checklist.md |
+| Find a contradiction between docs | Fix in same PR if small; open issue if large |
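
The branch-naming and commit-message conventions in `workflows.md` can be expressed as two regular expressions. This is a sketch under stated assumptions: the kebab-case description is inferred from the single branch example, and reusing the five branch types for commit subjects is inferred from the commit examples. Scopes and `!` markers from the wider Conventional Commits format are not handled.

```python
from __future__ import annotations

import re

TYPES = ("feat", "fix", "docs", "test", "refactor")
_TYPE_ALT = "|".join(TYPES)

# {type}/{issue_number}-{short-description}, description assumed kebab-case.
BRANCH_RE = re.compile(rf"(?:{_TYPE_ALT})/\d+-[a-z0-9]+(?:-[a-z0-9]+)*")

# "type: summary", matching the shape of the commit examples in the doc.
COMMIT_RE = re.compile(rf"(?:{_TYPE_ALT}): \S.*")


def is_valid_branch(name: str) -> bool:
    """Return True if `name` follows the documented branch convention."""
    return BRANCH_RE.fullmatch(name) is not None


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if `subject` looks like one of the documented commit lines."""
    return COMMIT_RE.fullmatch(subject) is not None
```

Both helpers are hypothetical; they use `fullmatch` so that a valid prefix followed by extra text, such as a trailing newline, is rejected.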

From dbedc4966a3d6d3e70cbf07e9baf341619cbcef5 Mon Sep 17 00:00:00 2001
From: dgenio <diogo.ansantos@nos.pt>
Date: Mon, 9 Mar 2026 13:46:51 +0000
Subject: [PATCH 2/2] docs: fix vocabulary in planned modules table (chain ->
 flow)

---
 docs/agent-context/architecture.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/agent-context/architecture.md b/docs/agent-context/architecture.md
index 156a81c..108ee39 100644
--- a/docs/agent-context/architecture.md
+++ b/docs/agent-context/architecture.md
@@ -76,9 +76,9 @@ files that conflict with these names:
 
 | Reserved name | Issue | Purpose |
 |---------------|-------|---------|
-| `compiler.py` | #71 | Compile-time schema chain validation |
-| `analyzer.py` | #77 | Offline chain analyzer |
-| `observer.py` | #78 | Runtime chain observer |
+| `compiler.py` | #71 | Compile-time schema flow validation |
+| `analyzer.py` | #77 | Offline flow analyzer |
+| `observer.py` | #78 | Runtime flow observer |
 | `compat.py` | #48 | Schema fingerprinting |
 | `viz.py` | #79 | Flow visualization |
 | `cli.py` | #44 | CLI interface |