| title | description | icon |
|---|---|---|
Testing Guide |
Testing conventions, patterns, and best practices for Agent Control. |
flask-vial |
-
Make tests readable and reviewable.
-
Prefer verifying behavior through the public contract (user-facing surfaces) over internal details.
-
Keep the suite reliable: deterministic, minimal flake, clear failures.
-
Any behavior change should include a test change (add/adjust). Pure refactors only require test changes if behavior changes.
Public contract is anything exposed to end users and not an internal implementation detail.
Practical mapping in this repo:
-
Server: HTTP endpoints + request/response schemas (and documented behavior).
-
SDK: symbols exported from
sdks/python/src/agent_control/__init__.py(and documented behavior). -
Models: Pydantic models/fields/validation/serialization in
models/src/agent_control_models/. -
Engine: stable, intended entrypoints for evaluation behavior (avoid asserting on internal helpers/private module structure).
Choose the narrowest user-facing interface that can express the scenario:
-
Server behavior → drive via HTTP endpoints (create/setup via API where feasible).
-
SDK behavior → drive via exported SDK API.
-
Engine behavior → drive via the engine’s stable entrypoints.
-
Only if needed: test internal helpers for hard-to-reach edge cases or performance-sensitive parsing/validation.
Why this rule exists:
-
Contract tests survive refactors (less coupled to internals).
-
They catch integration mismatches between packages (models/server/sdk).
-
They better reflect how users experience failures.
When it’s OK to use internals:
-
The public route to set up state is disproportionately slow or complex.
-
You need to force an otherwise-unreachable error path.
-
You’re testing a pure function where the “public API” adds no value.
If you do use internals, say so explicitly in the test’s # Given: block (e.g., “Given: seeded DB row directly for speed”).
Use # Given, # When, # Then comments to separate intent from mechanics. This helps smaller models (and humans) avoid mixing setup/action/assertions, and makes tests easy to scan.
Guidelines:
-
Given: inputs, state, preconditions (fixtures/mocks/seed data).
-
When: the single action under test (call a function / make a request).
-
Then: assertions about outcomes (return value, error, side effects).
-
Prefer one When per test. If you need multiple actions, split tests unless the steps are inseparable.
-
Keep comments short and specific (often one line each).
Examples below are illustrative; adjust imports/names and fill in placeholders to match the concrete code under test.
def test_scope_rejects_invalid_step_name_regex() -> None:
# Given: a scope with an invalid regex
scope = {"step_name_regex": "("}
# When: constructing the model
with pytest.raises(ValueError):
ControlScope.model_validate(scope)
# Then: a clear validation error is raised (asserted by pytest)def test_create_control_returns_id(client: TestClient) -> None:
# Given: a valid control payload
payload = {"name": "pii-protection"}
# When: creating the control via the public API
response = client.put("/api/v1/controls", json=payload)
# Then: the response contains the control id
assert response.status_code == 200
assert "control_id" in response.json()async def test_sdk_denies_on_local_control() -> None:
# Given: an SDK client and a local deny control
client = AgentControlClient(base_url="http://localhost:8000")
controls = [{"execution": "sdk", "action": {"decision": "deny"}, ...}]
# When: evaluating via the SDK public API
result = await check_evaluation_with_local(
client=client,
agent_name="demo-agent",
step=Step(type="tool", name="db_query", input={"sql": "SELECT 1"}, output=None),
stage="pre",
controls=controls,
)
# Then: the evaluation is unsafe
assert result.is_safe is False-
Prefer creating records via public endpoints rather than writing DB rows directly.
-
Prefer invoking behavior via public entrypoints:
-
Server: HTTP endpoints (the service layer is internal; use it directly only when endpoint setup is impractical).
-
SDK: symbols exported from
sdks/python/src/agent_control/__init__.py.
-
-
Avoid asserting on internal/private fields unless they are part of the contract (schemas, response fields, documented behavior).
Prefer Makefile targets when available:
-
All tests:
make test -
Server tests:
make server-test -
Engine tests:
make engine-test -
SDK tests:
make sdk-test
If there is no Makefile target for a task (e.g., models tests), it’s OK to run the underlying command (e.g., cd models && uv run pytest).
Package-specific notes:
-
Server tests use a configured test database (see
server/Makefile; invoked viamake server-test). -
SDK tests start a local server and wait on
/health(invoked viamake sdk-test).