Test Coverage Matrix

This document summarizes the test coverage for SecAI_OS across all languages and test categories.

Last updated: 2026-03-14

Canonical source of truth for test counts: docs/test-counts.json. CI enforces that actual counts never drift below documented values.
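The drift rule above can be sketched as a small comparison. This is only an illustration: the layout of docs/test-counts.json is not shown in this document, so the flat `{"language": count}` mapping, the function names, and the "actual" numbers below are all assumptions, not the project's real CI check.

```python
import json
from pathlib import Path


def load_documented_counts(path: Path) -> dict[str, int]:
    """Load documented counts. The flat {"language": count} layout is an assumption."""
    return json.loads(path.read_text(encoding="utf-8"))


def find_count_drift(documented: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Return one message per language whose actual count is below the documented one."""
    problems = []
    for language, expected in documented.items():
        found = actual.get(language, 0)  # a missing language counts as zero tests
        if found < expected:
            problems.append(f"{language}: documented {expected}, found {found}")
    return problems


if __name__ == "__main__":
    documented = {"go": 402, "python": 739}  # totals from the Summary table
    actual = {"go": 402, "python": 745}      # hypothetical collected counts
    print(find_count_drift(documented, actual))  # prints []
```

Counts may grow past the documented values without failing; only a drop is reported, which matches the "never drift below" wording.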

Summary

| Language | Test Count | Runner |
| --- | --- | --- |
| Go | 402 | `go test -race ./...` |
| Python | 739 | `pytest` |
| Shell | All .sh files | `shellcheck` |

Go Tests (402 total)

| Service | Location | Tests | Description |
| --- | --- | --- | --- |
| Registry | services/registry/ | 14 | Trusted model registry, hash pinning, cosign verification |
| Tool Firewall | services/tool-firewall/ | 10 | Default-deny egress policy, rule evaluation |
| Airlock | services/airlock/ | 10 | Online airlock, request sanitization, policy enforcement |
| GPU Integrity Watch | services/gpu-integrity-watch/ | 62 | GPU probe scoring, baseline comparison, action triggers, daemon mode, driver fingerprint, device allowlist, attestor/incident integration |
| MCP Firewall | services/mcp-firewall/ | 71 | MCP tool call policy enforcement, input redaction, taint tracking, audit, adversarial tests (M43), trust tier isolation, session binding |
| Policy Engine | services/policy-engine/ | 44 | Unified policy decisions across 6 domains, evidence generation, auth, adversarial tests (M43) |
| Runtime Attestor | services/runtime-attestor/ | 55 | TPM2 quote verification, HMAC bundles, state machine, startup gating, service digests, incident-recorder integration |
| Integrity Monitor | services/integrity-monitor/ | 50 | Baseline computation, continuous scanning, violation detection, state machine, HMAC baselines, incident-recorder integration |
| Incident Recorder | services/incident-recorder/ | 86 | Incident creation, auto-containment, lifecycle management, severity ranking, policy loading, containment execution, enforcement chain integration, recovery ceremony, severity escalation, forensic bundle export (M43), persistence durability (fsync) |

Python Tests (739 total)

| Test File | Location | Tests | Description |
| --- | --- | --- | --- |
| test_pipeline.py | tests/ | 96 | Quarantine pipeline stages, scanning, pass/fail logic |
| test_search.py | tests/ | 27 | Search mediator, PII stripping, injection detection |
| test_ui.py | tests/ | 18 | Flask web UI routes, rendering, input handling, model catalog loading (YAML/fallback) |
| test_circuit_breaker.py | tests/ | 15 | Circuit breaker state machine (closed/open/half-open), reset, error propagation |
| test_vault_watchdog.py | tests/ | 18 | Vault auto-lock, idle detection, timer controls |
| test_memory_protection.py | tests/ | 37 | Swap encryption, zswap, core dumps, mlock, TEE detection |
| test_traffic_analysis.py | tests/ | 41 | Padding, timing jitter, dummy traffic generation |
| test_differential_privacy.py | tests/ | 37 | Privacy-preserving query obfuscation: decoy queries, k-anonymity, timing randomization |
| test_clipboard_isolation.py | tests/ | 30 | Clipboard access controls, content sanitization |
| test_canary_tripwire.py | tests/ | 49 | Canary token placement, tripwire monitoring, alerts |
| test_emergency_wipe.py | tests/ | 65 | 3-level panic wipe, secure deletion, escalation |
| test_update_rollback.py | tests/ | 74 | Signed update verification, rollback triggers, recovery |
| test_agent.py | tests/ | 159 | Agent policy engine, capability tokens (HMAC signing, nonce replay, expiry), storage gateway, budgets, planner, executor, API, workspace validation, security invariants, two-phase approval, policy evidence, keystore abstraction (software/TPM2/PKCS#11) |
| test_adversarial.py | tests/ | 28 | Prompt injection, policy bypass, step signature tampering, containment determinism, GPU runtime tamper, blocked paths (M43) |
| test_m5_acceptance.py | tests/ | 32 | M5 acceptance certification: attestation, integrity, policy, key management, replay resistance, MCP taint, adversarial regression, supply chain, recovery, workspace isolation, step signatures (M43) |
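To make the closed/open/half-open state machine covered by test_circuit_breaker.py concrete, here is a minimal sketch of a breaker together with one pytest-style test. The `CircuitBreaker` class, its parameters, and the injected clock are hypothetical and written for illustration only; the project's actual implementation and test names may differ.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; not the SecAI_OS implementation."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock      # injectable so tests can control time
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call in half-open, or too many failures, (re)opens the breaker.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise  # error propagation: the caller still sees the original exception
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def test_breaker_opens_then_recovers():
    now = [0.0]
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def boom():
        raise ValueError("backend down")

    for _ in range(2):
        try:
            breaker.call(boom)
        except ValueError:
            pass
    assert breaker.state == "open"

    now[0] = 10.0  # advance the fake clock past the reset timeout
    assert breaker.state == "half-open"
    assert breaker.call(lambda: "ok") == "ok"
    assert breaker.state == "closed"
```

Injecting the clock keeps the test deterministic: no sleeping is needed to observe the half-open transition.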

Agent test breakdown (test_agent.py)

| Class | Tests | Category | Description |
| --- | --- | --- | --- |
| TestClassifyRisk | 3 | Unit | Risk-level classification for agent actions |
| TestPolicyEngine | 15 | Unit / Security | Deny-by-default evaluation, always-deny invariants, hard-approval gates |
| TestCapabilityTokens | 8 | Unit | Token creation, workspace scoping, mode-specific capabilities |
| TestBudgets | 7 | Unit | Budget enforcement, limit checking, sensitive-mode tighter limits |
| TestStorageGateway | 14 | Unit / Security | Path scope validation, sensitive file blocking, sensitivity ceiling, file size limits |
| TestPlannerHeuristic | 8 | Unit | Heuristic plan decomposition, keyword-to-action mapping |
| TestPlannerLLMParsing | 4 | Unit | LLM response parsing, malformed plan rejection |
| TestExecutor | 6 | Integration | Step execution dispatch, tool firewall calls, budget tracking |
| TestAgentAPI | 17 | Integration | HTTP endpoint contracts, input validation, task CRUD lifecycle, workspace ID resolution |
| TestSecurityInvariants | 7 | Security | Fail-closed behavior, airlock/firewall bypass prevention, service-down handling |
| TestDataModels | 4 | Unit | Task/step serialisation, status enum coverage |
| TestTokenSigning | 10 | Security | HMAC-SHA256 token signing, tamper detection, replay protection, expiry enforcement |
| TestTokenBinding | 8 | Security | Intent hashing, policy digest, task context binding, token-to-dict serialisation |
| TestTwoPhaseApproval | 6 | Security | Two-phase approval for high-risk actions (trust change, export, widen scope) |
| TestPolicyEvidence | 8 | Security | Per-step PolicyDecision evidence, risk classification, token validity tracking |
| TestVerifiedSupervisorAPI | 3 | Integration | Signed tokens in API responses, policy decisions in step params |
| TestSoftwareKeyProvider | 13 | Unit / Security | Software key provider: sign/verify, key rotation, file persistence, key derivation |
| TestTPM2KeyProvider | 5 | Unit | TPM2 provider: graceful degradation, PCR config, missing file handling |
| TestPKCS11KeyProvider | 6 | Unit | PKCS#11 stub: NotImplementedError for all operations, status reporting |
| TestKeystoreFactory | 7 | Integration | Provider factory, config loading, auto-detection, fallback chain |
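The properties listed for TestTokenSigning (HMAC-SHA256 signing, tamper detection, replay protection, expiry enforcement) can be illustrated with a short sketch. The token layout, the field names `nonce` and `expires_at`, and the functions `sign_token` and `verify_token` are assumptions made for this example; they are not the agent's actual token format or API.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _canonical(claims: dict) -> bytes:
    """Serialize claims deterministically so signer and verifier hash the same bytes."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_token(claims: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature computed over the canonical claims."""
    signature = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": signature}


def verify_token(token: dict, key: bytes, seen_nonces: set,
                 now: Optional[float] = None) -> bool:
    """Accept a token only if it is untampered, unexpired, and not replayed."""
    now = time.time() if now is None else now
    claims = token["claims"]
    expected = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # tamper detection: claims or signature were modified
    if claims["expires_at"] <= now:
        return False  # expiry enforcement
    if claims["nonce"] in seen_nonces:
        return False  # replay protection: each nonce is accepted once
    seen_nonces.add(claims["nonce"])
    return True
```

The signature is checked first, so an attacker cannot consume a nonce or probe expiry handling with a forged token. `hmac.compare_digest` is used for the comparison because it runs in constant time.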

Shell Checks

All shell scripts under files/system/ are validated with shellcheck. This is enforced in CI.

CI Pipeline

CI is defined in .github/workflows/ci.yml and runs on every push and pull request.

Steps:

1. Build and test all 9 Go services (`go test -race ./...`)
2. Syntax-check Python (`py_compile` for all service modules, including agent)
3. Run Python tests (`pytest tests/`), including the agent tests
4. Lint shell scripts with `shellcheck`
5. Validate YAML configs (policy, agent, recipes)
6. Verify action pins (SHA-pinned GitHub Actions)
7. Supply chain verification: SBOM generation (Syft), cosign availability, release workflow provenance validation
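As an illustration of the action-pin step, the sketch below scans workflow text for `uses:` references that are not pinned to a full hash. It assumes that "pinned" means the ref after `@` is a full 40- or 64-character hexadecimal commit hash; the function name `unpinned_actions` and the regexes are hypothetical and are not taken from the project's CI scripts.

```python
import re

# Matches "uses: owner/repo@ref" with or without a leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(?P<ref>\S+)")
# Assumption: a pin is a full commit hash (40 hex chars, or 64 for SHA-256 object formats).
FULL_HASH_RE = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `uses:` reference that is not pinned to a full commit hash."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group("ref").strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and Docker images are out of scope for this sketch
        _, _, version = ref.partition("@")
        if not FULL_HASH_RE.match(version):
            unpinned.append(ref)
    return unpinned


if __name__ == "__main__":
    sample = """\
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-go@0123456789abcdef0123456789abcdef01234567  # v5
"""
    print(unpinned_actions(sample))  # prints ['actions/checkout@v4']
```

A CI job could fail whenever the returned list is non-empty. Tags such as `v4` are flagged because they are mutable, whereas a full commit hash is not.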

Test Categories

| Category | Description | Examples |
| --- | --- | --- |
| Unit | Isolated function/method tests | Hash verification, policy rule parsing |
| Integration | Multi-component interaction tests | Pipeline stage sequencing, service auth flow |
| Security | Validates security invariants hold | Injection detection, PII stripping, fail-closed behavior |

Running Tests Locally

Go tests

(cd services/registry && go test ./...)
(cd services/tool-firewall && go test ./...)
(cd services/airlock && go test ./...)
(cd services/gpu-integrity-watch && go test ./...)
(cd services/mcp-firewall && go test ./...)
(cd services/policy-engine && go test ./...)
(cd services/runtime-attestor && go test ./...)
(cd services/integrity-monitor && go test ./...)
(cd services/incident-recorder && go test ./...)
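For convenience, the per-service commands can also be driven from one script run at the repository root. The helper below is a hypothetical sketch, not part of the repository: it only assumes the nine service directories listed in this document and builds one `go test` invocation per directory.

```python
import subprocess
from pathlib import Path

# The nine Go service directories named in this document.
GO_SERVICES = [
    "registry",
    "tool-firewall",
    "airlock",
    "gpu-integrity-watch",
    "mcp-firewall",
    "policy-engine",
    "runtime-attestor",
    "integrity-monitor",
    "incident-recorder",
]


def go_test_commands(repo_root: Path, race: bool = False) -> list[tuple[Path, list[str]]]:
    """Build (working directory, argv) pairs, one per Go service."""
    argv = ["go", "test"] + (["-race"] if race else []) + ["./..."]
    return [(repo_root / "services" / name, list(argv)) for name in GO_SERVICES]


def run_all(repo_root: Path, race: bool = False) -> None:
    """Run each service's tests in its own directory, stopping at the first failure."""
    for directory, argv in go_test_commands(repo_root, race):
        subprocess.run(argv, cwd=directory, check=True)
```

Calling `run_all(Path.cwd(), race=True)` from the repository root would mirror the `-race` flag used in CI. Passing `cwd=` to `subprocess.run` avoids changing the caller's working directory between services.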

Python tests

pip install pytest flask requests pyyaml
pytest tests/

To run a specific test file:

pytest tests/test_pipeline.py
pytest tests/test_search.py
pytest tests/test_agent.py

Shell checks

shellcheck files/system/usr/libexec/secure-ai/*.sh
