This document summarizes the test coverage for SecAI_OS across all languages and test categories.
Last updated: 2026-03-14
The canonical source of truth for test counts is `docs/test-counts.json`. CI enforces that actual counts never drift below the documented values.

| Language | Test Count | Runner |
|---|---|---|
| Go | 402 | go test -race ./... |
| Python | 739 | pytest |
| Shell | All .sh files | shellcheck |
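
The drift rule stated above (actual counts must never fall below the documented ones) could be checked along these lines. This is a minimal sketch, not the project's CI script: the JSON keys, the `find_drift` helper, and the sample "actual" numbers are all assumptions made for illustration, and the real schema of `docs/test-counts.json` may differ.

```python
import json


def find_drift(documented: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Return one message per language whose actual count is below the documented one."""
    problems = []
    for language, minimum in documented.items():
        found = actual.get(language, 0)
        if found < minimum:
            problems.append(f"{language}: documented {minimum}, found {found}")
    return problems


# Hypothetical contents of docs/test-counts.json; the real schema may differ.
documented = json.loads('{"go": 402, "python": 739}')

# Counts a CI job might have collected (illustrative numbers only).
actual = {"go": 402, "python": 735}

for message in find_drift(documented, actual):
    print(message)  # prints "python: documented 739, found 735"
```

Treating the documented numbers as a floor rather than an exact match lets new tests land without a documentation change while still catching silently deleted tests.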

Go service tests:

| Service | Location | Tests | Description |
|---|---|---|---|
| Registry | services/registry/ | 14 | Trusted model registry, hash pinning, cosign verification |
| Tool Firewall | services/tool-firewall/ | 10 | Default-deny egress policy, rule evaluation |
| Airlock | services/airlock/ | 10 | Online airlock, request sanitization, policy enforcement |
| GPU Integrity Watch | services/gpu-integrity-watch/ | 62 | GPU probe scoring, baseline comparison, action triggers, daemon mode, driver fingerprint, device allowlist, attestor/incident integration |
| MCP Firewall | services/mcp-firewall/ | 71 | MCP tool call policy enforcement, input redaction, taint tracking, audit, adversarial tests (M43), trust tier isolation, session binding |
| Policy Engine | services/policy-engine/ | 44 | Unified policy decisions across 6 domains, evidence generation, auth, adversarial tests (M43) |
| Runtime Attestor | services/runtime-attestor/ | 55 | TPM2 quote verification, HMAC bundles, state machine, startup gating, service digests, incident-recorder integration |
| Integrity Monitor | services/integrity-monitor/ | 50 | Baseline computation, continuous scanning, violation detection, state machine, HMAC baselines, incident-recorder integration |
| Incident Recorder | services/incident-recorder/ | 86 | Incident creation, auto-containment, lifecycle management, severity ranking, policy loading, containment execution, enforcement chain integration, recovery ceremony, severity escalation, forensic bundle export (M43), persistence durability (fsync) |

Python test files:

| Test File | Location | Tests | Description |
|---|---|---|---|
| test_pipeline.py | tests/ | 96 | Quarantine pipeline stages, scanning, pass/fail logic |
| test_search.py | tests/ | 27 | Search mediator, PII stripping, injection detection |
| test_ui.py | tests/ | 18 | Flask web UI routes, rendering, input handling, model catalog loading (YAML/fallback) |
| test_circuit_breaker.py | tests/ | 15 | Circuit breaker state machine (closed/open/half-open), reset, error propagation |
| test_vault_watchdog.py | tests/ | 18 | Vault auto-lock, idle detection, timer controls |
| test_memory_protection.py | tests/ | 37 | Swap encryption, zswap, core dumps, mlock, TEE detection |
| test_traffic_analysis.py | tests/ | 41 | Padding, timing jitter, dummy traffic generation |
| test_differential_privacy.py | tests/ | 37 | Privacy-preserving query obfuscation: decoy queries, k-anonymity, timing randomization |
| test_clipboard_isolation.py | tests/ | 30 | Clipboard access controls, content sanitization |
| test_canary_tripwire.py | tests/ | 49 | Canary token placement, tripwire monitoring, alerts |
| test_emergency_wipe.py | tests/ | 65 | 3-level panic wipe, secure deletion, escalation |
| test_update_rollback.py | tests/ | 74 | Signed update verification, rollback triggers, recovery |
| test_agent.py | tests/ | 159 | Agent policy engine, capability tokens (HMAC signing, nonce replay, expiry), storage gateway, budgets, planner, executor, API, workspace validation, security invariants, two-phase approval, policy evidence, keystore abstraction (software/TPM2/PKCS#11) |
| test_adversarial.py | tests/ | 28 | Prompt injection, policy bypass, step signature tampering, containment determinism, GPU runtime tamper, blocked paths (M43) |
| test_m5_acceptance.py | tests/ | 32 | M5 acceptance certification: attestation, integrity, policy, key management, replay resistance, MCP taint, adversarial regression, supply chain, recovery, workspace isolation, step signatures (M43) |
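
To make the `test_circuit_breaker.py` row more concrete, the closed/open/half-open state machine it mentions can be sketched as below. The `CircuitBreaker` class, its parameters, and the test function are hypothetical and written only for illustration; they are not the project's implementation.

```python
import time


class CircuitBreaker:
    """Minimal closed/open/half-open breaker (illustrative, not the project's class)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func):
        if self.state == "open":
            raise RuntimeError("circuit open")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed half-open probe, or too many failures, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise  # errors always propagate to the caller
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def failing_call():
    raise ValueError("backend unavailable")


def test_opens_then_recovers():
    now = [0.0]  # fake clock so the test needs no real waiting
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
    for _ in range(2):
        try:
            breaker.call(failing_call)
        except ValueError:
            pass
    assert breaker.state == "open"
    now[0] = 10.0
    assert breaker.state == "half-open"
    assert breaker.call(lambda: "ok") == "ok"
    assert breaker.state == "closed"
```

Injecting the clock keeps the state transitions deterministic, which is what makes this kind of state machine practical to cover with fast unit tests.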

Test classes in test_agent.py:

| Class | Tests | Category | Description |
|---|---|---|---|
| TestClassifyRisk | 3 | Unit | Risk-level classification for agent actions |
| TestPolicyEngine | 15 | Unit / Security | Deny-by-default evaluation, always-deny invariants, hard-approval gates |
| TestCapabilityTokens | 8 | Unit | Token creation, workspace scoping, mode-specific capabilities |
| TestBudgets | 7 | Unit | Budget enforcement, limit checking, sensitive-mode tighter limits |
| TestStorageGateway | 14 | Unit / Security | Path scope validation, sensitive file blocking, sensitivity ceiling, file size limits |
| TestPlannerHeuristic | 8 | Unit | Heuristic plan decomposition, keyword-to-action mapping |
| TestPlannerLLMParsing | 4 | Unit | LLM response parsing, malformed plan rejection |
| TestExecutor | 6 | Integration | Step execution dispatch, tool firewall calls, budget tracking |
| TestAgentAPI | 17 | Integration | HTTP endpoint contracts, input validation, task CRUD lifecycle, workspace ID resolution |
| TestSecurityInvariants | 7 | Security | Fail-closed behavior, airlock/firewall bypass prevention, service-down handling |
| TestDataModels | 4 | Unit | Task/step serialisation, status enum coverage |
| TestTokenSigning | 10 | Security | HMAC-SHA256 token signing, tamper detection, replay protection, expiry enforcement |
| TestTokenBinding | 8 | Security | Intent hashing, policy digest, task context binding, token-to-dict serialisation |
| TestTwoPhaseApproval | 6 | Security | Two-phase approval for high-risk actions (trust change, export, widen scope) |
| TestPolicyEvidence | 8 | Security | Per-step PolicyDecision evidence, risk classification, token validity tracking |
| TestVerifiedSupervisorAPI | 3 | Integration | Signed tokens in API responses, policy decisions in step params |
| TestSoftwareKeyProvider | 13 | Unit / Security | Software key provider: sign/verify, key rotation, file persistence, key derivation |
| TestTPM2KeyProvider | 5 | Unit | TPM2 provider: graceful degradation, PCR config, missing file handling |
| TestPKCS11KeyProvider | 6 | Unit | PKCS#11 stub: NotImplementedError for all operations, status reporting |
| TestKeystoreFactory | 7 | Integration | Provider factory, config loading, auto-detection, fallback chain |
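
The properties listed for `TestTokenSigning` (HMAC-SHA256 signing, tamper detection, replay protection, expiry enforcement) can be illustrated with a small standard-library sketch. The token layout, the `sign_token` and `verify_token` helpers, and the in-memory nonce set are assumptions for illustration only; the agent's actual token format and keystore are not shown in this document.

```python
import hashlib
import hmac
import json
import secrets
import time

SECRET = secrets.token_bytes(32)  # hypothetical per-process signing key
_seen_nonces = set()              # hypothetical in-memory replay cache


def sign_token(claims: dict, ttl_seconds: int = 60) -> dict:
    """Attach a nonce and expiry to the claims, then sign the canonical JSON."""
    body = dict(claims, nonce=secrets.token_hex(16), expires_at=time.time() + ttl_seconds)
    payload = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_token(token: dict) -> bool:
    """Accept a token only if it is untampered, unexpired, and not seen before."""
    body = token["body"]
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # body or signature was modified
    if time.time() >= body["expires_at"]:
        return False  # token has expired
    if body["nonce"] in _seen_nonces:
        return False  # nonce already used: replay
    _seen_nonces.add(body["nonce"])
    return True


token = sign_token({"workspace": "ws-1", "capability": "read"})
print(verify_token(token))  # first use is accepted
print(verify_token(token))  # second use is rejected as a replay
```

Checking the signature before reading any other field means that expiry and nonce values are only trusted once they are known to be authentic.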

All shell scripts under `files/system/` are validated with shellcheck. This is enforced in CI.

CI is defined in `.github/workflows/ci.yml` and runs on every push and pull request.

Steps:
- Build and test all 9 Go services (`go test -race ./...`)
- Lint Python (`py_compile` for all service modules, including agent)
- Run Python tests (`pytest tests/`), including agent tests
- Lint shell scripts with shellcheck
- Validate YAML configs (policy, agent, recipes)
- Verify action pins (SHA-256 pinned GitHub Actions)
- Supply chain verification: SBOM generation (Syft), cosign availability, release workflow provenance validation
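
As an illustration of the action-pin step in the list above, a checker might scan a workflow for `uses:` references that are not pinned to a full digest. This is a hedged sketch rather than the project's verifier: the `unpinned_actions` helper is hypothetical, and the assumption that a pin is a 40- or 64-character hex digest should be adjusted to whatever the real check accepts.

```python
import re

# Matches `uses: owner/repo[/path]@ref` entries in a workflow file.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)

# Assumption: a valid pin is a full 40- or 64-character lowercase hex digest.
FULL_DIGEST_RE = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every `action@ref` whose ref is a tag or branch instead of a digest."""
    unpinned = []
    for action, ref in USES_RE.findall(workflow_text):
        if not FULL_DIGEST_RE.fullmatch(ref):
            unpinned.append(f"{action}@{ref}")
    return unpinned


# Illustrative workflow snippet; the digest is a placeholder, not a real commit.
sample = f"""
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-go@{'a' * 40}  # pinned
"""

print(unpinned_actions(sample))  # prints ['actions/checkout@v4']
```

A CI job could fail the build whenever the returned list is non-empty.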

Test categories:

| Category | Description | Examples |
|---|---|---|
| Unit | Isolated function/method tests | Hash verification, policy rule parsing |
| Integration | Multi-component interaction tests | Pipeline stage sequencing, service auth flow |
| Security | Validates security invariants hold | Injection detection, PII stripping, fail-closed behavior |

To run the Go tests for each service from the repository root:

    (cd services/registry && go test ./...)
    (cd services/tool-firewall && go test ./...)
    (cd services/airlock && go test ./...)
    (cd services/gpu-integrity-watch && go test ./...)
    (cd services/mcp-firewall && go test ./...)
    (cd services/policy-engine && go test ./...)
    (cd services/runtime-attestor && go test ./...)
    (cd services/integrity-monitor && go test ./...)
    (cd services/incident-recorder && go test ./...)

To run the Python tests:

    pip install pytest flask requests pyyaml
    pytest tests/

To run a specific test file:

    pytest tests/test_pipeline.py
    pytest tests/test_search.py
    pytest tests/test_agent.py

To lint the shell scripts:

    shellcheck files/system/usr/libexec/secure-ai/*.sh
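
The nine per-service Go commands in this section can also be driven from one script. The sketch below is one possible approach, assuming it is run with the repository root as `repo_root` and that `go` is on the `PATH`; the `run_go_tests` helper is hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Service directories under services/, as listed in the Go table above.
SERVICES = [
    "registry",
    "tool-firewall",
    "airlock",
    "gpu-integrity-watch",
    "mcp-firewall",
    "policy-engine",
    "runtime-attestor",
    "integrity-monitor",
    "incident-recorder",
]


def run_go_tests(repo_root: Path, race: bool = True) -> dict[str, int]:
    """Run `go test` in each service directory and return exit codes by service."""
    command = ["go", "test"] + (["-race"] if race else []) + ["./..."]
    results = {}
    for service in SERVICES:
        # cwd= avoids the pitfall of chained `cd` calls changing the working directory.
        completed = subprocess.run(command, cwd=repo_root / "services" / service)
        results[service] = completed.returncode
    return results
```

Calling `run_go_tests(Path("."))` from the repository root returns a mapping such as `{"registry": 0, ...}`, where any non-zero value marks a failing service.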