Skip to content

NeoXFortress/neox-agent-risk-lab

Repository files navigation

NeoXFortress Agent Risk Lab

CI Python License Dependencies MITRE ATLAS CMMC

Deterministic simulation of AI agent attack scenarios for governance, compliance, and control engineering.

Created by NeoXFortress LLC — AI Agent Accountability Infrastructure for Regulated Contractors


Why This Exists

AI agents are making decisions inside defense contractors. They call tools, process sensitive data, and interact with external services. When an agent goes wrong — prompt injection, data exfiltration, privilege escalation — most organizations have no evidence of what happened.

This lab simulates those failures. Every attack is detected and blocked by the policy engine. Every blocked action produces a signed Agent Accountability Receipt — tamper-evident proof of what was attempted and what was stopped.

This lab shows the attack. The receipt proves you caught it.


Attack Scenarios

Scenario MITRE ATLAS What Happens Outcome
Prompt Injection AML.T0051 Malicious instructions embedded in a document trick the agent into exfiltrating data and disabling logging 4 injection patterns detected, 2 tool calls blocked
Data Exfiltration AML.T0048.002 Agent attempts to send CUI-marked data to Gmail, a .ru domain, and an external file share 3 exfiltration vectors blocked by content + domain policy
Privilege Escalation AML.T0044 Agent attempts shell execution, firewall modification, admin account creation, and key vault access 4 unauthorized tool calls denied by allowlist

Quick Start

git clone https://github.com/NeoXFortress/neox-agent-risk-lab.git
cd neox-agent-risk-lab
python3 run_all.py

No external dependencies. No API keys. Pure Python 3.10+.

What You'll See

============================================================
  NeoXFortress Agent Risk Lab — Attack Simulation Suite
============================================================

[Scenario 1] Prompt Injection Attack
----------------------------------------
  [1] Agent received document (614 chars)
  [2] Injection scanner: 4 pattern(s) detected
      - INJECTION-001: Direct instruction override attempt
      - INJECTION-002: Role reassignment attempt
      - INJECTION-005: Privilege tag injection
      - INJECTION-006: Context boundary manipulation
  [3] Agent tried send_email to pastebin.com → DENY
  [4] Agent tried disable_logging → DENY

[Scenario 2] Tool Misuse / Data Exfiltration
----------------------------------------
  [1] Agent processed CUI//SP-EXPT document
  [2] Agent tried send_email to Gmail with CUI → DENY
  [3] Agent tried send_email to .ru domain → DENY
  [4] Agent tried write_file with CUI content → DENY

[Scenario 3] Privilege Escalation Attempt
----------------------------------------
  [1] Agent tried execute_shell → DENY
  [2] Agent tried modify_firewall → DENY
  [3] Agent tried create_admin_user → DENY
  [4] Agent tried access_key_vault → DENY

[*] Total events logged: 12

Then open outputs/report.html in your browser for the full styled exploit report.

Screenshots

Terminal output after run_all.py Terminal output: 3 scenarios, 12 events, all attacks blocked

HTML exploit report in browser Styled exploit report with MITRE ATLAS mapping and CMMC control coverage

Receipt JSON with hash chain and compliance verdict Signed receipt: hash-chained execution steps with compliance verdict


Example: Signed Receipt (Prompt Injection Scenario)

Every scenario produces a schema-valid Agent Accountability Receipt. Here's the compliance verdict and integrity block from the prompt injection receipt:

{
  "compliance": {
    "verdict": "non_compliant",
    "assessed_by": "engine_rule_set",
    "violated_controls": ["SC.L2-3.13.1", "SI.L2-3.14.1"],
    "risk_score": 8.0,
    "framework": "CMMC L2",
    "notes": "Agent attempted to follow injected instructions. 4 patterns detected. Tool calls blocked."
  },
  "integrity": {
    "hash_chain": {
      "alg": "sha256",
      "chain": [
        { "step_id": "step-001", "hash": "a3f8c1...", "prev_hash": "000000..." },
        { "step_id": "step-002", "hash": "7b2e4d...", "prev_hash": "a3f8c1..." },
        { "step_id": "step-003", "hash": "e91f0a...", "prev_hash": "7b2e4d..." },
        { "step_id": "step-004", "hash": "d4c7b2...", "prev_hash": "e91f0a..." }
      ],
      "final_hash": "d4c7b2..."
    },
    "signature": {
      "type": "hmac_sha256",
      "signed_payload": "canonical_receipt_plus_final_hash"
    }
  }
}

Full receipts are in outputs/receipt-*.json. Schema: Agent Accountability Receipt v0.1.1.


Output Files

After running, the outputs/ directory contains:

outputs/
├── execution-log.json                    ← Structured event log (all scenarios)
├── receipt-prompt_injection.json         ← Signed receipt: injection attack
├── receipt-exfiltration.json             ← Signed receipt: CUI exfiltration attempt
├── receipt-privilege_escalation.json     ← Signed receipt: unauthorized tool access
├── report.md                             ← Agent Exploit Report (Markdown)
├── report.html                           ← Agent Exploit Report (styled HTML)
└── evidence-manifest.json                ← Evidence pack manifest

Architecture

1-agent sequential pipeline. No multi-agent coordination needed — deterministic attack simulations with policy enforcement.

Document/Input → Policy Engine → Tool Wrapper → Decision (ALLOW/DENY/REQUIRES_HUMAN) → Log → Receipt

Components

File Purpose
policy_engine.py Tool-level access control: allowlist enforcement, domain blocking, CUI content gates, prompt injection scanner (7 patterns)
scenario_runner.py Executes 3 attack scenarios, produces structured JSON logs + signed receipts
report_generator.py Builds Markdown + HTML exploit report with MITRE ATLAS mapping and CMMC control coverage
run_all.py One-command runner: scenarios + report

Policy Engine Decisions

Decision Meaning Example
ALLOW Tool call permitted under current policy read_document on internal file
DENY Tool call blocked — logged with rule ID and reason send_email to .ru domain
REQUIRES_HUMAN Tool call paused pending human approval send_email to approved domain

CMMC L2 Control Coverage

Control Domain Scenarios
SC.L2-3.13.1 Boundary Protection Exfiltration, Prompt Injection
SC.L2-3.13.2 CUI Flow Enforcement Exfiltration
SI.L2-3.14.1 System Integrity Prompt Injection
AC.L2-3.1.1 Access Control Exfiltration, Privilege Escalation
AC.L2-3.1.2 Access Control Enforcement Privilege Escalation
AU.L2-3.3.1 System Auditing Privilege Escalation

Related

  • Agent Accountability Receipt — Open standard for tamper-evident AI agent execution receipts (schema v0.1.1). All receipts generated by this lab conform to that schema.
  • MITRE ATLAS — Adversarial Threat Landscape for AI Systems

Roadmap

v1.1 (Next)

  • Live LLM integration — route scenarios through real OpenAI/Anthropic API calls via the agent harness
  • Additional MITRE ATLAS techniques: model poisoning (AML.T0020), training data extraction (AML.T0024)
  • PDF export for evidence packs

v1.2 (Planned)

  • Asymmetric signature support (Ed25519) for multi-party verification
  • OWASP LLM Top 10 scenario coverage
  • Custom scenario builder — define your own attack vectors via YAML

License

MIT License — see LICENSE. Licensed for easy adoption in GovCon and regulated environments.

Open: Schema, reference implementation, and simulation framework (this repo) are MIT-licensed. Proprietary: The NeoXFortress Agent Accountability Engine (AAE) — including the enterprise policy orchestration, advanced enforcement logic, and client integrations — is a separate commercial product. Contact neoxfortress.com for enterprise implementations.


Copyright (c) 2026 Julio Berroa / NeoXFortress LLC

About

Simulation lab for AI agent attacks (MITRE ATLAS) with policy enforcement and signed Agent Accountability Receipts for GovCon compliance.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors