
arnavgupta00/Agentic-Self-Improving-Debug-Loop-Workflow


Self-Improving Debug Loop

A demonstration of a self-improving debugging loop built with TypeScript and LangGraph. This simplified implementation shows how an AI agent can detect its own errors, diagnose root causes, and attempt corrections autonomously.

✅ Core Capabilities Demonstrated

| # | Capability | Implementation |
|---|------------|----------------|
| 1 | Error Detection | LLM-based validator with explicit constraint checking, not blind retry |
| 2 | Diagnosis | Classifies error types (LOGIC_ERROR, EDGE_CASE, MISSING_IMPLEMENTATION, etc.) with root cause analysis |
| 3 | Correction | Selects a targeted strategy (FIX_LOGIC, ADD_EDGE_CASE, REFACTOR); the correction differs from the original attempt |
| 4 | Learning (bonus) | JSON-based memory persists fixes; fuzzy matching recalls past learnings |

Features

  • Self-Healing: Automatically detects, diagnoses, and fixes errors
  • Memory System: Learns from past fixes and applies them to similar tasks
  • Error Taxonomy: 8 error types, 8 correction strategies (not catch-all retry)
  • Telemetry: Rich console output or JSON mode for debugging
  • Max 3 Retries: Prevents infinite loops

Architecture

The system uses a LangGraph state machine with 10 specialized nodes:

(Workflow diagram: Self-Improving Debug Loop)

Flow Overview:

  • Success Path: Task Intake → Memory Recall → Executor → Validator → Output
  • Error Path: Validator → Diagnoser → Correction Selector → Re-executor → Validator (retry up to 3 times)
  • Learning: After successful correction, Memory Writer saves the fix for future recall
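
The branching between the success and error paths can be sketched as a conditional edge function. The state shape and names below are illustrative, not the actual graph.ts implementation:

```typescript
// Minimal sketch of the routing decision after the validator node,
// assuming a state object threaded through the graph. Field names are
// hypothetical; the real LangGraph state may differ.
interface LoopState {
  valid: boolean;      // did the validator pass the output?
  attempts: number;    // executions run so far
  maxRetries: number;  // retry cap to prevent infinite loops (3 in this demo)
}

// Decides which node runs after the validator.
function routeAfterValidation(state: LoopState): "output" | "diagnoser" {
  if (state.valid) return "output";                        // success path
  if (state.attempts >= state.maxRetries) return "output"; // cap hit: surface the failure
  return "diagnoser";                                      // error path: diagnose and retry
}
```

Keeping this decision in a pure function (rather than inside an LLM prompt) is what makes the retry cap enforceable.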

Design Notes:

  • Uses a weaker model for execution to make self-healing observable
  • Uses a stronger model for diagnosis to ensure accurate error classification
  • Deterministic correction selector (no LLM) for predictable strategy selection
  • Simple fuzzy matching (LCS algorithm) for recalling similar past tasks
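
The LCS-based fuzzy matching can be sketched as follows. This is a hedged illustration of the technique, scoring similarity as LCS length over the longer string's length; matcher.ts may differ in detail:

```typescript
// Longest common subsequence length via a rolling 1-D DP array.
function lcsLength(a: string, b: string): number {
  const dp: number[] = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = 0; // holds dp[i-1][j-1] from the previous row
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = a[i - 1] === b[j - 1] ? prevDiag + 1 : Math.max(dp[j], dp[j - 1]);
      prevDiag = tmp;
    }
  }
  return dp[b.length];
}

// Normalized similarity in [0, 1]: 1 means identical, 0 means no common subsequence.
function similarity(a: string, b: string): number {
  if (a.length === 0 && b.length === 0) return 1;
  return lcsLength(a, b) / Math.max(a.length, b.length);
}
```

A recall step would compare the incoming task description against each stored task and return fixes above a similarity threshold.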

Models Used

| Node | Model | Purpose |
|------|-------|---------|
| Executor | gpt-5.4-nano | Generate task output (weaker model to show self-healing) |
| Diagnoser | gpt-5.3-codex | Classify errors accurately |
| Validator | gpt-5.4-mini | Judge output quality |

Setup

  1. Create .env file with your OpenAI API key:

    OPENAI_API_KEY=your-key-here
    
  2. Install dependencies:

    npm install

Usage

Run a task:

npm run dev -- "Generate a factorial function"

Run with debug mode (JSON output):

npm run dev -- --debug "Write a function to reverse a string"

Clear memory:

npm run dev -- --clear-memory

Run the demo:

npm run demo

How It Works

The system demonstrates real self-correction, not blind retries:

  1. Error Detection: The validator explicitly checks each constraint and identifies specific missing features (not just "it failed, try again")
  2. Root Cause Diagnosis: The diagnoser classifies errors into 8 types with detailed analysis
  3. Targeted Correction: Each error type maps to a specific correction strategy - the retry prompt is different from the original
  4. Persistent Learning: Successful fixes are saved to memory and recalled for similar future tasks
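
The persistent-learning step amounts to appending a fix record to the JSON memory. The record shape below is illustrative; the real memory.json schema in store.ts may differ:

```typescript
// Hypothetical shape of one persisted fix.
interface FixRecord {
  id: string;
  task: string;            // task description the fix applied to
  errorType: string;       // e.g. "EDGE_CASE"
  strategy: string;        // e.g. "ADD_EDGE_CASE"
  correctedPrompt: string; // prompt variant that finally passed validation
}

// Pure append: parse the current memory contents, add the fix, re-serialize.
// (A file-backed store would wrap this with fs.readFileSync / fs.writeFileSync.)
function appendFix(memoryJson: string, fix: FixRecord): string {
  const records: FixRecord[] = memoryJson.trim() ? JSON.parse(memoryJson) : [];
  records.push(fix);
  return JSON.stringify(records, null, 2);
}
```

On the next similar task, Memory Recall searches these records and can seed the executor with the corrected prompt directly.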

Demo Output

The demo runs 3 challenging tasks to showcase different error types:

  1. Expression Parser - Tests edge case handling (nested parens, whitespace)
  2. Rate Limiter - Tests complex logic with sliding windows and state
  3. Schema Validator - Tests missing implementation (intentionally fails first, then self-corrects)

Expected behavior:

  • First Run: Task 3 fails validation → diagnosed as MISSING_IMPLEMENTATION → adds validation logic → succeeds
  • Second Run: Same task passes on first try using saved memory

Example output:

╭─────────────────────────────────────────────╮
│  🚀 Self-Improving Debug Loop               │
╰─────────────────────────────────────────────╯

[TASK] Generate a function to calculate factorial

[MEMORY] No matching past fixes found

[EXECUTOR] Generating with gpt-5.4-nano...
✓ Output generated

[VALIDATOR] Checking output...
✗ Validation failed: Missing edge case for n=0

[DIAGNOSER] Analyzing error...
  → Error type: EDGE_CASE
  → Root cause: No base case handling

[CORRECTION] Strategy: ADD_EDGE_CASE
[RETRY 1/3] Re-executing with modified prompt...

[VALIDATOR] Checking output...
✓ All checks passed!

[MEMORY] Saved fix for future reference (id: fix_001)

╭─────────────────────────────────────────────╮
│  ✅ Task completed in 2 attempt(s)          │
│  📝 Memory updated for faster future runs   │
╰─────────────────────────────────────────────╯

Project Structure

├── src/
│   ├── index.ts           # CLI entry point
│   ├── graph.ts           # LangGraph workflow
│   ├── state.ts           # State types & error taxonomy
│   ├── nodes/             # Graph nodes
│   │   ├── executor.ts        # Initial code generation
│   │   ├── validator.ts       # Output validation
│   │   ├── diagnoser.ts       # Error classification
│   │   ├── correctionSelector.ts  # Strategy selection
│   │   ├── reExecutor.ts      # Retry with corrections
│   │   ├── memoryRecall.ts    # Search past fixes
│   │   └── memoryWriter.ts    # Persist learnings
│   ├── llm/               # OpenAI client
│   ├── memory/            # JSON memory store
│   │   ├── store.ts          # File I/O
│   │   └── matcher.ts        # Fuzzy similarity (LCS)
│   └── telemetry/         # Logger & events
├── examples/
│   ├── demo.ts            # Demo script (3 tasks)
│   └── hardTasks.ts       # Challenging demo tasks
├── memory.json            # Persistent memory
└── .env                   # API key (not committed)

Error Taxonomy

| Error Type | Description | Correction Strategy |
|------------|-------------|---------------------|
| SYNTAX_ERROR | Code syntax issues | FIX_SYNTAX |
| LOGIC_ERROR | Incorrect algorithm | FIX_LOGIC |
| EDGE_CASE | Missing edge handling | ADD_EDGE_CASE |
| MISSING_IMPLEMENTATION | Incomplete code | COMPLETE_IMPLEMENTATION |
| VALIDATION_ERROR | Doesn't meet requirements | REFACTOR |
| TYPE_ERROR | Type mismatches | ADD_TYPE_CHECKS |
| RUNTIME_ERROR | Would fail at runtime | ADD_ERROR_HANDLING |
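
Because the selector is deterministic, the taxonomy above boils down to a lookup table. A minimal sketch, with names mirroring the table (the actual correctionSelector.ts may differ):

```typescript
// Error types and their one-to-one correction strategies.
type ErrorType =
  | "SYNTAX_ERROR" | "LOGIC_ERROR" | "EDGE_CASE" | "MISSING_IMPLEMENTATION"
  | "VALIDATION_ERROR" | "TYPE_ERROR" | "RUNTIME_ERROR";

const STRATEGY: Record<ErrorType, string> = {
  SYNTAX_ERROR: "FIX_SYNTAX",
  LOGIC_ERROR: "FIX_LOGIC",
  EDGE_CASE: "ADD_EDGE_CASE",
  MISSING_IMPLEMENTATION: "COMPLETE_IMPLEMENTATION",
  VALIDATION_ERROR: "REFACTOR",
  TYPE_ERROR: "ADD_TYPE_CHECKS",
  RUNTIME_ERROR: "ADD_ERROR_HANDLING",
};

// No LLM call: a pure, predictable mapping from diagnosis to strategy.
function selectStrategy(errorType: ErrorType): string {
  return STRATEGY[errorType];
}
```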

Implementation Simplifications

This is a simplified demonstration. A production self-improving system would require:

1. Memory System

  • Demo: Simple JSON file with text matching (handles ~20-50 records)
  • Production: Vector embeddings for semantic search across thousands of past fixes
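
To illustrate the production-style upgrade: instead of character-level LCS, each task description would be embedded as a vector (by an embedding model, not shown here) and compared by cosine similarity. The vectors below are toy values:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

At thousands of records, this comparison would run against an index (e.g. a vector database) rather than a linear scan of a JSON file.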

2. Tool Integration

  • Demo: Code generation only
  • Production: Full tooling ecosystem (web browsing, file I/O, sandboxed execution, API calls)

3. Error Recovery

  • Demo: Single-level correction (fix generated code)
  • Production: Multi-layer recovery handling tool failures, reasoning errors, and planning mistakes

4. Validation

  • Demo: Single LLM-based validator
  • Production: Multi-validator consensus with actual code execution in isolated environments

5. Task Handling

  • Demo: Single-step task execution
  • Production: Multi-step planning with checkpointing and partial recovery mechanisms

6. Observability

  • Demo: Console logging
  • Production: Comprehensive telemetry, error analytics, and A/B testing of correction strategies

What This Demonstrates

  • ✅ Basic self-correction workflow (detect → diagnose → correct)
  • ✅ Error classification with root cause analysis
  • ✅ Targeted correction strategies (not generic retry)
  • ✅ Simple memory system for learning from past fixes
  • ✅ Observable debugging loop with telemetry

License

MIT
