A demonstration of a self-improving debugging loop built with TypeScript and LangGraph. This simplified implementation shows how an AI agent can detect its own errors, diagnose root causes, and attempt corrections autonomously.
| Capability | Implementation |
|---|---|
| 1. Error Detection | LLM-based validator with explicit constraint checking - not blind retry |
| 2. Diagnosis | Classifies error types (LOGIC_ERROR, EDGE_CASE, MISSING_IMPLEMENTATION, etc.) with root cause analysis |
| 3. Correction | Selects targeted strategy (FIX_LOGIC, ADD_EDGE_CASE, REFACTOR) - correction differs from original |
| 4. Learning (Bonus) | JSON-based memory persists fixes; fuzzy matching recalls past learnings |
- Self-Healing: Automatically detects, diagnoses, and fixes errors
- Memory System: Learns from past fixes and applies them to similar tasks
- Error Taxonomy: 8 error types, 8 correction strategies (not catch-all retry)
- Telemetry: Rich console output or JSON mode for debugging
- Max 3 Retries: Prevents infinite loops
The system uses a LangGraph state machine with 10 specialized nodes:
Flow Overview:
- Success Path: Task Intake → Memory Recall → Executor → Validator → Output
- Error Path: Validator → Diagnoser → Correction Selector → Re-executor → Validator (retry up to 3 times)
- Learning: After successful correction, Memory Writer saves the fix for future recall
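The retry cap and the success/error branching above can be sketched as a routing function for the conditional edge after the Validator node. This is an illustrative sketch, not the repo's actual API; `DebugState`, `routeAfterValidation`, and the node names are assumed for demonstration.

```typescript
// Hypothetical state slice inspected by the conditional edge.
interface DebugState {
  attempts: number;        // execution attempts completed so far
  validationPassed: boolean;
}

const MAX_RETRIES = 3;     // hard cap that prevents infinite loops

// After validation: success routes to output, failure routes to the
// Diagnoser until the retry budget is exhausted.
function routeAfterValidation(state: DebugState): "output" | "diagnoser" | "fail" {
  if (state.validationPassed) return "output";
  if (state.attempts >= MAX_RETRIES) return "fail";
  return "diagnoser";
}

console.log(routeAfterValidation({ attempts: 1, validationPassed: true }));  // "output"
console.log(routeAfterValidation({ attempts: 1, validationPassed: false })); // "diagnoser"
console.log(routeAfterValidation({ attempts: 3, validationPassed: false })); // "fail"
```

In LangGraph terms, a function like this would be wired in via a conditional edge from the Validator node.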
Design Notes:
- Uses a weaker model for execution to demonstrate self-healing capabilities
- Stronger model for diagnosis to ensure accurate error classification
- Deterministic correction selector (no LLM) for predictable strategy selection
- Simple fuzzy matching (LCS algorithm) for recalling similar past tasks
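The LCS-based fuzzy matching mentioned above can be sketched as follows. This is a minimal sketch of the idea behind `matcher.ts`; the actual normalization and threshold in the repo may differ.

```typescript
// Longest common subsequence length via a rolling 1-D DP array.
function lcsLength(a: string, b: string): number {
  const dp: number[] = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = 0; // holds dp[i-1][j-1]
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // dp[i-1][j] before overwrite
      dp[j] = a[i - 1] === b[j - 1] ? prevDiag + 1 : Math.max(dp[j], dp[j - 1]);
      prevDiag = tmp;
    }
  }
  return dp[b.length];
}

// Similarity in [0, 1]: LCS length over the longer string's length,
// so identical tasks score 1 and unrelated tasks score near 0.
function similarity(a: string, b: string): number {
  if (!a.length || !b.length) return 0;
  return lcsLength(a.toLowerCase(), b.toLowerCase()) / Math.max(a.length, b.length);
}
```

Recall would then pick the past fix whose task description scores highest against the new task, above some minimum threshold.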
| Node | Model | Purpose |
|---|---|---|
| Executor | gpt-5.4-nano | Generate task output (weaker model to show self-healing) |
| Diagnoser | gpt-5.3-codex | Classify errors accurately |
| Validator | gpt-5.4-mini | Judge output quality |
- Create a `.env` file with your OpenAI API key:

  ```
  OPENAI_API_KEY=your-key-here
  ```

- Install dependencies:

  ```
  npm install
  ```

Usage:

```
npm run dev -- "Generate a factorial function"
npm run dev -- --debug "Write a function to reverse a string"
npm run dev -- --clear-memory
npm run demo
```

The system demonstrates real self-correction, not blind retries:
- Error Detection: The validator explicitly checks each constraint and identifies specific missing features (not just "it failed, try again")
- Root Cause Diagnosis: The diagnoser classifies errors into 8 types with detailed analysis
- Targeted Correction: Each error type maps to a specific correction strategy - the retry prompt is different from the original
- Persistent Learning: Successful fixes are saved to memory and recalled for similar future tasks
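The explicit constraint checking described above implies the validator returns a structured verdict rather than a bare pass/fail. The shape and parsing below are assumptions for illustration, not the exact schema in `validator.ts`.

```typescript
// Hypothetical structured verdict from the LLM validator.
interface ValidationResult {
  passed: boolean;
  failedConstraints: string[]; // specific missing features, not just "it failed"
}

// Defensive parse of the LLM's JSON reply: malformed output is treated
// as a failed validation instead of crashing the loop.
function parseVerdict(raw: string): ValidationResult {
  try {
    const v = JSON.parse(raw);
    return {
      passed: v.passed === true,
      failedConstraints: Array.isArray(v.failedConstraints) ? v.failedConstraints : [],
    };
  } catch {
    return { passed: false, failedConstraints: ["unparseable validator reply"] };
  }
}
```

Treating an unparseable reply as a failure keeps the loop moving: the diagnoser gets another chance rather than the run aborting.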
The demo runs 3 challenging tasks to showcase different error types:
- Expression Parser - Tests edge case handling (nested parens, whitespace)
- Rate Limiter - Tests complex logic with sliding windows and state
- Schema Validator - Tests missing implementation (intentionally fails first, then self-corrects)
Expected behavior:
- First Run: Task 3 fails validation → diagnosed as MISSING_IMPLEMENTATION → adds validation logic → succeeds
- Second Run: Same task passes on first try using saved memory
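A saved fix that enables the second-run shortcut might look like the record below. The field names are assumptions for illustration; the actual schema in `memory.json` may differ (only the `fix_001` id appears in the example output).

```typescript
// Illustrative memory record persisted after a successful correction.
interface FixRecord {
  id: string;
  task: string;       // task description, used for fuzzy matching on recall
  errorType: string;  // diagnosed classification
  strategy: string;   // correction strategy that worked
  promptHint: string; // guidance merged into the executor prompt on recall
}

const example: FixRecord = {
  id: "fix_001",
  task: "Schema Validator",
  errorType: "MISSING_IMPLEMENTATION",
  strategy: "COMPLETE_IMPLEMENTATION",
  promptHint: "Include full validation logic for every schema field.",
};
```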
Example output:
╭─────────────────────────────────────────────╮
│ 🚀 Self-Improving Debug Loop │
╰─────────────────────────────────────────────╯
[TASK] Generate a function to calculate factorial
[MEMORY] No matching past fixes found
[EXECUTOR] Generating with gpt-5.4-nano...
✓ Output generated
[VALIDATOR] Checking output...
✗ Validation failed: Missing edge case for n=0
[DIAGNOSER] Analyzing error...
→ Error type: EDGE_CASE
→ Root cause: No base case handling
[CORRECTION] Strategy: ADD_EDGE_CASE
[RETRY 1/3] Re-executing with modified prompt...
[VALIDATOR] Checking output...
✓ All checks passed!
[MEMORY] Saved fix for future reference (id: fix_001)
╭─────────────────────────────────────────────╮
│ ✅ Task completed in 2 attempt(s) │
│ 📝 Memory updated for faster future runs │
╰─────────────────────────────────────────────╯
├── src/
│ ├── index.ts # CLI entry point
│ ├── graph.ts # LangGraph workflow
│ ├── state.ts # State types & error taxonomy
│ ├── nodes/ # Graph nodes
│ │ ├── executor.ts # Initial code generation
│ │ ├── validator.ts # Output validation
│ │ ├── diagnoser.ts # Error classification
│ │ ├── correctionSelector.ts # Strategy selection
│ │ ├── reExecutor.ts # Retry with corrections
│ │ ├── memoryRecall.ts # Search past fixes
│ │ └── memoryWriter.ts # Persist learnings
│ ├── llm/ # OpenAI client
│ ├── memory/ # JSON memory store
│ │ ├── store.ts # File I/O
│ │ └── matcher.ts # Fuzzy similarity (LCS)
│ └── telemetry/ # Logger & events
├── examples/
│ ├── demo.ts # Demo script (3 tasks)
│ └── hardTasks.ts # Challenging demo tasks
├── memory.json # Persistent memory
└── .env # API key (not committed)
| Error Type | Description | Correction Strategy |
|---|---|---|
| SYNTAX_ERROR | Code syntax issues | FIX_SYNTAX |
| LOGIC_ERROR | Incorrect algorithm | FIX_LOGIC |
| EDGE_CASE | Missing edge handling | ADD_EDGE_CASE |
| MISSING_IMPLEMENTATION | Incomplete code | COMPLETE_IMPLEMENTATION |
| VALIDATION_ERROR | Doesn't meet requirements | REFACTOR |
| TYPE_ERROR | Type mismatches | ADD_TYPE_CHECKS |
| RUNTIME_ERROR | Would fail at runtime | ADD_ERROR_HANDLING |
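Because the mapping from error type to strategy is fixed, the correction selector needs no LLM call. A minimal sketch of that deterministic lookup (names assumed to mirror `correctionSelector.ts`):

```typescript
// Error types recognized by the diagnoser.
type ErrorType =
  | "SYNTAX_ERROR" | "LOGIC_ERROR" | "EDGE_CASE" | "MISSING_IMPLEMENTATION"
  | "VALIDATION_ERROR" | "TYPE_ERROR" | "RUNTIME_ERROR";

// One strategy per error type: predictable, testable, no model variance.
const STRATEGY: Record<ErrorType, string> = {
  SYNTAX_ERROR: "FIX_SYNTAX",
  LOGIC_ERROR: "FIX_LOGIC",
  EDGE_CASE: "ADD_EDGE_CASE",
  MISSING_IMPLEMENTATION: "COMPLETE_IMPLEMENTATION",
  VALIDATION_ERROR: "REFACTOR",
  TYPE_ERROR: "ADD_TYPE_CHECKS",
  RUNTIME_ERROR: "ADD_ERROR_HANDLING",
};

function selectStrategy(errorType: ErrorType): string {
  return STRATEGY[errorType];
}
```

The selected strategy then drives how the retry prompt is rewritten, which is why each retry differs from the original attempt.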
This is a simplified demonstration. A production self-improving system would require:
- Memory. Demo: simple JSON file with text matching (handles ~20-50 records). Production: vector embeddings for semantic search across thousands of past fixes.
- Tooling. Demo: code generation only. Production: full tooling ecosystem (web browsing, file I/O, sandboxed execution, API calls).
- Error recovery. Demo: single-level correction (fix generated code). Production: multi-layer recovery handling tool failures, reasoning errors, and planning mistakes.
- Validation. Demo: single LLM-based validator. Production: multi-validator consensus with actual code execution in isolated environments.
- Planning. Demo: single-step task execution. Production: multi-step planning with checkpointing and partial recovery mechanisms.
- Observability. Demo: console logging. Production: comprehensive telemetry, error analytics, and A/B testing of correction strategies.
- ✅ Basic self-correction workflow (detect → diagnose → correct)
- ✅ Error classification with root cause analysis
- ✅ Targeted correction strategies (not generic retry)
- ✅ Simple memory system for learning from past fixes
- ✅ Observable debugging loop with telemetry
MIT
