
arnavgupta00/Agentic-Self-Improving-Debug-Loop-Workflow


Self-Improving Debug Loop

A demonstration of a self-improving debugging loop built with TypeScript and LangGraph. This simplified implementation shows how an AI agent can detect its own errors, diagnose root causes, and attempt corrections autonomously.

✅ Core Capabilities Demonstrated

| # | Capability | Implementation |
|---|------------|----------------|
| 1 | Error Detection | LLM-based validator with explicit constraint checking, not blind retry |
| 2 | Diagnosis | Classifies error types (LOGIC_ERROR, EDGE_CASE, MISSING_IMPLEMENTATION, etc.) with root cause analysis |
| 3 | Correction | Selects a targeted strategy (FIX_LOGIC, ADD_EDGE_CASE, REFACTOR); the correction differs from the original attempt |
| 4 | Learning (bonus) | JSON-based memory persists fixes; fuzzy matching recalls past learnings |

Features

  • Self-Healing: Automatically detects, diagnoses, and fixes errors
  • Memory System: Learns from past fixes and applies them to similar tasks
  • Error Taxonomy: 8 error types, 8 correction strategies (not catch-all retry)
  • Telemetry: Rich console output or JSON mode for debugging
  • Max 3 Retries: Prevents infinite loops

Architecture

The system uses a LangGraph state machine with 10 specialized nodes:

(Workflow diagram: Self-Improving Debug Loop)

Flow Overview:

  • Success Path: Task Intake → Memory Recall → Executor → Validator → Output
  • Error Path: Validator → Diagnoser → Correction Selector → Re-executor → Validator (retry up to 3 times)
  • Learning: After successful correction, Memory Writer saves the fix for future recall
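
The branching between the success and error paths can be sketched as a conditional edge function. The state shape and names below are illustrative, not the actual graph.ts implementation:

```typescript
// Minimal sketch of the routing decision after the validator node,
// assuming a state object threaded through the graph. Field names are
// hypothetical; the real LangGraph state may differ.
interface LoopState {
  valid: boolean;      // did the validator pass the output?
  attempts: number;    // executions run so far
  maxRetries: number;  // retry cap to prevent infinite loops (3 in this demo)
}

// Decides which node runs after the validator.
function routeAfterValidation(state: LoopState): "output" | "diagnoser" {
  if (state.valid) return "output";                        // success path
  if (state.attempts >= state.maxRetries) return "output"; // cap hit: surface the failure
  return "diagnoser";                                      // error path: diagnose and retry
}
```

Keeping this decision in a pure function (rather than inside an LLM prompt) is what makes the retry cap enforceable.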

Design Notes:

  • Uses a weaker model for execution to make self-healing observable
  • Uses a stronger model for diagnosis to ensure accurate error classification
  • Deterministic correction selector (no LLM) for predictable strategy selection
  • Simple fuzzy matching (LCS algorithm) for recalling similar past tasks
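
The LCS-based fuzzy matching can be sketched as follows. This is a hedged illustration of the technique, scoring similarity as LCS length over the longer string's length; matcher.ts may differ in detail:

```typescript
// Longest common subsequence length via a rolling 1-D DP array.
function lcsLength(a: string, b: string): number {
  const dp: number[] = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = 0; // holds dp[i-1][j-1] from the previous row
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = a[i - 1] === b[j - 1] ? prevDiag + 1 : Math.max(dp[j], dp[j - 1]);
      prevDiag = tmp;
    }
  }
  return dp[b.length];
}

// Normalized similarity in [0, 1]: 1 means identical, 0 means no common subsequence.
function similarity(a: string, b: string): number {
  if (a.length === 0 && b.length === 0) return 1;
  return lcsLength(a, b) / Math.max(a.length, b.length);
}
```

A recall step would compare the incoming task description against each stored task and return fixes above a similarity threshold.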

Models Used

| Node | Model | Purpose |
|------|-------|---------|
| Executor | gpt-5.4-nano | Generate task output (weaker model to show self-healing) |
| Diagnoser | gpt-5.3-codex | Classify errors accurately |
| Validator | gpt-5.4-mini | Judge output quality |

Setup

  1. Create .env file with your OpenAI API key:

    OPENAI_API_KEY=your-key-here
    
  2. Install dependencies:

    npm install

Usage

Run a task:

npm run dev -- "Generate a factorial function"

Run with debug mode (JSON output):

npm run dev -- --debug "Write a function to reverse a string"

Clear memory:

npm run dev -- --clear-memory

Run the demo:

npm run demo

How It Works

The system demonstrates real self-correction, not blind retries:

  1. Error Detection: The validator explicitly checks each constraint and identifies specific missing features (not just "it failed, try again")
  2. Root Cause Diagnosis: The diagnoser classifies errors into 8 types with detailed analysis
  3. Targeted Correction: Each error type maps to a specific correction strategy - the retry prompt is different from the original
  4. Persistent Learning: Successful fixes are saved to memory and recalled for similar future tasks
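
The persistent-learning step amounts to appending a fix record to the JSON memory. The record shape below is illustrative; the real memory.json schema in store.ts may differ:

```typescript
// Hypothetical shape of one persisted fix.
interface FixRecord {
  id: string;
  task: string;            // task description the fix applied to
  errorType: string;       // e.g. "EDGE_CASE"
  strategy: string;        // e.g. "ADD_EDGE_CASE"
  correctedPrompt: string; // prompt variant that finally passed validation
}

// Pure append: parse the current memory contents, add the fix, re-serialize.
// (A file-backed store would wrap this with fs.readFileSync / fs.writeFileSync.)
function appendFix(memoryJson: string, fix: FixRecord): string {
  const records: FixRecord[] = memoryJson.trim() ? JSON.parse(memoryJson) : [];
  records.push(fix);
  return JSON.stringify(records, null, 2);
}
```

On the next similar task, Memory Recall searches these records and can seed the executor with the corrected prompt directly.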

Demo Output

The demo runs 3 challenging tasks to showcase different error types:

  1. Expression Parser - Tests edge case handling (nested parens, whitespace)
  2. Rate Limiter - Tests complex logic with sliding windows and state
  3. Schema Validator - Tests missing implementation (intentionally fails first, then self-corrects)

Expected behavior:

  • First Run: Task 3 fails validation → diagnosed as MISSING_IMPLEMENTATION → adds validation logic → succeeds
  • Second Run: Same task passes on first try using saved memory

Example output:

╭─────────────────────────────────────────────╮
│  🚀 Self-Improving Debug Loop               │
╰─────────────────────────────────────────────╯

[TASK] Generate a function to calculate factorial

[MEMORY] No matching past fixes found

[EXECUTOR] Generating with gpt-5.4-nano...
✓ Output generated

[VALIDATOR] Checking output...
✗ Validation failed: Missing edge case for n=0

[DIAGNOSER] Analyzing error...
  → Error type: EDGE_CASE
  → Root cause: No base case handling

[CORRECTION] Strategy: ADD_EDGE_CASE
[RETRY 1/3] Re-executing with modified prompt...

[VALIDATOR] Checking output...
✓ All checks passed!

[MEMORY] Saved fix for future reference (id: fix_001)

╭─────────────────────────────────────────────╮
│  ✅ Task completed in 2 attempt(s)          │
│  📝 Memory updated for faster future runs   │
╰─────────────────────────────────────────────╯

Project Structure

├── src/
│   ├── index.ts           # CLI entry point
│   ├── graph.ts           # LangGraph workflow
│   ├── state.ts           # State types & error taxonomy
│   ├── nodes/             # Graph nodes
│   │   ├── executor.ts        # Initial code generation
│   │   ├── validator.ts       # Output validation
│   │   ├── diagnoser.ts       # Error classification
│   │   ├── correctionSelector.ts  # Strategy selection
│   │   ├── reExecutor.ts      # Retry with corrections
│   │   ├── memoryRecall.ts    # Search past fixes
│   │   └── memoryWriter.ts    # Persist learnings
│   ├── llm/               # OpenAI client
│   ├── memory/            # JSON memory store
│   │   ├── store.ts          # File I/O
│   │   └── matcher.ts        # Fuzzy similarity (LCS)
│   └── telemetry/         # Logger & events
├── examples/
│   ├── demo.ts            # Demo script (3 tasks)
│   └── hardTasks.ts       # Challenging demo tasks
├── memory.json            # Persistent memory
└── .env                   # API key (not committed)

Error Taxonomy

| Error Type | Description | Correction Strategy |
|------------|-------------|---------------------|
| SYNTAX_ERROR | Code syntax issues | FIX_SYNTAX |
| LOGIC_ERROR | Incorrect algorithm | FIX_LOGIC |
| EDGE_CASE | Missing edge handling | ADD_EDGE_CASE |
| MISSING_IMPLEMENTATION | Incomplete code | COMPLETE_IMPLEMENTATION |
| VALIDATION_ERROR | Doesn't meet requirements | REFACTOR |
| TYPE_ERROR | Type mismatches | ADD_TYPE_CHECKS |
| RUNTIME_ERROR | Would fail at runtime | ADD_ERROR_HANDLING |
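
Because the selector is deterministic, the taxonomy above boils down to a lookup table. A minimal sketch, with names mirroring the table (the actual correctionSelector.ts may differ):

```typescript
// Error types and their one-to-one correction strategies.
type ErrorType =
  | "SYNTAX_ERROR" | "LOGIC_ERROR" | "EDGE_CASE" | "MISSING_IMPLEMENTATION"
  | "VALIDATION_ERROR" | "TYPE_ERROR" | "RUNTIME_ERROR";

const STRATEGY: Record<ErrorType, string> = {
  SYNTAX_ERROR: "FIX_SYNTAX",
  LOGIC_ERROR: "FIX_LOGIC",
  EDGE_CASE: "ADD_EDGE_CASE",
  MISSING_IMPLEMENTATION: "COMPLETE_IMPLEMENTATION",
  VALIDATION_ERROR: "REFACTOR",
  TYPE_ERROR: "ADD_TYPE_CHECKS",
  RUNTIME_ERROR: "ADD_ERROR_HANDLING",
};

// No LLM call: a pure, predictable mapping from diagnosis to strategy.
function selectStrategy(errorType: ErrorType): string {
  return STRATEGY[errorType];
}
```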

Implementation Simplifications

This is a simplified demonstration. A production self-improving system would require:

1. Memory System

  • Demo: Simple JSON file with text matching (handles ~20-50 records)
  • Production: Vector embeddings for semantic search across thousands of past fixes
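
To illustrate the production-style upgrade: instead of character-level LCS, each task description would be embedded as a vector (by an embedding model, not shown here) and compared by cosine similarity. The vectors below are toy values:

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

At thousands of records, this comparison would run against an index (e.g. a vector database) rather than a linear scan of a JSON file.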

2. Tool Integration

  • Demo: Code generation only
  • Production: Full tooling ecosystem (web browsing, file I/O, sandboxed execution, API calls)

3. Error Recovery

  • Demo: Single-level correction (fix generated code)
  • Production: Multi-layer recovery handling tool failures, reasoning errors, and planning mistakes

4. Validation

  • Demo: Single LLM-based validator
  • Production: Multi-validator consensus with actual code execution in isolated environments

5. Task Handling

  • Demo: Single-step task execution
  • Production: Multi-step planning with checkpointing and partial recovery mechanisms

6. Observability

  • Demo: Console logging
  • Production: Comprehensive telemetry, error analytics, and A/B testing of correction strategies

What This Demonstrates

  • ✅ Basic self-correction workflow (detect → diagnose → correct)
  • ✅ Error classification with root cause analysis
  • ✅ Targeted correction strategies (not generic retry)
  • ✅ Simple memory system for learning from past fixes
  • ✅ Observable debugging loop with telemetry

License

MIT
