Autonomous ML agent for running experiments using Claude or Codex.

pentoai/ml-ralph

ML-Ralph

An autonomous ML agent that thinks like an experienced machine learning engineer (MLE). It works through a cognitive loop: ORIENT → RESEARCH → HYPOTHESIZE → EXECUTE → ANALYZE → VALIDATE → DECIDE.

Requirements

Install

pip install ml-ralph

Or with uv, from a local clone of the repository:

uv tool install --editable .

Quick Start

1. Initialize Ralph in your project

ml-ralph init

2. Create a PRD with Claude Code

Open Claude Code and use the /ml-ralph skill:

/ml-ralph

Ralph asks clarifying questions about your ML problem, then creates a PRD (product requirements document).

3. Run the autonomous loop

ml-ralph run

Ralph works through the cognitive loop until the success criteria are met or the iteration limit (--max-iterations, default 100) is reached.

Project Structure

After running Ralph, your project will have:

your-project/
├── .ml-ralph/
│   ├── RALPH.md           # Agent instructions
│   ├── prd.json           # PRD (the contract)
│   ├── ralph.json         # Execution state
│   ├── backlog.json       # Hypotheses queue
│   └── log.jsonl          # Thinking log (research, learnings, analysis)
├── .claude/skills/ml-ralph/
├── .codex/skills/ml-ralph/
├── CLAUDE.md
└── AGENTS.md
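
Since log.jsonl is a JSON Lines file, it can be inspected with a few lines of Python. The sketch below is a generic JSONL reader, not part of ML-Ralph: the helper names `read_jsonl` and `tail_entries` are hypothetical, and it makes no assumption about which fields Ralph writes into each entry.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: Path) -> Iterator[dict[str, Any]]:
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between entries
                yield json.loads(line)


def tail_entries(path: Path, count: int = 5) -> list[dict[str, Any]]:
    """Return the last `count` entries, or an empty list if the file is missing."""
    if not path.exists():
        return []
    return list(read_jsonl(path))[-count:]


if __name__ == "__main__":
    # Path taken from the project structure above; the entry schema is not assumed.
    for entry in tail_entries(Path(".ml-ralph") / "log.jsonl"):
        print(entry)
```

Run from the project root, this prints the most recent log entries as plain dictionaries, which is usually enough to see what the agent was last working on.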

Commands

| Command | Purpose |
| --- | --- |
| ml-ralph init | Initialize Ralph in current project |
| ml-ralph run | Run autonomous execution loop |

Options

# Use Claude Code (default)
ml-ralph run --tool claude

# Use OpenAI Codex
ml-ralph run --tool codex

# Codex with custom sandbox mode (default: workspace-write)
ml-ralph run --tool codex --sandbox danger-full-access

# Set max iterations (default: 100)
ml-ralph run --max-iterations 200

# Force overwrite on init
ml-ralph init --force

Codex Sandbox Modes

When using --tool codex, you can control the sandbox policy:

| Mode | Description |
| --- | --- |
| read-only | Agent can read files but not modify them |
| workspace-write | Agent can modify files in the workspace (default) |
| danger-full-access | Full system access (use with caution) |
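
When launching Ralph from a script, it can help to validate the options before calling the CLI. The following is a minimal sketch that only assembles the argument list from the flags documented above. The helper `build_run_command` is hypothetical, and rejecting `--sandbox` for non-Codex runs is a choice made in this sketch, not confirmed behaviour of the CLI.

```python
from typing import Optional

# Values taken from the options and sandbox table in this README.
TOOLS = ("claude", "codex")
SANDBOX_MODES = ("read-only", "workspace-write", "danger-full-access")


def build_run_command(
    tool: str = "claude",
    sandbox: Optional[str] = None,
    max_iterations: Optional[int] = None,
) -> list[str]:
    """Assemble an `ml-ralph run` argument list, rejecting unknown values."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    command = ["ml-ralph", "run", "--tool", tool]

    if sandbox is not None:
        # Assumption of this sketch: the sandbox flag is only meaningful for Codex.
        if tool != "codex":
            raise ValueError("--sandbox is only used with --tool codex")
        if sandbox not in SANDBOX_MODES:
            raise ValueError(f"unknown sandbox mode: {sandbox!r}")
        command += ["--sandbox", sandbox]

    if max_iterations is not None:
        if max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        command += ["--max-iterations", str(max_iterations)]

    return command


if __name__ == "__main__":
    print(build_run_command("codex", sandbox="read-only", max_iterations=200))
```

The returned list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues.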

The Cognitive Loop

ORIENT → RESEARCH → HYPOTHESIZE → EXECUTE → ANALYZE → VALIDATE → DECIDE
                         ↑                                         │
                         └─────────────────────────────────────────┘
  • ORIENT: Understand the problem, constraints, failure modes
  • RESEARCH: Learn from existing knowledge, find SOTA approaches
  • HYPOTHESIZE: Form testable bets with expected outcomes
  • EXECUTE: Implement minimal changes, run experiments
  • ANALYZE: Understand results, examine failures, find patterns
  • VALIDATE: Check for leakage, ensure results are trustworthy
  • DECIDE: Keep/revert/pivot based on evidence
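
The phase ordering can be pictured as a small state machine. The sketch below is an illustration only, not Ralph's implementation: it reads the diagram as DECIDE looping back to HYPOTHESIZE, and the names `Phase`, `next_phase`, and `walk` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    ORIENT = "orient"
    RESEARCH = "research"
    HYPOTHESIZE = "hypothesize"
    EXECUTE = "execute"
    ANALYZE = "analyze"
    VALIDATE = "validate"
    DECIDE = "decide"


ORDER = list(Phase)  # Enum iteration preserves definition order


def next_phase(current: Phase, criteria_met: bool = False) -> Optional[Phase]:
    """Return the following phase, or None once DECIDE finds the criteria met."""
    if current is Phase.DECIDE:
        # Assumed reading of the diagram: DECIDE feeds back into HYPOTHESIZE.
        return None if criteria_met else Phase.HYPOTHESIZE
    return ORDER[ORDER.index(current) + 1]


def walk(cycles_until_success: int) -> list[str]:
    """List the phases visited if success is reached at the Nth DECIDE."""
    visited: list[str] = []
    phase: Optional[Phase] = Phase.ORIENT
    decisions = 0
    while phase is not None:
        visited.append(phase.name)
        done = False
        if phase is Phase.DECIDE:
            decisions += 1
            done = decisions >= cycles_until_success
        phase = next_phase(phase, criteria_met=done)
    return visited


if __name__ == "__main__":
    print(" -> ".join(walk(2)))
```

Under this reading, ORIENT and RESEARCH run once at the start, and each later cycle repeats the five phases from HYPOTHESIZE through DECIDE.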