Secure, cloud-sandboxed Recursive Language Models (RLM) with DSPy and Modal.
fleet-rlm gives AI agents a secure cloud sandbox for long-context code and document work, with a Web UI-first experience, recursive delegation, and DSPy-aligned tooling.
Paper | Docs | Contributing
Fastest path: install and launch the built-in Web UI.

```shell
# Install as a runnable CLI tool
uv tool install fleet-rlm

# Launch the Web UI server
fleet web
```

Open http://localhost:8000 in your browser.

- Prefer a regular environment install instead of `uv tool`? Use `uv pip install fleet-rlm`.
- `fleet web` is the primary interactive interface.
- Product chat transport is WS-first (`/api/v1/ws/chat`); `POST /api/v1/chat` is compatibility-only and deprecated (removal target `v0.4.93`).
- Plain `fleet-rlm` installs are intended to support `fleet web`.
- Runtime settings (LM / Modal) can be configured from the Web UI Settings surface in local development.
- Runtime model updates from Settings are hot-applied in-process (`/api/v1/runtime/settings`) and verified via the active model fields on `/api/v1/runtime/status`.
- Secret settings inputs in the web Runtime UI are write-only; enter a new value to rotate a secret, or use explicit clear-on-save.
- Full setup for Modal secrets, Neon DB, auth modes, and deployment is linked below.
- Chat with an RLM-powered agent in the browser (`fleet web`)
- Run recursive long-context tasks with a secure Modal sandbox
- Analyze documents (including PDF ingestion with MarkItDown/pypdf fallback)
- Stream execution events and trajectories for observability/debugging
- Expose capabilities as an MCP server (`fleet-rlm serve-mcp`)
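The recursive long-context idea behind tools like `rlm_query`/`llm_query` can be sketched in a few lines of plain Python. The `llm_query` stub below stands in for a real (sub-)LM call; the function names mirror the tool names but this is not the package's actual API.

```python
def llm_query(prompt: str, context: str) -> str:
    """Stub leaf call: a real implementation would invoke a sub-LM.
    Here we 'answer' by counting lines that mention the query term."""
    hits = [ln for ln in context.splitlines() if prompt.lower() in ln.lower()]
    return f"{len(hits)} matching lines"

def rlm_query(prompt: str, context: str, chunk_size: int = 1000) -> str:
    """Recursive step: if the context fits, delegate to the leaf model;
    otherwise split it, recurse on each half, and aggregate the answers."""
    if len(context) <= chunk_size:
        return llm_query(prompt, context)
    lines = context.splitlines(keepends=True)
    mid = len(lines) // 2
    left = rlm_query(prompt, "".join(lines[:mid]), chunk_size)
    right = rlm_query(prompt, "".join(lines[mid:]), chunk_size)
    # A real RLM would synthesize sub-answers with another LM call;
    # this sketch just concatenates them.
    return f"{left}; {right}"

doc = "error: disk full\n" + "ok\n" * 400 + "error: timeout\n"
print(rlm_query("error", doc))  # 1 matching lines; 1 matching lines
```

The recursion is what lets a fixed-context model work over inputs far larger than its window: only the aggregation step ever sees more than one chunk's worth of material, and then only in summarized form.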
Common commands:

```shell
# Standalone terminal chat
fleet-rlm chat --trace-mode compact

# Explicit API server
fleet-rlm serve-api --port 8000

# FastAPI CLI (uses the [tool.fastapi] entrypoint)
fastapi dev
fastapi run

# MCP server
fleet-rlm serve-mcp --transport stdio

# Scaffold assets for Claude Code
fleet-rlm init --list
```

- `fleet` starts the standalone interactive chat launcher (Ink runtime path).
- `fleet-rlm chat` starts the in-process terminal chat.
- OpenTUI workflows and setup are documented in the guides (see links below) because they require additional local tooling.
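To give a feel for what a compact trace mode does with an agent's event stream, here is a toy renderer that folds a ReAct-style run into one line per step. The event `type` and field names are invented for illustration; fleet-rlm's actual trace schema is defined in its docs.

```python
from typing import Iterator

def events() -> Iterator[dict]:
    # A fake ReAct trajectory (hypothetical event shapes).
    yield {"type": "thought", "text": "Need the doc's section count."}
    yield {"type": "tool_call", "tool": "load_document", "args": {"path": "report.pdf"}}
    yield {"type": "tool_result", "tool": "load_document", "summary": "12 sections"}
    yield {"type": "answer", "text": "The report has 12 sections."}

def render_compact(stream) -> list[str]:
    """One line per visible step; 'thought' events are folded away."""
    lines = []
    for ev in stream:
        if ev["type"] == "tool_call":
            lines.append(f"-> {ev['tool']}({ev['args']})")
        elif ev["type"] == "tool_result":
            lines.append(f"<- {ev['tool']}: {ev['summary']}")
        elif ev["type"] == "answer":
            lines.append(f"answer: {ev['text']}")
    return lines

for line in render_compact(events()):
    print(line)
```

The same stream could feed a verbose renderer that keeps the thoughts, which is the usual trade-off a `--trace-mode` flag selects between.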
Local development from a source checkout:

```shell
# from repo root
uv sync --extra dev --extra server
uv run fleet web
uv run fastapi dev
```

Frontend build workflow (when validating packaged Web UI assets):

```shell
# from repo root
cd src/frontend
bun install --frozen-lockfile
bun run build
cd ../..
```

Use the full contributor setup (frontend builds, env/bootstrap, quality gates) in AGENTS.md and CONTRIBUTING.md.
Read this after the quick start if you want the full system picture (entry points, ReAct orchestration, tools, Modal execution, persistent storage).
```mermaid
graph TB
    subgraph entry ["🚪 Entry Points"]
        CLI["CLI (Typer)"]
        WebUI["Web UI<br/>(React SPA)"]
        API["FastAPI<br/>(WS/REST)"]
        TUI["Ink TUI<br/>(standalone runtime)"]
        MCP["MCP Server"]
    end
    subgraph orchestration ["🧠 Orchestration Layer"]
        Agent["RLMReActChatAgent<br/>(dspy.Module)"]
        History["Chat History"]
        Memory["Core Memory<br/>(Persona/Human/Scratchpad)"]
        DocCache["Document Cache"]
    end
    subgraph tools ["🔧 ReAct Tools"]
        DocTools["📄 load_document<br/>read_file_slice<br/>chunk_by_*"]
        RecursiveTools["🔁 rlm_query<br/>llm_query<br/>(recursive delegation)"]
        ExecTools["⚡ execute_code<br/>edit_file<br/>search_code"]
    end
    subgraph execution ["⚙️ Execution Layer"]
        Interpreter["ModalInterpreter<br/>(JSON protocol)"]
        Profiles["Execution Profiles:<br/>ROOT | DELEGATE | MAINTENANCE"]
    end
    subgraph cloud ["☁️ Modal Cloud"]
        Sandbox["Sandbox Driver<br/>(Python REPL)"]
        Volume[("💾 Persistent Volume<br/>/data/<br/>• workspaces<br/>• artifacts<br/>• memory<br/>• session state")]
    end

    WebUI -->|"WS-first (REST compat)"| API
    CLI --> Agent
    API --> Agent
    TUI --> Agent
    MCP --> Agent
    Agent --> History
    Agent --> Memory
    Agent --> DocCache
    Agent --> DocTools
    Agent --> RecursiveTools
    Agent --> ExecTools
    DocTools --> Interpreter
    RecursiveTools --> Interpreter
    ExecTools --> Interpreter
    Interpreter --> Profiles
    Interpreter -->|"stdin/stdout<br/>JSON commands"| Sandbox
    Sandbox -->|"read/write"| Volume

    style entry fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style orchestration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style tools fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style execution fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style cloud fill:#fce4ec,stroke:#c2185b,stroke-width:2px
```
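The Interpreter-to-Sandbox link in the diagram is JSON commands over stdin/stdout. A self-contained imitation of that pattern, using a local subprocess in place of a Modal sandbox (the command schema here is hypothetical, not ModalInterpreter's actual protocol):

```python
import json
import subprocess
import sys

# A tiny sandbox driver: reads one JSON command per stdin line, executes it
# in a persistent namespace, and replies with one JSON result per line.
DRIVER = r"""
import json, sys, io, contextlib
ns = {}
for line in sys.stdin:
    cmd = json.loads(line)
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(cmd["code"], ns)
    print(json.dumps({"id": cmd["id"], "stdout": buf.getvalue()}), flush=True)
"""

proc = subprocess.Popen(
    [sys.executable, "-c", DRIVER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def run(cmd_id: int, code: str) -> dict:
    """Send one command line, block for its one-line reply."""
    proc.stdin.write(json.dumps({"id": cmd_id, "code": code}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

# State persists between commands, like a REPL session.
r1 = run(1, "x = 21")
r2 = run(2, "print(x * 2)")
print(r1)  # {'id': 1, 'stdout': ''}
print(r2)  # {'id': 2, 'stdout': '42\n'}

proc.stdin.close()
proc.wait()
```

Line-delimited JSON keeps the protocol trivially framable over any byte stream, which is why it suits a driver running behind a cloud sandbox boundary just as well as a local pipe.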
- Documentation index
- Explanation index
- Quick install + setup
- Configure Modal
- Runtime settings (LM/Modal diagnostics)
- Deploying the server
- Using the MCP server
- CLI reference
- HTTP API reference
- Auth modes
- Database architecture
- Source layout
fleet-rlm also supports runtime diagnostics endpoints, WebSocket execution streams (/api/v1/ws/execution), multi-tenant Neon-backed persistence, and opt-in PostHog LLM analytics. Those workflows are documented in the guides/reference docs rather than front-loaded here.
Contributions are welcome. Start with CONTRIBUTING.md, then use AGENTS.md for repo-specific commands and quality gates.
MIT License β see LICENSE.
Based on Recursive Language Modeling research by Alex L. Zhang (MIT CSAIL), Omar Khattab (Stanford), and Tim Kraska (MIT).