This project is my exploration of coding agents and Recursive Language Models, built in Elixir because OTP's supervision trees and process model felt like a natural fit for managing recursive LLM spawning. I was inspired by a few things: the RLM paper and its idea of LLMs writing code in a loop, Jbollenbacher's Elixir RLM which I wanted to take further, and the design philosophy behind pi — a coding agent that keeps things simple and transparent. This is very much a learning project, but it works and it's been fun to build.
A single Phoenix application: an AI execution engine where Claude writes Elixir code that runs in a persistent REPL, with recursive sub-agent spawning and built-in filesystem tools.
One engine, two modes:
- One-shot — `RLM.run/3` processes data and returns a result
- Interactive — `RLM.start_session/1` + `send_message/3` for multi-turn sessions with persistent bindings
Each `RLM.run/3` call launches a Worker that runs an iterate loop: send the
conversation history to Claude, receive back `{reasoning, code}`, evaluate the code in a
sandboxed Elixir REPL, feed the output back, and repeat until the code sets a
`final_answer` variable.

The eval step runs in a separate lightweight process so the Worker stays responsive.
Code inside the REPL can call `lm_query/2` to hand off a sub-task to a child Worker —
that call arrives at the Worker as a message, which it handles while eval waits for the
result. Without this async pattern the call would deadlock.
```mermaid
flowchart TD
    A(["RLM.run(context, query)"]) --> W
    subgraph W["Worker loop"]
        direction TB
        LLM["Claude API reasoning + code"] --> EVAL
        EVAL["async eval Code.eval_string"] -- "final_answer set" --> OUT(["return {:ok, answer, run_id}"])
        EVAL -- "no final_answer" --> LLM
    end
    EVAL -. "lm_query() → spawns child Worker" .-> CHILD["child Worker (flat sibling, same DynSup)"]
    CHILD -. "result" .-> EVAL
```
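The loop above can be sketched in a few lines. This is an illustrative shape only, not the real `RLM.Worker`: the stubbed `llm_step/1` stands in for the Claude call, and the real engine evaluates the code in a separate Task rather than inline.

```elixir
defmodule IterateSketch do
  @max_iterations 10

  def loop(state, iteration \\ 1)

  def loop(_state, iteration) when iteration > @max_iterations,
    do: {:error, :max_iterations}

  def loop(state, iteration) do
    # 1. Ask the LLM for the next step (stubbed out here).
    {_reasoning, code} = llm_step(state)

    # 2. Evaluate the code against the accumulated bindings. In the real
    #    engine this runs in a separate Task so the Worker stays responsive.
    {_result, bindings} = Code.eval_string(code, state.bindings)

    # 3. Stop when the code has set final_answer; otherwise iterate.
    case Keyword.fetch(bindings, :final_answer) do
      {:ok, answer} -> {:ok, answer}
      :error -> loop(%{state | bindings: bindings}, iteration + 1)
    end
  end

  # Hypothetical stub standing in for the Claude API round-trip.
  defp llm_step(_state), do: {"count the lines", "final_answer = 4"}
end

IterateSketch.loop(%{bindings: []})
# => {:ok, 4}
```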
Three invariants the engine enforces:
- Raw input data never enters the LLM context window — only metadata or previews
- Sub-LLM outputs stay in variables, never surfaced directly to the parent LLM
- Stdout is truncated head+tail to prevent context overflow
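The third invariant amounts to something like the sketch below. The helper name and default sizes are assumptions for illustration, not RLM's actual code.

```elixir
defmodule TruncateSketch do
  # Keep the first and last `keep` bytes of a long output and elide the
  # middle, so a huge stdout can never flood the context window.
  def head_tail(output, keep \\ 2_000) do
    if byte_size(output) <= 2 * keep do
      output
    else
      head = binary_part(output, 0, keep)
      tail = binary_part(output, byte_size(output) - keep, keep)
      elided = byte_size(output) - 2 * keep
      head <> "\n... [#{elided} bytes elided] ...\n" <> tail
    end
  end
end
```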
Requires Elixir ≥ 1.19 / OTP 27 and an Anthropic API key.
```sh
export CLAUDE_API_KEY=sk-ant-...
mix deps.get && mix compile
mix test    # excludes live API tests
iex -S mix  # interactive shell
```

```elixir
# Synchronous — returns {:ok, answer, run_id}
# run_id lets you retrieve the execution trace afterwards
{:ok, answer, run_id} = RLM.run(
  "line 1\nline 2\nline 3\nline 4",
  "Count the lines and return the count as an integer"
)
# => {:ok, 4, "run-abc123"}

# Async — returns immediately; result delivered as {:rlm_result, span_id, result}
{:ok, run_id, pid} = RLM.run_async(my_large_text, "Summarize the key themes")

# Retrieve the execution trace
tree = RLM.EventLog.get_tree(run_id)
```

Bindings persist across turns. The Worker stays alive between messages.
```elixir
{:ok, sid} = RLM.start_session(cwd: ".")
{:ok, answer1} = RLM.send_message(sid, "List files in the current directory using bash")
{:ok, answer2} = RLM.send_message(sid, "Now read the README")

RLM.history(sid)  # full message history
RLM.status(sid)   # session status map
```

```elixir
import RLM.IEx

{session, _} = start_chat("What Elixir version is this project using?")
chat(session, "Now show me the supervision tree")
watch(session)  # attach a live telemetry stream
```

```elixir
{:ok, result, run_id} = RLM.run(context, query,
  max_iterations: 10,
  max_depth: 3,
  model_large: "claude-opus-4-6",
  eval_timeout: 60_000
)
```

Functions available inside the LLM's eval'd code:
```elixir
# Data helpers
chunks(context, 1000)  # stream of 1000-byte chunks
grep("pattern", ctx)   # [{line_num, line}, ...]
preview(term, 200)     # truncated inspect
list_bindings()        # current bindings info

# Recursive LLM calls
{:ok, result} = lm_query("summarise this chunk", model_size: :small)
results = parallel_query(["chunk1", "chunk2"], model_size: :small)

# Structured extraction — single direct LLM call, returns a parsed map
{:ok, %{"names" => names}} = lm_query("Extract names from: #{text}",
  schema: %{
    "type" => "object",
    "properties" => %{"names" => %{"type" => "array", "items" => %{"type" => "string"}}},
    "required" => ["names"]
  }
)

# Filesystem tools
{:ok, content} = read_file("path/to/file.ex")
{:ok, _} = write_file("output.txt", "content")
{:ok, _} = edit_file("file.ex", "old text", "new text")
{:ok, output} = bash("mix test")
{:ok, matches} = rg("defmodule", "lib/")
{:ok, files} = find_files("**/*.ex")
{:ok, listing} = ls()
```

New to OTP? A Supervisor is a process whose only job is to start and monitor other processes; a DynamicSupervisor spawns children on demand at runtime. A GenServer is a long-lived process with a message inbox — Elixir's version of an actor.
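Those three pieces fit together in a toy example (plain OTP, not RLM code): a DynamicSupervisor spawns a GenServer child on demand, and the child answers messages from its inbox.

```elixir
defmodule EchoWorker do
  use GenServer

  # `use GenServer` also generates a child_spec/1, so the supervisor can
  # start this module from a {module, arg} tuple.
  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)
  def init(arg), do: {:ok, arg}
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

# Spawn a supervisor, then ask it for a child at runtime.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, pid} = DynamicSupervisor.start_child(sup, {EchoWorker, :hello})

GenServer.call(pid, :get)
# => :hello
```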
```mermaid
graph LR
    SUP["RLM.Supervisor · one_for_one"]
    SUP --> REG["RLM.Registry · process lookup"]
    SUP --> PS["Phoenix.PubSub · event broadcasting"]
    SUP --> TS["RLM.TaskSupervisor · bash tool tasks"]
    SUP --> RUNSUP["RLM.RunSup · DynamicSupervisor"]
    subgraph Obs["Observability"]
        direction TB
        ES["RLM.EventStore · EventLog Agents"]
        TEL["RLM.Telemetry · handler attachment"]
        TRACE["RLM.TraceStore · :dets"]
        SWEEP["EventLog.Sweeper · periodic GC"]
    end
    SUP --> ES
    SUP --> TEL
    SUP --> TRACE
    SUP --> SWEEP
    subgraph Web["Web Layer"]
        direction TB
        WEBTEL["RLMWeb.Telemetry · metrics"]
        DNS["DNSCluster · cluster discovery"]
        EP["RLMWeb.Endpoint · Phoenix / LiveView"]
    end
    SUP --> WEBTEL
    SUP --> DNS
    SUP --> EP
    RUNSUP --> RUN["RLM.Run · :temporary"]
    subgraph RunScope["Per-run scope — owned and linked by RLM.Run"]
        direction TB
        WD["DynamicSupervisor · worker pool"]
        EVALSUP["Task.Supervisor · eval tasks"]
        WD --> W1["RLM.Worker · :temporary"]
        WD --> W2["RLM.Worker · :temporary"]
        EVALSUP --> ET["eval Task · per iteration"]
    end
    RUN --> WD
    RUN --> EVALSUP
    ETS[/"ETS · span_id, parent, depth, status · not OTP supervision"/]
    RUN -. "tracks worker relationships" .-> ETS
    style SUP fill:#ede9fe,stroke:#7c3aed,color:#1a1e27
    style RUNSUP fill:#dbeafe,stroke:#1d4ed8,color:#1a1e27
    style RUN fill:#dcfce7,stroke:#15803d,color:#1a1e27
    style WD fill:#fef3c7,stroke:#b45309,color:#1a1e27
    style EVALSUP fill:#fef3c7,stroke:#b45309,color:#1a1e27
    style ETS fill:#f5f0e0,stroke:#c8b87a,color:#6b5e3a
```
All Workers at all recursion depths are flat siblings under the run's single
DynamicSupervisor. The parent-child relationship and depth are recorded in an ETS table
(Erlang's built-in in-memory key-value store) owned by `RLM.Run`, not encoded in OTP
supervision. Workers use `restart: :temporary` — they exit after completing their task
and are never restarted.
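The ETS bookkeeping looks roughly like this. The table name and row shape here are illustrative, not RLM's actual schema.

```elixir
# {span_id, parent_span_id, depth, status} rows in a public ETS set
table = :ets.new(:span_tracker, [:set, :public])
:ets.insert(table, {"span-root", nil, 0, :running})
:ets.insert(table, {"span-child-1", "span-root", 1, :running})

# Look up a child's parent and depth without touching the supervision tree
[{_id, parent, depth, _status}] = :ets.lookup(table, "span-child-1")
# parent == "span-root", depth == 1
```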
Enforced at compile time via `boundary`:

| Layer | Module | Rule |
|---|---|---|
| Core engine | `RLM` | Zero web dependencies |
| Web dashboard | `RLMWeb` | Depends only on `RLM` |
| Application | `RLM.Application` | Starts the unified supervision tree |
See docs/GUIDE.html for the full architecture reference — module map,
async-eval pattern walkthrough, telemetry event catalogue, and all configuration fields.
```elixir
{:ok, _result, run_id} = RLM.run(context, query)

tree = RLM.EventLog.get_tree(run_id)  # recursive span tree
jsonl = RLM.EventLog.to_jsonl(run_id)
File.write!("trace.jsonl", jsonl)
```

The LiveView dashboard at http://localhost:4000 shows the same span tree live with
expandable iteration cards for every run in the current session.
17 events fire during execution. Attach your own handler:
```elixir
:telemetry.attach("my-handler", [:rlm, :iteration, :stop],
  fn _event, measurements, meta, _ ->
    IO.puts("Iteration #{meta.iteration} — #{measurements.duration_ms}ms")
  end, nil)
```

Event families: `[:rlm, :node, :*]`, `[:rlm, :iteration, :*]`,
`[:rlm, :llm, :request, :*]`, `[:rlm, :eval, :*]`,
`[:rlm, :subcall, :*]`, `[:rlm, :direct_query, :*]`
```elixir
Phoenix.PubSub.subscribe(RLM.PubSub, "rlm:runs")
Phoenix.PubSub.subscribe(RLM.PubSub, "rlm:run:#{run_id}")
```

Connect multiple RLM nodes for remote execution:
```elixir
RLM.Node.start()
# => {:ok, :rlm@hostname}

RLM.Node.rpc(:"rlm@other_host", Kernel, :+, [1, 2])
# => {:ok, 3}
```

```elixir
import RLM.IEx
remote(:"rlm@other_host", "Summarize the key themes", context: my_large_text)
```

Configure for releases:
```sh
RLM_NODE_NAME=rlm   # defaults to release name
RLM_COOKIE=secret   # shared secret for node authentication
```

RLM executes LLM-generated Elixir code via `Code.eval_string` with full access to the
host filesystem, network, and shell. Do not expose RLM to untrusted users or untrusted
LLM providers. It is designed for local development, trusted API backends (Anthropic),
and controlled environments. There is no sandboxing beyond process-level isolation and
configurable timeouts.
This project is licensed under the MIT License.