# Toad Research 🐸 AI 😉 Character Recognition (TRACR)
OCR orchestration for cloud APIs and local vLLM models
Install · Run · Screenshots · Web Reviewer + ELO · Contributor Notes
Cloud-ready OCR orchestration service with:

- OpenAI-compatible routing (`base_url` + API key)
- Local OCR model execution via vLLM (optional dependency)
- Textual/Rich TUI for launching and monitoring jobs
- Structured outputs + metadata for multi-model runs
- Built-in web reviewer + ELO comparison workflow
TRACR uses a unified OpenAI-compatible client for provider swaps, optional local vLLM runtime management, and parallel model-run orchestration.
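Provider swaps in this pattern come down to constructing one OpenAI-compatible client with a different `base_url` and API key. A minimal sketch with the `openai` Python package, not TRACR's actual client code; the model ID and file path are placeholders:

```python
import base64
import os

from openai import OpenAI

# Same client class, different base_url + API key per provider.
# The base URLs are the providers' documented OpenAI-compatible
# endpoints; the model id and png path below are illustrative only.
PROVIDERS = {
    "openai": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
}

def make_client(provider: str) -> OpenAI:
    base_url, key_env = PROVIDERS[provider]
    return OpenAI(base_url=base_url, api_key=os.environ[key_env])

def ocr_page(client: OpenAI, model: str, png_path: str) -> str:
    page_b64 = base64.b64encode(open(png_path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page to markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{page_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

client = make_client("openrouter")
print(ocr_page(client, "allenai/olmOCR-2-7B-1025", "page-1.png"))
```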
Contributor architecture and endpoint notes: AGENTS.md.
## Install

First, clone the repo and enter it. Then install the base dependencies:
```bash
uv sync
```

For local vLLM mode:
```bash
uv sync --extra local
```

Note: local vLLM dependencies currently require Python 3.11-3.13 (not 3.14 yet, due to upstream ray wheel availability).
For web reviewer + ELO markdown rendering:
```bash
uv sync --extra web
```

Install both optional stacks:
```bash
uv sync --extra local --extra web
```

Copy the environment template:
```bash
cp .example.env .env
```

## Run

Run API + TUI together (single-command local operator mode):
```bash
uv run tracr
```

This starts the API in the background and opens the TUI in the foreground.
Or run them separately (recommended for production and remote setups):
Start API server:
```bash
uv run tracr api
```

Start the TUI (in another terminal):
```bash
uv run tracr tui
```

Start the web reviewer + ELO arena:
```bash
uv run tracr web
# or
uv run web
```

Use `--no-open` to skip auto-opening a browser tab.
Launch flow is keyboard-first and multi-step (wizard pages), with no mouse required.
Set Job id in the wizard to use that exact `outputs/<job_id>` folder (no timestamp suffix).
Key launch shortcuts:

- `Ctrl+N`: next step
- `Ctrl+B`: previous step
- `Ctrl+A`: add current API/local model selection to queued models
- `Ctrl+L`: launch job
- `F5`: check API key status on API page
- `Esc`: cancel wizard
Home view shortcuts:

- `Enter` or `M`: open selected job monitor (per-model progress + GPU)
- `V`: open selected job output pages viewer
- `O`: open outputs browser (metadata + markdown preview)
- `D`: dismiss selected completed/canceled job from home view
- `Left`/`Right`: switch completed pages in job viewer
- `Up`/`Down`: scroll current page in job viewer
- `R`: refresh page list in viewer
Standalone vLLM launch:
```bash
uv run tracr vllm-launch lightonai/LightOnOCR-2-1B --num-gpus 1 --data-parallel-size 1 --port 9000
```

Run all tests:
```bash
uv run tracr test
```

Pass through pytest args:
```bash
uv run tracr test -- -k output_layout
```

Put PDFs (or nested folders) under `inputs/`.
The TUI lets you:

- select files/folders from `inputs/`
- or type any absolute/relative path manually
Job configuration YAML files can be stored under `job_configs/`.
In the wizard Input step you can:

- select a YAML config from `job_configs/`
- or type any YAML path manually
- load it to prefill input/job/prompt/model settings and the model queue (see the loading sketch after this list)
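The config schema isn't spelled out in this README, so the field names below are placeholders; a rough PyYAML loading sketch:

```python
import yaml  # PyYAML

# Hypothetical job config: the real schema is whatever TRACR's wizard
# writes/reads; every key name here is an assumption for illustration.
with open("job_configs/invoices.yaml") as f:
    cfg = yaml.safe_load(f)

job_id = cfg.get("job_id")      # exact outputs/<job_id> folder if set
prompt = cfg.get("prompt")      # OCR prompt text
inputs = cfg.get("inputs", [])  # files/folders under inputs/
models = cfg.get("models", [])  # queued model configurations

print(f"Prefill: job_id={job_id!r}, {len(inputs)} input(s), {len(models)} model(s)")
```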
In the launch flow, choose a provider preset (OpenAI/OpenRouter/Gemini) or a custom endpoint. Preset selection also shows example model IDs for each provider.
The TUI checks `.env` for key presence (via the configured env var). If a key is missing, you can enter it inline in the form.
Use Add API Model(s) to queue multiple API models/providers in one job.
Model queue meaning:

- queue = the launch list of model configurations captured in the wizard
- execution is concurrent per model at runtime (not sequential, see the sketch after this list), except local runs may wait for free GPUs
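A sketch of that queue-then-fan-out behavior (not TRACR's internals) with `asyncio.gather`:

```python
import asyncio

# run_model() stands in for a real model run (API call or local vLLM).
async def run_model(model_id: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # placeholder for actual OCR work
    return f"{model_id}: done"

async def launch(queue: list[tuple[str, float]]) -> list[str]:
    # Every queued model starts at once; a local model would first
    # wait on a GPU gate before its vLLM server comes up.
    return await asyncio.gather(*(run_model(m, s) for m, s in queue))

queue = [("lightonai/LightOnOCR-2-1B", 0.1), ("zai-org/GLM-OCR", 0.2)]
print(asyncio.run(launch(queue)))
```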
Default OCR model list:

- `lightonai/LightOnOCR-2-1B`
- `zai-org/GLM-OCR`
- `PaddlePaddle/PaddleOCR-VL-1.5`
- `allenai/olmOCR-2-7B-1025`
- `datalab-to/chandra`
`zai-org/GLM-OCR` note (per model docs): if startup fails with `glm_ocr` architecture errors, upgrade the local runtime stack:

```bash
uv pip install -U --pre vllm --extra-index-url https://wheels.vllm.ai/nightly
uv pip install -U git+https://github.com/huggingface/transformers.git
```

You can add any custom Hugging Face model ID (`org/model`).
Use Add Local Model(s) to queue local models without leaving the wizard.
GPU-aware local scheduling is built in:
- local runs wait when insufficient GPUs are available
- vLLM servers are started on-demand
- vLLM launch uses OLMoCR-style serve flags (`tp`, `dp`, bounded multimodal prompt config, quieter request logs)
- page OCR is executed with bounded in-flight request batching (`max_concurrent_requests`; sketched after this list)
- resources are released when runs end
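The bounded-batching bullet above can be pictured as a semaphore around the per-page request, a sketch that assumes nothing beyond the `max_concurrent_requests` bound itself:

```python
import asyncio

async def ocr_all_pages(pages: list[int], max_concurrent_requests: int = 8) -> list[str]:
    # At most max_concurrent_requests page-OCR calls are in flight at
    # once, no matter how many pages the PDF has.
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def ocr_page(page: int) -> str:
        async with sem:                # wait for a free request slot
            await asyncio.sleep(0.01)  # placeholder for the HTTP call
            return f"page {page}: ok"

    return await asyncio.gather(*(ocr_page(p) for p in pages))

print(asyncio.run(ocr_all_pages(list(range(1, 21)), max_concurrent_requests=4)))
```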
Outputs are stored as:
```
outputs/<job_id>/<org-name-model-name>/run-<run_num>/<pdf_slug>/<page_num>.md
```
Examples:
```
outputs/invoices-20260206-120005/lightonai-LightOnOCR-2-1B/run-1/doc-a/1.md
outputs/invoices-20260206-120005/lightonai-LightOnOCR-2-1B/run-1/doc-a/2.md
```
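Because the layout is fixed, collecting every page of a run is a short `pathlib` walk (reusing the example paths above):

```python
from pathlib import Path

# List per-page markdown for one model run, sorted by page number.
run_dir = Path("outputs/invoices-20260206-120005/lightonai-LightOnOCR-2-1B/run-1")
for pdf_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
    pages = sorted(pdf_dir.glob("[0-9]*.md"), key=lambda p: int(p.stem))
    print(pdf_dir.name, "->", [p.name for p in pages])
```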
Metadata files are written at each level:

- `outputs/<job_id>/job_metadata.json`
- `outputs/<job_id>/<model_slug>/model_metadata.json`
- `outputs/<job_id>/<model_slug>/run-<run_num>/run_metadata.json`
- `outputs/<job_id>/<model_slug>/run-<run_num>/<pdf_slug>/pdf_metadata.json`
Metadata now includes processing stats:

- `job_metadata.json`: global rollups (time + token usage across all runs/pages)
- `run_metadata.json`: per-model-run rollups
- `pdf_metadata.json`: per-PDF rollups plus page-level entries (`pages[]`) with per-page timing, token usage, status, and errors (see the sketch after this list)
- Markdown viewers in the TUI show per-page output token count and character count when available.
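For example, a quick page-level readout from `pdf_metadata.json`; the `pages[]` key is documented above, but the per-entry field names used here are assumptions:

```python
import json
from pathlib import Path

meta_path = Path("outputs/invoices-20260206-120005/"
                 "lightonai-LightOnOCR-2-1B/run-1/doc-a/pdf_metadata.json")
meta = json.loads(meta_path.read_text())

# "pages" comes from the docs above; "page", "status", "duration_s",
# and "output_tokens" are placeholder field names for illustration.
for page in meta.get("pages", []):
    print(page.get("page"), page.get("status"),
          page.get("duration_s"), page.get("output_tokens"))
```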
If the same `job_id` is reused, new model runs increment `run_num` and append under that job.
## Web Reviewer + ELO

TRACR includes a dedicated web interface at `/web`:
- Output Viewer: choose a job/output, inspect original PDF page side-by-side with extracted markdown
- Markdown mode toggle: switch between rendered markdown and raw markdown text
- ELO Arena mode: blind-compare model outputs as Model A vs Model B (sides may swap between pairs)
- ELO Browse mode: lock a model pair and step across shared pages for targeted comparisons
ELO vote controls:

- `1`: Left
- `2`: Right
- `3`: Tie
- `4`: Both Bad
- `S`: Skip
- `N`: Next Pair
Other ELO controls:

- `R`: toggle raw/rendered markdown
- `B`: toggle Arena/Browse mode
- `Left`/`Right` arrow: move pages in Browse mode
ELO artifacts are saved under `outputs/<job_id>/elo/`, including rating state (`ratings.json`) and vote history (`votes.jsonl`).
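The README doesn't pin down the rating parameters, but the standard Elo update behind this kind of arena looks like the sketch below (the K-factor, the 0.5 tie score, and the no-op handling of Both Bad/Skip are assumptions):

```python
def elo_update(r_left: float, r_right: float, vote: str, k: float = 32.0):
    """One vote's rating update. Conventional Elo, not confirmed TRACR math."""
    expected_left = 1.0 / (1.0 + 10 ** ((r_right - r_left) / 400.0))
    score = {"left": 1.0, "right": 0.0, "tie": 0.5}.get(vote)
    if score is None:  # assumed: "both bad" / "skip" leave ratings unchanged
        return r_left, r_right
    return (r_left + k * (score - expected_left),
            r_right + k * ((1.0 - score) - (1.0 - expected_left)))

print(elo_update(1000.0, 1000.0, "left"))  # -> (1016.0, 984.0)
```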
## Contributor Notes

Contributor-facing architecture notes and the full endpoint inventory are maintained in AGENTS.md.