Intelligent Browser Automation with Local LLMs
Quick Start • Features • Examples • Documentation • API
curllm is a CLI tool that combines browser automation with local LLMs (such as Qwen, Llama, or Mistral served through Ollama) to extract data, fill forms, and automate web workflows. With a local model, everything runs on your machine, so no data is sent to cloud LLM APIs.
v2 LLM-DSL Architecture: dynamic element detection, semantic goal understanding, no hardcoded selectors. 388 tests passing.
# Extract products with prices from any e-commerce site
# Single quotes keep the shell from expanding $1 inside the instruction
curllm "https://shop.example.com" -d 'Find all products under $100'
# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"
# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"

| Feature | Description |
|---|---|
| Local LLM | Works with 8GB GPUs (Qwen 2.5, Llama 3, Mistral) |
| Smart Extraction | LLM-guided DOM analysis - no hardcoded selectors |
| Form Automation | Auto-fill forms with intelligent field mapping |
| Stealth Mode | Bypass anti-bot detection |
| Visual Mode | See browser actions in real-time |
| BQL Support | Browser Query Language for structured queries |
| Export Formats | JSON, CSV, HTML, XLS output |
| Privacy-First | Everything runs locally - no cloud APIs needed |
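
The quick-start commands can also be launched from a script. The sketch below only assembles the argument list, using the `--visual`, `--stealth`, and `-d` flags shown in this README; `build_curllm_command` is a hypothetical helper, not part of curllm. Passing arguments as a list sidesteps shell quoting problems such as `$` expansion inside an instruction.

```python
import shlex


def build_curllm_command(url, instruction, *, stealth=False, visual=False):
    """Return an argv list for one curllm invocation (hypothetical helper)."""
    argv = ["curllm"]
    if visual:
        argv.append("--visual")
    if stealth:
        argv.append("--stealth")
    argv += [url, "-d", instruction]
    return argv


cmd = build_curllm_command(
    "https://shop.example.com",
    "Find all products under $100",
    stealth=True,
)
# shlex.join produces a copy-pasteable, safely quoted shell command.
print(shlex.join(cmd))
# To execute it (requires curllm to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```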
curllm v2 uses LLM-DSL (LLM Domain Specific Language) - a dynamic approach that eliminates hardcoded selectors:
┌─────────────────────────────────────────────────────────────┐
│                        LLM-DSL Flow                         │
├─────────────────────────────────────────────────────────────┤
│  1. Goal Detection (semantic)                               │
│     "Find RAM DDR5" → FIND_PRODUCTS                         │
│                                                             │
│  2. Strategy Selection                                      │
│     FIND_PRODUCTS → use search flow                         │
│     FIND_CART → find link by semantic scoring               │
│                                                             │
│  3. Element Finding (LLM-first)                             │
│     LLM analysis → Statistical scoring → Fallback           │
│                                                             │
│  4. Dynamic Selector Generation                             │
│     Analyze DOM → Score elements → Generate selector        │
└─────────────────────────────────────────────────────────────┘
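
Step 3 of the flow (LLM analysis, then statistical scoring, then fallback) can be read as a chain of finders tried in order. The following is a minimal sketch of that idea only; all function names are hypothetical, the LLM step is a stub that returns no answer, and the word-overlap score is a stand-in for whatever statistics curllm actually uses.

```python
def llm_finder(goal, elements):
    # Stub for an LLM call; None simulates "no confident answer".
    return None


def statistical_finder(goal, elements):
    # Score each element by how many goal words appear in its text.
    words = set(goal.lower().split())
    best, best_score = None, 0
    for el in elements:
        score = len(words & set(el["text"].lower().split()))
        if score > best_score:
            best, best_score = el, score
    return best


def fallback_finder(goal, elements):
    # Last resort: take the first candidate, if there is one.
    return elements[0] if elements else None


def find_element(goal, elements,
                 finders=(llm_finder, statistical_finder, fallback_finder)):
    """Try each finder in order and return (element, finder name)."""
    for finder in finders:
        result = finder(goal, elements)
        if result is not None:
            return result, finder.__name__
    return None, None


elements = [
    {"text": "About us", "selector": "a.about"},
    {"text": "Search products", "selector": "input#search"},
]
match, used = find_element("search products", elements)
print(match["selector"], used)
```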
| Feature | Traditional | LLM-DSL |
|---|---|---|
| Selectors | Hardcoded CSS/XPath | Dynamic generation |
| Keywords | Static lists | Semantic analysis |
| Language | English only | Multi-language (PL, EN) |
| Maintenance | Manual updates | Self-adapting |
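
To illustrate the "Dynamic generation" row, here is a small sketch that scores link candidates against a goal and builds a CSS selector from the winner. It is an illustration under stated assumptions, not curllm's implementation: the element dictionaries, `score`, and `to_selector` are hypothetical, and `difflib` string similarity is a lexical stand-in for real semantic scoring.

```python
from difflib import SequenceMatcher


def score(goal, element):
    """Similarity between the goal and the element's text plus href (0..1)."""
    text = f'{element.get("text", "")} {element.get("href", "")}'.lower()
    return SequenceMatcher(None, goal.lower(), text).ratio()


def to_selector(element):
    """Build a CSS selector from the element's id, else classes, else tag."""
    if element.get("id"):
        return f'{element["tag"]}#{element["id"]}'
    if element.get("classes"):
        return element["tag"] + "".join(f".{c}" for c in element["classes"])
    return element["tag"]


links = [
    {"tag": "a", "text": "Contact", "href": "/contact", "classes": ["nav-link"]},
    {"tag": "a", "text": "Cart", "href": "/cart", "id": "cart"},
]
best = max(links, key=lambda el: score("cart", el))
selector = to_selector(best)
print(selector)
```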
pip install -U curllm
curllm-setup # One-time setup (installs Playwright browsers)
curllm-doctor # Verify installation

- Python 3.10+
- GPU: NVIDIA with 6-8GB VRAM (RTX 3060/4060) or CPU mode
- Ollama: For local LLM inference
# Install Ollama (if not installed)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5:7b

# Extract all links
curllm "https://example.com" -d "extract all links"
# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}
# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"

# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
  -d "Fill form: name=John Doe, email=john@example.com, message=Hello"
# Login automation
curllm --visual "https://app.example.com/login" \
  -d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'

# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv
# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html
# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx

# Take screenshot
curllm "https://example.com" -d "screenshot"
# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "a.titlelink") { text url: attr(name: "href") }
  }
}'

curllm-web start # Start web UI at http://localhost:5000
curllm-web status # Check status
curllm-web stop # Stop server

Features:
- Modern responsive UI
- 19 pre-configured prompts
- Real-time log viewer
- File upload support
Environment variables (.env):
CURLLM_MODEL=qwen2.5:7b # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true # Run browser headlessly
CURLLM_STEALTH_MODE=false # Anti-detection
CURLLM_LOCALE=en-US # Browser locale

┌──────────────────────────────────────────────────────────────────┐
│                            curllm CLI                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐   │
│  │  DSL Executor  │───▶│ Knowledge Base │───▶│ Strategy YAML │   │
│  │ (Orchestrator) │    │    (SQLite)    │    │     Files     │   │
│  └────────────────┘    └────────────────┘    └───────────────┘   │
│          │                                                       │
│          ▼                                                       │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                   DOM Toolkit (Pure JS)                    │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │  │
│  │  │Structure │  │ Patterns │  │Selectors │  │   Prices   │  │  │
│  │  │ Analyzer │  │ Detector │  │Generator │  │  Detector  │  │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                 Playwright Browser Engine                  │  │
│  │          (Chromium with Stealth & Anti-Detection)          │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                   │
│                              ▼                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                      Ollama / LiteLLM                      │  │
│  │     (Local LLM: Qwen 2.5, Llama 3, Mistral, GPT, etc)      │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
| Component | Description | LLM Calls |
|---|---|---|
| URL Resolver | Smart navigation with goal detection | 0-1 |
| Goal Detector | Semantic intent understanding | 0-1 |
| Element Finder | Dynamic selector generation | 0-1 |
| DOM Toolkit | Pure JavaScript atomic queries | 0 |
| SPA Hydration | Wait for CSR/SPA content | 0 |
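
The LLM Calls column can be totalled to estimate how many model calls one request might need. This is a back-of-the-envelope sketch that assumes each component runs once per request, which the table itself does not state.

```python
# (min, max) LLM calls per component, copied from the table above.
components = {
    "URL Resolver": (0, 1),
    "Goal Detector": (0, 1),
    "Element Finder": (0, 1),
    "DOM Toolkit": (0, 0),
    "SPA Hydration": (0, 0),
}

min_calls = sum(low for low, _ in components.values())
max_calls = sum(high for _, high in components.values())
print(f"LLM calls per request: {min_calls} to {max_calls}")
```

Under that assumption a request costs between zero and three LLM calls, with the DOM Toolkit and SPA hydration never calling the model.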
Full Architecture Documentation →
Note: The YAML DSL system works alongside the newer LLM-DSL. YAML strategies are used for known sites with proven extraction patterns, while LLM-DSL handles unknown sites dynamically.
curllm automatically learns and saves successful extraction strategies as YAML files:
# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers
selector: div.product-card
fields:
  name: h3.title
  price: span.price
  url: a[href]
metadata:
  success_rate: 0.95
  use_count: 42

- First visit: LLM-DSL dynamically analyzes the page and extracts data
- Successful: the strategy is saved to `dsl/*.yaml` and recorded in the Knowledge Base
- Next visit: the Knowledge Base loads the saved strategy (fast path)
- Unknown site: falls back to LLM-DSL dynamic discovery
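
The `metadata` block suggests that each saved strategy tracks how often it was used and how often it worked. As a sketch only, and assuming `success_rate` is a running mean over `use_count` runs (the README does not define it), one run could be recorded like this; `record_run` is a hypothetical name.

```python
def record_run(metadata, success):
    """Return updated metadata after one more run (running-mean assumption)."""
    n = metadata.get("use_count", 0)
    rate = metadata.get("success_rate", 0.0)
    new_n = n + 1
    # New mean = (old successes + this outcome) / new run count.
    new_rate = (rate * n + (1.0 if success else 0.0)) / new_n
    return {**metadata, "use_count": new_n, "success_rate": round(new_rate, 4)}


meta = record_run({"success_rate": 0.95, "use_count": 42}, success=True)
print(meta)
```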
┌─────────────────────────────────────────────────────────┐
│                      Request Flow                       │
├─────────────────────────────────────────────────────────┤
│  URL → Knowledge Base lookup                            │
│        │                                                │
│        ├─ Found? → Load YAML strategy (fast)            │
│        │                                                │
│        └─ Not found? → LLM-DSL dynamic (flexible)       │
│              │                                          │
│              └─ Success? → Save to YAML                 │
└─────────────────────────────────────────────────────────┘
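
The request flow above can be sketched as a lookup with a fallback. This is a minimal in-memory illustration, not curllm's code: `KnowledgeBase`, `handle`, and the `discover` callback are hypothetical, a dictionary stands in for SQLite and the YAML files, and glob matching via `fnmatch` is an assumption based on the `url_pattern: "*.ceneo.pl/*"` example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


class KnowledgeBase:
    """Maps URL glob patterns to saved strategies (in memory only)."""

    def __init__(self):
        self._strategies = {}

    def lookup(self, url):
        parts = urlsplit(url)
        target = parts.netloc + (parts.path or "/")
        for pattern, strategy in self._strategies.items():
            if fnmatchcase(target, pattern):
                return strategy
        return None

    def save(self, pattern, strategy):
        self._strategies[pattern] = strategy


def handle(url, kb, discover):
    """Return (strategy, path taken) for one request."""
    strategy = kb.lookup(url)
    if strategy is not None:
        return strategy, "yaml"          # fast path: saved strategy
    strategy = discover(url)             # dynamic path: stand-in for LLM-DSL
    if strategy is not None:
        kb.save(f"{urlsplit(url).netloc}/*", strategy)
    return strategy, "llm-dsl"


kb = KnowledgeBase()
discover = lambda url: {"selector": "div.product-card"}
first = handle("https://www.ceneo.pl/Laptopy", kb, discover)
second = handle("https://www.ceneo.pl/Telefony", kb, discover)
print(first[1], second[1])
```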
| Algorithm | Best For | Speed |
|---|---|---|
| `statistical_containers` | Product grids | ⚡ Fast |
| `pattern_detection` | Lists, tables | ⚡ Fast |
| `llm_guided` | Complex layouts | 🐢 Slower |
| `form_fill` | Contact forms | ⚡ Fast |
DSL System Documentation →
curllm supports multiple LLM providers via LiteLLM:
from curllm_core import LLMConfig
# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")
# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")
# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")
# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")

- System Architecture
- DSL System - Strategy-based extraction
- DOM Toolkit - Pure JS queries
- Components - Module overview
- LLM-DSL URL Resolution - Smart URL navigation
# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install
# Run tests (388 tests passing)
make test
# Run URL resolver examples
cd examples/url_resolver && python run_all.py
# Run with Docker
docker compose up -d

Apache License 2.0 - see LICENSE
Built with:
- Playwright - Browser automation
- Ollama - Local LLM inference
- LiteLLM - Multi-provider LLM support
- Flask - Web framework
Star this repo if you find it useful!
Made with ❤️ by wronai
