> [!NOTE]
> **🤖 AI-Aided Development (AIAD)**
>
> This project openly uses AI-assisted development (e.g. Claude Code) to accelerate workflows, improve code quality, and gain more development momentum. All AI-generated code is reviewed and approved by humans. This is not a vibe-coding project, but a deliberate effort to build a useful product while exploring the boundaries, benefits, and trade-offs of AI-aided development.

Scrape Dojo is a self-hosted web scraping & browser automation platform. Instead of writing Puppeteer code for every site, you define workflows declaratively in JSON/JSONC — like Infrastructure-as-Code, but for scraping.
Key capabilities:
- ⚡ 25+ built-in actions — navigate, click, type, extract, loop, download, screenshot, and more
- 🧩 Handlebars + JSONata — dynamic templates and powerful data transformations
- ⏰ Cron scheduling — automate scrapes with cron, webhooks, or startup triggers
- 🔐 Encrypted secrets — AES-256-CBC at-rest encryption for credentials
- 📡 Real-time monitoring — SSE-powered live execution tracking in Angular UI
- 🛡️ Auth (optional) — JWT, OIDC/SSO, MFA/TOTP, API keys
- 🗄️ Multi-DB — SQLite (default), MySQL, PostgreSQL
> [!IMPORTANT]
> Scrape Dojo automates real browser interactions. Please respect website terms of service and applicable legal frameworks.

Full documentation: scrape-dojo.com
```bash
# 1. Generate encryption key
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

# 2. Create docker-compose.yml
cat <<'EOF' > docker-compose.yml
services:
  scrape-dojo:
    image: ghcr.io/disane87/scrape-dojo:latest
    ports:
      - '8080:80'
    environment:
      - SCRAPE_DOJO_ENCRYPTION_KEY=your_generated_key_here
      - SCRAPE_DOJO_AUTH_JWT_SECRET=your_random_jwt_secret_here
      - SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=your_random_refresh_secret_here
      - DB_TYPE=sqlite
      # - SCRAPE_DOJO_PROXY_URL=http://proxy:8080 # Optional: route scrapes through a proxy
    volumes:
      - ./data:/home/pptruser/app/data
      - ./downloads:/home/pptruser/app/downloads
      - ./logs:/home/pptruser/app/logs
      - ./config:/home/pptruser/app/config
      - ./browser-data:/home/pptruser/app/browser-data
    restart: unless-stopped
EOF

# 3. Start
docker compose up -d
```

Open http://localhost:8080; the UI and API are served on the same port.
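`docker compose up -d` returns as soon as the container has started, which can be a little before the UI answers. If you script the setup, a small polling loop helps. The sketch below is a hypothetical helper (not part of Scrape Dojo) that uses only `curl` and assumes the root URL returns a success status once the app is ready.

```bash
# Hypothetical helper, not part of Scrape Dojo.
# wait_for_url URL [MAX_TRIES]
# Retries about once per second until URL answers with an HTTP status below 400,
# giving up after MAX_TRIES failed attempts (default 60). Each attempt is capped at 2 s.
wait_for_url() {
  local url="$1" max_tries="${2:-60}" tries=0
  # -f: treat HTTP >= 400 as failure, -s: silent, -o /dev/null: discard the response body
  until curl -fs -o /dev/null --max-time 2 "$url"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max_tries" ]; then
      echo "gave up on $url after $tries attempts" >&2
      return 1
    fi
    sleep 1
  done
  echo "$url is up"
}

# Example: start the stack, then block until the UI answers (up to 120 attempts).
# docker compose up -d && wait_for_url http://localhost:8080 120
```

Counting attempts instead of wall-clock time keeps the function POSIX `sh` compatible; with the 2-second cap per attempt plus the 1-second sleep, the worst case is roughly three times `MAX_TRIES` seconds.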
> [!WARNING]
> The `SCRAPE_DOJO_ENCRYPTION_KEY` encrypts all secrets. Store it safely: if it is lost, existing secrets are unrecoverable.

For local development, environment variables, auth setup, and more: see the Quickstart Guide.
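Instead of pasting secrets into `docker-compose.yml`, you can keep them in a `.env` file next to it: Docker Compose reads that file automatically and substitutes `${VAR}` references in the compose file, e.g. `- SCRAPE_DOJO_ENCRYPTION_KEY=${SCRAPE_DOJO_ENCRYPTION_KEY}`. The sketch below is a hypothetical helper, not part of Scrape Dojo, that fills in the three secrets from the compose file with 32 random bytes in hex (the same shape as the `node` one-liner; using that shape for the two JWT secrets is an assumption). It never overwrites a non-empty value, because replacing the encryption key would make existing secrets unrecoverable. It assumes plain `NAME=value` lines, so it should also work on a `.env` copied from `.env.example` in the from-source setup.

```bash
# Hypothetical helper, not shipped with Scrape Dojo.

# Print 32 random bytes as 64 lowercase hex characters
# (-v stops od from collapsing repeated output lines into "*").
gen_hex32() {
  od -An -v -N32 -t x1 /dev/urandom | tr -d ' \n'
}

# ensure_secret FILE NAME
# Keeps an existing non-empty NAME=... line untouched. Otherwise removes an empty
# "NAME=" placeholder (if any) and appends NAME=<fresh random hex>.
ensure_secret() {
  local file="$1" name="$2"
  if grep -q "^${name}=." "$file"; then
    echo "keeping existing $name"
    return 0
  fi
  # Rewrite the file without the empty placeholder; the temp copy is private (umask 077).
  ( umask 077; grep -v "^${name}=\$" "$file" > "$file.tmp" ) || true
  cat "$file.tmp" > "$file" && rm -f "$file.tmp"
  printf '%s=%s\n' "$name" "$(gen_hex32)" >> "$file"
  echo "generated $name"
}

# ensure_secrets FILE
# Fills in the three secrets that docker-compose.yml expects.
ensure_secrets() {
  local file="$1" name
  # Create the file with owner-only permissions if it does not exist yet.
  [ -e "$file" ] || ( umask 077; : > "$file" )
  for name in SCRAPE_DOJO_ENCRYPTION_KEY \
              SCRAPE_DOJO_AUTH_JWT_SECRET \
              SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET; do
    ensure_secret "$file" "$name"
  done
}

# Usage, next to docker-compose.yml (or in the repository root after
# `cp .env.example .env`):
# ensure_secrets .env
```

Back up the resulting `.env` somewhere safe: it now holds the encryption key.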
Create `config/sites/my-first-scrape.jsonc` (a minimal example of this file is listed at the end of this README).
The scrape appears in the UI automatically (hot reload). Click Run or use the API:

```bash
curl http://localhost:8080/api/scrape/my-first-scrape
```

Everything else lives in the docs:
| Topic | Link |
|---|---|
| 🚀 Quickstart (Docker & Source) | Getting Started |
| 📐 Config format & metadata | Configuration |
| ⚡ All actions with examples | Actions Reference |
| 🧩 Templates & JSONata | Templates |
| ⏰ Scheduling & triggers | Scheduling |
| 🔐 Secrets & variables | Secrets & Variables |
| ⚙️ Environment variables | Env Reference |
| 🏗️ Architecture & API | Developer Guide |
| 🛡️ Auth (JWT/OIDC/MFA) | Authentication |
| 💡 Full examples | Examples |
To run Scrape Dojo from source for development:

```bash
git clone https://github.com/disane87/scrape-dojo.git && cd scrape-dojo
pnpm install
cp .env.example .env   # Set SCRAPE_DOJO_ENCRYPTION_KEY
pnpm start             # API (3000) + UI (4200)
pnpm test              # All tests
```

| Command | What it does |
|---|---|
| `pnpm start` | API + UI dev servers |
| `pnpm test` | All tests |
| `pnpm test:api` | API tests only |
| `pnpm test:ui` | UI tests only |
| `pnpm lint` | Lint all projects |
| `pnpm build` | Build all apps |
Commits follow Conventional Commits (feat:, fix:, docs:, etc.).
- 🐛 Issues & bugs: GitHub Issues
- 💡 Feature requests: New Issue
- 🔀 Pull requests: Fork → branch → commit → PR
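The Conventional Commits rule above can be checked locally before pushing. The sketch below is a hypothetical helper, not part of the repository (which may already enforce the rule with its own tooling), that tests a commit subject against the `type(scope)!: description` shape. The allowed types are an assumption: the README names only `feat`, `fix` and `docs`, so the list adds the other types commonly used with the Angular commit convention.

```bash
# Hypothetical helper, not part of the Scrape Dojo repository.
# Succeeds if the first line of $1 looks like "type(optional-scope)!: description".
# The list of types is an assumption (the common Angular-style set).
is_conventional_commit() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq \
    '^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([A-Za-z0-9._/-]+\))?!?: .+'
}

# Example: check the subject of the most recent commit.
# is_conventional_commit "$(git log -1 --format=%s)" || echo "not a Conventional Commit"
```

Only the first line is checked, so a commit body on later lines does not affect the result.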
MIT — use it however you like.
Made with ❤️ by Marco Franke

Minimal example for `config/sites/my-first-scrape.jsonc` (see the first-scrape steps above):

```jsonc
{
  "$schema": "../scrapes.schema.json",
  "scrapes": [
    {
      "id": "my-first-scrape",
      "metadata": {
        "description": "Read a page title",
        "triggers": [{ "type": "manual" }],
      },
      "steps": [
        {
          "name": "Main",
          "actions": [
            {
              "name": "open",
              "action": "navigate",
              "params": { "url": "https://example.com" },
            },
            {
              "name": "title",
              "action": "extract",
              "params": { "selector": "h1" },
            },
            {
              "name": "log",
              "action": "logger",
              "params": { "message": "Title: {{previousData.title}}" },
            },
          ],
        },
      ],
    },
  ],
}
```