Scrape Dojo

Declarative web scraping & browser automation with JSON workflows



Note

🤖 AI-Aided Development (AIAD)

This project openly uses AI-assisted development (e.g. Claude Code) to accelerate workflows, improve code quality, and keep development momentum. All AI-generated code is reviewed and approved by humans; this is not a vibe-coding project, but a deliberate effort to build a useful product while exploring the boundaries, benefits, and trade-offs of AI-aided development.


🥷 What is Scrape Dojo?

Scrape Dojo is a self-hosted web scraping & browser automation platform. Instead of writing Puppeteer code for every site, you define workflows declaratively in JSON/JSONC — like Infrastructure-as-Code, but for scraping.

Key capabilities:

  • ⚡ 25+ built-in actions — navigate, click, type, extract, loop, download, screenshot, and more
  • 🧩 Handlebars + JSONata — dynamic templates and powerful data transformations
  • ⏰ Cron scheduling — automate scrapes with cron, webhooks, or startup triggers
  • 🔐 Encrypted secrets — AES-256-CBC at-rest encryption for credentials
  • 📡 Real-time monitoring — SSE-powered live execution tracking in Angular UI
  • 🛡️ Auth (optional) — JWT, OIDC/SSO, MFA/TOTP, API keys
  • 🗄️ Multi-DB — SQLite (default), MySQL, PostgreSQL
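The Handlebars-style placeholders used in workflow params (such as {{previousData.title}} in the logger action further below) resolve against the output of earlier actions. A minimal TypeScript sketch of that idea, assuming a simple dotted-path lookup; this is NOT Scrape Dojo's actual template engine (it uses Handlebars plus JSONata), only an illustration:

```typescript
// Sketch only: resolve {{path.to.value}} placeholders against a context
// holding the results of earlier actions. Not the project's real engine.
type Context = { previousData: Record<string, unknown> };

function renderTemplate(template: string, ctx: Context): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (_, path: string) => {
    // Walk the dotted path through the context object.
    const value = path
      .split(".")
      .reduce<unknown>((obj, key) => (obj as any)?.[key], ctx);
    return value == null ? "" : String(value);
  });
}

const ctx: Context = { previousData: { title: "Example Domain" } };
renderTemplate("Title: {{previousData.title}}", ctx);
// → "Title: Example Domain"
```

The real engine additionally supports JSONata expressions for transforming extracted data, which a plain string substitution like this cannot do.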

Important

Scrape Dojo automates real browser interactions. Please respect website terms of service and applicable legal frameworks.

Full documentation: scrape-dojo.com


🐳 Quick Start (Docker)

# 1. Generate encryption key
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"

# 2. Create docker-compose.yml
cat <<'EOF' > docker-compose.yml
services:
  scrape-dojo:
    image: ghcr.io/disane87/scrape-dojo:latest
    ports:
      - '8080:80'
    environment:
      - SCRAPE_DOJO_ENCRYPTION_KEY=your_generated_key_here
      - SCRAPE_DOJO_AUTH_JWT_SECRET=your_random_jwt_secret_here
      - SCRAPE_DOJO_AUTH_REFRESH_TOKEN_SECRET=your_random_refresh_secret_here
      - DB_TYPE=sqlite
      # - SCRAPE_DOJO_PROXY_URL=http://proxy:8080  # Optional: route scrapes through a proxy
    volumes:
      - ./data:/home/pptruser/app/data
      - ./downloads:/home/pptruser/app/downloads
      - ./logs:/home/pptruser/app/logs
      - ./config:/home/pptruser/app/config
      - ./browser-data:/home/pptruser/app/browser-data
    restart: unless-stopped
EOF

# 3. Start
docker compose up -d

Open http://localhost:8080 — UI and API on the same port.

Warning

The SCRAPE_DOJO_ENCRYPTION_KEY encrypts all secrets. Store it safely — if lost, existing secrets are unrecoverable.
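To see why a lost key is unrecoverable, here is an illustrative sketch of AES-256-CBC at-rest encryption with a 32-byte hex key like the one generated in step 1. This is NOT Scrape Dojo's actual secret-storage code, just the general shape of the scheme: without the exact key bytes, decryption of stored ciphertext fails.

```typescript
// Illustrative AES-256-CBC round trip with Node's built-in crypto module.
// A 32-byte key (64 hex chars) is required for AES-256.
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

function encrypt(plaintext: string, hexKey: string): string {
  const key = Buffer.from(hexKey, "hex"); // 32 bytes
  const iv = randomBytes(16);             // fresh IV per encryption
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const enc = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store the IV alongside the ciphertext; it is not secret.
  return iv.toString("hex") + ":" + enc.toString("hex");
}

function decrypt(payload: string, hexKey: string): string {
  const [ivHex, dataHex] = payload.split(":");
  const decipher = createDecipheriv(
    "aes-256-cbc",
    Buffer.from(hexKey, "hex"),
    Buffer.from(ivHex, "hex"),
  );
  return Buffer.concat([
    decipher.update(Buffer.from(dataHex, "hex")),
    decipher.final(),
  ]).toString("utf8");
}

const key = randomBytes(32).toString("hex");
decrypt(encrypt("my-api-token", key), key); // round-trips to "my-api-token"
```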

For local development, environment variables, auth setup, and more: see the Quickstart Guide.


⚡ Your First Scrape

Create config/sites/my-first-scrape.jsonc:

{
  "$schema": "../scrapes.schema.json",
  "scrapes": [
    {
      "id": "my-first-scrape",
      "metadata": {
        "description": "Read a page title",
        "triggers": [{ "type": "manual" }]
      },
      "steps": [
        {
          "name": "Main",
          "actions": [
            {
              "name": "open",
              "action": "navigate",
              "params": { "url": "https://example.com" }
            },
            {
              "name": "title",
              "action": "extract",
              "params": { "selector": "h1" }
            },
            {
              "name": "log",
              "action": "logger",
              "params": { "message": "Title: {{previousData.title}}" }
            }
          ]
        }
      ]
    }
  ]
}

The scrape auto-appears in the UI (hot reload). Click Run or use the API:

curl http://localhost:8080/api/scrape/my-first-scrape
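To run the same scrape on a schedule instead of manually, swap the trigger in the metadata block. This is a hypothetical sketch: the exact trigger fields are defined by scrapes.schema.json and the Scheduling docs, so the "cron" field name below is an assumption, not verified against the schema.

```jsonc
{
  "metadata": {
    "description": "Read a page title every morning",
    // Assumed field name; check the Scheduling docs for the real schema.
    // "0 6 * * *" is standard five-field cron syntax for 06:00 daily.
    "triggers": [{ "type": "cron", "cron": "0 6 * * *" }]
  }
}
```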

📖 Documentation

Everything else lives in the docs:

  • 🚀 Quickstart (Docker & Source): Getting Started
  • 📐 Config format & metadata: Configuration
  • ⚡ All actions with examples: Actions Reference
  • 🧩 Templates & JSONata: Templates
  • ⏰ Scheduling & triggers: Scheduling
  • 🔐 Secrets & variables: Secrets & Variables
  • ⚙️ Environment variables: Env Reference
  • 🏗️ Architecture & API: Developer Guide
  • 🛡️ Auth (JWT/OIDC/MFA): Authentication
  • 💡 Full examples: Examples

🛠️ Development

git clone https://github.com/disane87/scrape-dojo.git && cd scrape-dojo
pnpm install
cp .env.example .env  # Set SCRAPE_DOJO_ENCRYPTION_KEY
pnpm start            # API (3000) + UI (4200)
pnpm test             # All tests

Available commands:

  • pnpm start       API + UI dev servers
  • pnpm test        All tests
  • pnpm test:api    API tests only
  • pnpm test:ui     UI tests only
  • pnpm lint        Lint all projects
  • pnpm build       Build all apps

Commits follow Conventional Commits (feat:, fix:, docs:, etc.).


🤝 Contributing

  • 🐛 Issues & bugs: GitHub Issues
  • 💡 Feature requests: New Issue
  • 🔀 Pull requests: Fork → branch → commit → PR

📄 License

MIT — use it however you like.


🌟 Contributors

Contributors

Made with ❤️ by Marco Franke

Documentation · Issues · Discussions
