MinerU Parsing Service

📖 Deployment Guide • ⚙️ Configuration • 💡 API Examples • 🔧 Troubleshooting

Enterprise-grade document parsing service with Celery-based asynchronous queue processing and a fully decoupled API/Worker architecture.

Features

🚀 Asynchronous Processing: Distributed task queue based on Celery
📄 Multi-format Support: PDF, Office, images, and various document formats
🔄 High Availability: Supports task retry and fault recovery
📊 Real-time Monitoring: Task status tracking and queue statistics
🎯 Priority Queue: Supports task priority scheduling
🔧 Easy to Extend: Modular design, easy to add new parsing engines

Quick Start

Prerequisites

Docker and Docker Compose
(Optional) NVIDIA GPU for GPU worker

Simplest Way (Recommended)

Five steps to get started:

Copy configuration files:

# Project root
cp .env.example .env
cd docker && cp .env.example .env

Configure service selection (in docker/.env):

cd docker
# Edit .env file, set COMPOSE_PROFILES (choose one)

# Option 1: GPU Worker + internal Redis (default, requires NVIDIA GPU)
COMPOSE_PROFILES=redis,mineru-gpu

# Option 2: CPU Worker + internal Redis (recommended for development)
# COMPOSE_PROFILES=redis,mineru-cpu

💡 Notes:

COMPOSE_PROFILES controls which optional services start: Redis and the Worker (GPU or CPU).

The default is COMPOSE_PROFILES=redis,mineru-gpu (GPU Worker).

The API and Cleanup services are required, so they have no profile and always start.
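To check which profiles a given docker/.env would activate, the file can be parsed with a few lines of Python. This is a minimal sketch: `active_profiles` is a hypothetical helper, not part of this project, and it handles only simple KEY=value lines rather than the full Compose .env syntax.

```python
def active_profiles(env_text):
    """Return the profile names from the last uncommented COMPOSE_PROFILES line."""
    profiles = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out options are ignored
        key, sep, value = line.partition("=")
        if sep and key.strip() == "COMPOSE_PROFILES":
            # Drop a trailing inline comment and optional surrounding quotes.
            value = value.split(" #", 1)[0].strip().strip("\"'")
            profiles = [p.strip() for p in value.split(",") if p.strip()]
    return profiles


sample = """\
# Option 1: GPU Worker + internal Redis
COMPOSE_PROFILES=redis,mineru-gpu
# Option 2: CPU Worker + internal Redis
# COMPOSE_PROFILES=redis,mineru-cpu
"""
print(active_profiles(sample))  # ['redis', 'mineru-gpu']
```

The commented-out Option 2 line is skipped, so only the GPU profile set is reported.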

Build images:

cd docker
# Simplest: run directly (automatically selects CPU or GPU Worker based on COMPOSE_PROFILES)
sh build.sh

# Or manually specify (build.sh supports parameters to build only needed services)
# GPU Worker:
sh build.sh --api --worker-gpu
# CPU Worker:
sh build.sh --api --worker-cpu

Start services:

cd docker
# Simplest: start directly (automatically starts configured services based on COMPOSE_PROFILES)
docker compose up -d

# Or manually specify (equivalent ways)
# GPU Worker:
docker compose --profile redis --profile mineru-gpu up -d
# CPU Worker:
docker compose --profile redis --profile mineru-cpu up -d

Verify services:

curl http://localhost:8000/api/v1/health

That's it! The API is now running at http://localhost:8000.
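Containers can take a while to become ready after docker compose up -d, so a script may want to wait for the health endpoint instead of calling it once. The following is a minimal sketch using only the standard library. `wait_for_health` is a hypothetical helper name, and it assumes that an HTTP 200 from /api/v1/health means the API is ready; the timeout and interval values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout_s=60.0, interval_s=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or non-2xx response: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (requires the stack from the steps above to be running):
#   wait_for_health("http://localhost:8000/api/v1/health")
```

The `opener` parameter exists only so the loop can be exercised without a running service.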

💡 Tips:

Once COMPOSE_PROFILES is set in docker/.env, both sh build.sh and docker compose up -d pick it up automatically; no extra flags are needed.

To override the selection at build time, pass the targets explicitly: sh build.sh --api --worker-gpu or sh build.sh --api --worker-cpu

See docker/README.md for more configuration options

API Usage

MinerU-API provides two interfaces for different use cases:

1. Official MinerU API (Synchronous)

The /file_parse endpoint is compatible with the official MinerU API format. It submits tasks to the worker and waits for completion, returning results directly in the response.

Reference: MinerU Official API

curl -X POST "http://localhost:8000/file_parse" \
  -F "files=@document.pdf" \
  -F "backend=pipeline" \
  -F "lang_list=ch" \
  -F "parse_method=auto" \
  -F "return_md=true"

Use cases: simple integration, callers that need results immediately, compatibility with existing MinerU clients.
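The same synchronous call can be made from Python. The sketch below uses only the standard library, which has no multipart helper, so it encodes the form by hand. The form field names are copied from the curl example; `encode_multipart` and `build_file_parse_request` are hypothetical helper names, and nothing is assumed about the shape of the response.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, files):
    """Encode (name, value) fields and (name, filename, bytes) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for name, filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_file_parse_request(base_url, pdf_path):
    """Build the POST /file_parse request with the same fields as the curl example."""
    path = Path(pdf_path)
    body, content_type = encode_multipart(
        fields=[("backend", "pipeline"), ("lang_list", "ch"),
                ("parse_method", "auto"), ("return_md", "true")],
        files=[("files", path.name, path.read_bytes())],
    )
    return urllib.request.Request(
        f"{base_url}/file_parse", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )


# Usage (blocks until parsing finishes, so allow a generous timeout):
#   req = build_file_parse_request("http://localhost:8000", "document.pdf")
#   with urllib.request.urlopen(req, timeout=600) as resp:
#       print(resp.read().decode())
```

If a third-party HTTP client is already a dependency, its own multipart support would replace `encode_multipart` entirely.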

2. Async Queue API (Asynchronous)

The /api/v1/tasks/submit and /api/v1/tasks/{task_id} endpoints provide an asynchronous queue-based API, compatible with the mineru-tianshu project format.

Reference: mineru-tianshu API

Submit a Task:

curl -X POST "http://localhost:8000/api/v1/tasks/submit" \
  -F "file=@document.pdf" \
  -F "backend=pipeline" \
  -F "lang=ch"

Query Task Status:

# Replace {task_id} with the ID of the submitted task
curl "http://localhost:8000/api/v1/tasks/{task_id}"

Use cases: production deployments, batch processing, long-running tasks, workloads that need to scale.
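With the asynchronous API, a client typically submits a task and then polls the status endpoint until the task finishes. The loop below is a sketch of that pattern, not the project's client. It assumes the status response is a JSON object with a "status" field and that "completed" and "failed" are terminal values; both are assumptions, so check the interactive documentation at /docs for the actual schema. `fetch_task` and `wait_for_task` are hypothetical names.

```python
import json
import time
import urllib.request

# Assumed terminal states; adjust to the values the service actually returns.
TERMINAL_STATES = {"completed", "failed"}


def fetch_task(base_url, task_id):
    """GET /api/v1/tasks/{task_id} and decode the JSON body."""
    url = f"{base_url}/api/v1/tasks/{task_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch, interval_s=2.0, timeout_s=600.0):
    """Call fetch(task_id) until its "status" is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch(task_id)
        if str(task.get("status", "")).lower() in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task.get('status')!r} after {timeout_s}s"
            )
        time.sleep(interval_s)


# Usage against a running service:
#   result = wait_for_task(
#       "<task id>", lambda tid: fetch_task("http://localhost:8000", tid)
#   )
```

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP layer, which makes it straightforward to test or to swap in another client.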

View API Documentation

Visit http://localhost:8000/docs for interactive API documentation with full parameter details.

Basic Configuration

Environment Variables

The most important configuration options (see .env.example for all options):

# Redis Configuration
REDIS_URL=redis://redis:6379/0

# Storage Type: local or s3
MINERU_STORAGE_TYPE=local

# For S3 storage (distributed deployment)
MINERU_S3_ENDPOINT=http://minio:9000
MINERU_S3_ACCESS_KEY=minioadmin
MINERU_S3_SECRET_KEY=minioadmin

# CORS Configuration (production)
CORS_ALLOWED_ORIGINS=http://localhost:3000
ENVIRONMENT=production

# File Upload Limits
MAX_FILE_SIZE=104857600  # 100MB
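To illustrate how these variables fit together, the sketch below reads them into a typed settings object and rejects inconsistent combinations. It is not how the service itself loads configuration: `Settings` and `load_settings` are hypothetical, the defaults simply mirror the example values above, and requiring all three S3 variables when MINERU_STORAGE_TYPE=s3 is an assumption.

```python
import os
from dataclasses import dataclass

S3_KEYS = ("MINERU_S3_ENDPOINT", "MINERU_S3_ACCESS_KEY", "MINERU_S3_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    redis_url: str
    storage_type: str
    max_file_size: int  # bytes


def load_settings(env=None):
    """Build Settings from a mapping of environment variables (default: os.environ)."""
    env = os.environ if env is None else env
    storage_type = env.get("MINERU_STORAGE_TYPE", "local").lower()
    if storage_type not in ("local", "s3"):
        raise ValueError(f"MINERU_STORAGE_TYPE must be 'local' or 's3', got {storage_type!r}")
    if storage_type == "s3":
        missing = [key for key in S3_KEYS if not env.get(key)]
        if missing:
            raise ValueError(f"S3 storage selected but missing: {', '.join(missing)}")
    return Settings(
        redis_url=env.get("REDIS_URL", "redis://redis:6379/0"),
        storage_type=storage_type,
        max_file_size=int(env.get("MAX_FILE_SIZE", 100 * 1024 * 1024)),
    )


settings = load_settings({})
print(settings.max_file_size)  # 104857600
```

The default of 100 * 1024 * 1024 bytes equals the 104857600 shown in the example, which is where the "100MB" comment comes from.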

Documentation

📖 Full Documentation - Complete guide and configuration (English | 中文)
🚀 Deployment Guide - Production deployment (中文)
⚙️ Configuration Reference - All configuration options (中文)
💡 API Examples - Code examples in multiple languages (中文)
🔧 Troubleshooting - Common issues and solutions (中文)
🧹 Storage & Cleanup - Storage configuration and cleanup (中文)

Architecture

API Service: Handles task submission and status queries (api/app.py)
Worker Service: Processes documents using MinerU/MarkItDown (worker/tasks.py)
Redis: Message queue and result storage
Shared Config: Unified configuration in shared/celeryconfig.py

Development

Setting Up Development Environment

For detailed development environment setup instructions, see docs/DEVELOPMENT.md.

Quick Start:

# Use the automated setup script (recommended)
chmod +x setup_venv.sh
./setup_venv.sh

# Or manually:
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install --upgrade pip setuptools wheel
pip install -r api/requirements.txt
pip install -r worker/requirements.txt
pip install -r cleanup/requirements.txt

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Acknowledgments

This project is built on top of the following excellent open-source projects:

MinerU - The core document parsing engine that powers this service
mineru-tianshu - Inspiration and reference for the API architecture

We are grateful to the developers and contributors of these projects for their valuable work.

License

MIT License - see LICENSE file for details.

Third-Party Licenses

This project uses the following open-source libraries:

MinerU - Licensed under AGPL-3.0
MarkItDown - Licensed under MIT

MinerU is used as an external library and its source code is not included in this repository.
