# Gnosis OCR

Enterprise-Grade GPU-Accelerated OCR Service

Extract text from documents with state-of-the-art AI models.

Quick Start • Features • Documentation • Deployment • API
## Features

- **GPU-Accelerated Processing** - Powered by NVIDIA CUDA for blazing-fast OCR
- **State-of-the-Art Models** - Uses Nanonets OCR-s for superior accuracy
- **Large File Support** - Process up to 500MB PDFs with HTTP/2 streaming (uploads work during model loading)
- **Live Results Preview** - See extracted text in real-time as pages complete
- **Incremental Processing** - No waiting - results appear as they're ready
- **Cloud Native** - Deploy to Google Cloud Run with auto-scaling
- **Enterprise Security** - User isolation, session management, and audit trails
- **Real-Time Progress** - Live processing updates and detailed analytics
- **Modern Web UI** - Beautiful, responsive interface with dark mode
- **REST API** - Full programmatic access with OpenAPI docs
## Prerequisites

- NVIDIA GPU with 12GB+ VRAM (RTX 3060 12GB/3070/3080/3090 or better)
- Docker Desktop with GPU support
- 16GB+ RAM (32GB recommended for large documents)
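Before starting the stack, it can help to sanity-check that the host actually sees an NVIDIA GPU. A minimal Python sketch (it assumes `nvidia-smi` ships with the driver; this helper is illustrative and not part of the repo):

```python
# Sketch: verify an NVIDIA GPU is visible before running the stack.
import shutil
import subprocess

def nvidia_gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and reports at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        out = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except OSError:
        return False
    return out.returncode == 0 and "GPU" in out.stdout

if __name__ == "__main__":
    print("GPU detected" if nvidia_gpu_available() else "No NVIDIA GPU found")
```

If this reports no GPU, check your driver install and Docker Desktop's GPU support before continuing.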
## Quick Start

```bash
# Clone the repository
git clone https://github.com/kordless/gnosis-ocr.git
cd gnosis-ocr

# Create environment files
# Copy the sample file for local development
cp .env.sample .env

# Use the same file for Cloud Run and edit PROJECT_ID if deploying
cp .env.sample .env.cloudrun

# Build and run with GPU support
docker-compose up --build
```

That's it! Visit http://localhost:7799 to start processing documents.
## Cloud Deployment

Deploy to Google Cloud Run with an NVIDIA L4 GPU:

```powershell
# One-command deployment
./scripts/deploy.ps1 -Target cloudrun
```

See the Deployment Guide for detailed instructions.
## Architecture

```mermaid
graph TD
    A[User Upload] --> B[FastAPI Backend]
    B --> C[Document Processor]
    C --> D[GPU OCR Engine]
    D --> E[Text Extraction]
    E --> F[GCS Storage]
    F --> G[Results API]
    G --> H[Web UI]

    subgraph "GPU Processing"
        D --> I[Nanonets OCR-s Model]
        I --> J[CUDA Acceleration]
    end

    subgraph "Cloud Storage"
        F --> K[Model Cache]
        F --> L[User Data]
        F --> M[Session Files]
    end
```
## Performance

| Metric | Local (RTX 3090) | Cloud Run (L4) |
|---|---|---|
| Model Startup | ~4 minutes | ~30 seconds |
| Processing Speed | ~20 sec/page | ~5-10 sec/page |
| Live Preview | Real-time | Real-time |
| Max File Size | 500MB | 500MB |
| Concurrent Users | 1-2 | 10+ |
| Model Loading | Cached | Persistent Mount |
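The throughput figures above can be turned into a rough wall-clock estimate for a given document. A back-of-the-envelope sketch (numbers taken from the table; the helper is illustrative only):

```python
def estimate_processing_seconds(pages: int, sec_per_page: float,
                                startup_sec: float = 0.0) -> float:
    """Rough wall-clock estimate: model startup plus per-page OCR time."""
    return startup_sec + pages * sec_per_page

# A 100-page PDF on Cloud Run (L4): ~30 s startup, ~5-10 s/page
low = estimate_processing_seconds(100, 5, startup_sec=30)    # 530 s (~9 min)
high = estimate_processing_seconds(100, 10, startup_sec=30)  # 1030 s (~17 min)
```

Because results stream incrementally, the first pages appear long before these totals elapse.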
## Configuration

| Variable | Description | Default |
|---|---|---|
| `DEVICE` | Processing device | `cuda` |
| `MODEL_NAME` | OCR model to use | `nanonets/Nanonets-OCR-s` |
| `MAX_FILE_SIZE` | Maximum upload size | `500MB` |
| `SESSION_TIMEOUT` | Session duration | `3600` seconds |
| `LOG_LEVEL` | Logging verbosity | `INFO` |
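A sketch of how settings like these might be read from the environment. The variable names mirror the table; the size-parsing helper is illustrative and is not the service's actual config loader:

```python
# Illustrative settings loader; defaults mirror the configuration table.
import os
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_size(value: str) -> int:
    """Parse a size string like '500MB' into bytes."""
    m = re.fullmatch(r"(\d+)\s*(B|KB|MB|GB)", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]

settings = {
    "device": os.environ.get("DEVICE", "cuda"),
    "model_name": os.environ.get("MODEL_NAME", "nanonets/Nanonets-OCR-s"),
    "max_file_size": parse_size(os.environ.get("MAX_FILE_SIZE", "500MB")),
    "session_timeout": int(os.environ.get("SESSION_TIMEOUT", "3600")),
    "log_level": os.environ.get("LOG_LEVEL", "INFO"),
}
```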
## GPU Requirements

- Minimum: 12GB VRAM (RTX 3060 12GB, RTX 3070)
- Recommended: 12GB+ VRAM (RTX 3080/3090/4070 Ti/4080/4090)
- Cloud: NVIDIA L4 (24GB VRAM)
## Documentation

- Deployment Guide - Complete deployment instructions
- Setup Guide - Development environment setup
- API Reference - Interactive OpenAPI documentation
- Docker Guide - Container configuration and troubleshooting
## Deployment Commands

```bash
# Local development
docker-compose up --build

# Deploy locally via script
./scripts/deploy.ps1 -Target local

# Deploy to Cloud Run
./scripts/deploy.ps1 -Target cloudrun

# Build with Cloud Build
gcloud builds submit --config cloudbuild.yaml
```

## API Usage

```bash
# Submit a document for OCR
curl -X POST "http://localhost:7799/api/v1/jobs/submit" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"

# Check status; returns incremental results in the partial_results array
curl "http://localhost:7799/api/v1/jobs/{job_id}/status"

# Fetch the final result
curl "http://localhost:7799/api/v1/jobs/{job_id}/result"
```

Example status response:

```json
{
  "status": "processing",
  "progress": {
    "current_page": 3,
    "total_pages": 10,
    "percent": 30
  },
  "partial_results": [
    {
      "page_number": 1,
      "text": "# Document Title\nExtracted content...",
      "status": "completed",
      "confidence": 0.95
    },
    {
      "page_number": 2,
      "text": "Page 2 content...",
      "status": "completed",
      "confidence": 0.93
    }
  ]
}
```

Complete API documentation is available at the /docs endpoint.
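A client can stitch the `partial_results` array into a single document as pages complete. A minimal Python sketch (the payload shape follows the example status response above; `assemble_pages` is a hypothetical client-side helper, not part of the service):

```python
# Sketch: combine completed pages from a status payload into one Markdown string.
import json

def assemble_pages(payload: dict) -> str:
    """Concatenate completed pages in page order, separated by blank lines."""
    pages = [p for p in payload.get("partial_results", [])
             if p.get("status") == "completed"]
    pages.sort(key=lambda p: p["page_number"])
    return "\n\n".join(p["text"] for p in pages)

status = json.loads("""{
  "status": "processing",
  "partial_results": [
    {"page_number": 2, "text": "Page 2 content...", "status": "completed"},
    {"page_number": 1, "text": "# Document Title", "status": "completed"}
  ]
}""")
print(assemble_pages(status))
```

Polling the status endpoint and re-running this on each response gives the live-preview behavior described above.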
- Live Results Streaming - Real-time text extraction as pages complete
- Incremental Processing - No waiting for entire documents to finish
- Multi-Tenant Architecture - User isolation and data partitioning
- Audit Logging - Complete request/response tracking
- Auto-Scaling - Handles traffic spikes automatically
- Health Monitoring - Built-in health checks and metrics
- Security - CORS, rate limiting, and input validation
- Storage Integration - Google Cloud Storage with lifecycle management
## Contributing

We welcome contributions! Please see our Contributing Guide for details.

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Nanonets - For the excellent OCR model
- HuggingFace - For the transformers library
- FastAPI - For the robust web framework
- NVIDIA - For CUDA and GPU acceleration
Star this repository if you find it useful!

Report Bug • Request Feature • Join Discord