# Gnosis OCR

Enterprise-Grade GPU-Accelerated OCR Service

Extract text from documents with state-of-the-art AI models.

Quick Start • Features • Documentation • Deployment • API
## Features

- **GPU-Accelerated Processing** - Powered by NVIDIA CUDA for blazing-fast OCR
- **State-of-the-Art Models** - Uses Nanonets OCR-s for superior accuracy
- **Large File Support** - Process up to 500MB PDFs with HTTP/2 streaming (uploads work during model loading)
- **Live Results Preview** - See extracted text in real-time as pages complete
- **Incremental Processing** - No waiting - results appear as they're ready
- **Cloud Native** - Deploy to Google Cloud Run with auto-scaling
- **Enterprise Security** - User isolation, session management, and audit trails
- **Real-Time Progress** - Live processing updates and detailed analytics
- **Modern Web UI** - Beautiful, responsive interface with dark mode
- **REST API** - Full programmatic access with OpenAPI docs
## Prerequisites

- NVIDIA GPU with 12GB+ VRAM (RTX 3060 12GB/3070/3080/3090 or better)
- Docker Desktop with GPU support
- 16GB+ RAM (32GB recommended for large documents)
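Before starting the stack, it can help to sanity-check that the host actually sees an NVIDIA GPU. A minimal Python sketch (it assumes `nvidia-smi` ships with the driver; this helper is illustrative and not part of the repo):

```python
# Sketch: verify an NVIDIA GPU is visible before running the stack.
import shutil
import subprocess

def nvidia_gpu_available() -> bool:
    """Return True if `nvidia-smi` exists and reports at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        out = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except OSError:
        return False
    return out.returncode == 0 and "GPU" in out.stdout

if __name__ == "__main__":
    print("GPU detected" if nvidia_gpu_available() else "No NVIDIA GPU found")
```

If this reports no GPU, check your driver install and Docker Desktop's GPU support before continuing.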
## Quick Start

```bash
# Clone the repository
git clone https://github.com/kordless/gnosis-ocr.git
cd gnosis-ocr

# Create environment files
# Copy the sample file for local development
cp .env.sample .env

# Use the same file for Cloud Run and edit PROJECT_ID if deploying
cp .env.sample .env.cloudrun

# Build and run with GPU support
docker-compose up --build
```

That's it! Visit http://localhost:7799 to start processing documents.
## Cloud Deployment

Deploy to Google Cloud Run with an NVIDIA L4 GPU:

```powershell
# One-command deployment
./scripts/deploy.ps1 -Target cloudrun
```

See the Deployment Guide for detailed instructions.
## Architecture

```mermaid
graph TD
    A[User Upload] --> B[FastAPI Backend]
    B --> C[Document Processor]
    C --> D[GPU OCR Engine]
    D --> E[Text Extraction]
    E --> F[GCS Storage]
    F --> G[Results API]
    G --> H[Web UI]

    subgraph "GPU Processing"
        D --> I[Nanonets OCR-s Model]
        I --> J[CUDA Acceleration]
    end

    subgraph "Cloud Storage"
        F --> K[Model Cache]
        F --> L[User Data]
        F --> M[Session Files]
    end
```
## Performance

| Metric | Local (RTX 3090) | Cloud Run (L4) |
|---|---|---|
| Model Startup | ~4 minutes | ~30 seconds |
| Processing Speed | ~20 sec/page | ~5-10 sec/page |
| Live Preview | Real-time | Real-time |
| Max File Size | 500MB | 500MB |
| Concurrent Users | 1-2 | 10+ |
| Model Loading | Cached | Persistent Mount |
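The throughput figures above can be turned into a rough wall-clock estimate for a given document. A back-of-the-envelope sketch (numbers taken from the table; the helper is illustrative only):

```python
def estimate_processing_seconds(pages: int, sec_per_page: float,
                                startup_sec: float = 0.0) -> float:
    """Rough wall-clock estimate: model startup plus per-page OCR time."""
    return startup_sec + pages * sec_per_page

# A 100-page PDF on Cloud Run (L4): ~30 s startup, ~5-10 s/page
low = estimate_processing_seconds(100, 5, startup_sec=30)    # 530 s (~9 min)
high = estimate_processing_seconds(100, 10, startup_sec=30)  # 1030 s (~17 min)
```

Because results stream incrementally, the first pages appear long before these totals elapse.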
## Configuration

| Variable | Description | Default |
|---|---|---|
| `DEVICE` | Processing device | `cuda` |
| `MODEL_NAME` | OCR model to use | `nanonets/Nanonets-OCR-s` |
| `MAX_FILE_SIZE` | Maximum upload size | `500MB` |
| `SESSION_TIMEOUT` | Session duration | `3600` seconds |
| `LOG_LEVEL` | Logging verbosity | `INFO` |
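A sketch of how settings like these might be read from the environment. The variable names mirror the table; the size-parsing helper is illustrative and is not the service's actual config loader:

```python
# Illustrative settings loader; defaults mirror the configuration table.
import os
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

def parse_size(value: str) -> int:
    """Parse a size string like '500MB' into bytes."""
    m = re.fullmatch(r"(\d+)\s*(B|KB|MB|GB)", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2).upper()]

settings = {
    "device": os.environ.get("DEVICE", "cuda"),
    "model_name": os.environ.get("MODEL_NAME", "nanonets/Nanonets-OCR-s"),
    "max_file_size": parse_size(os.environ.get("MAX_FILE_SIZE", "500MB")),
    "session_timeout": int(os.environ.get("SESSION_TIMEOUT", "3600")),
    "log_level": os.environ.get("LOG_LEVEL", "INFO"),
}
```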
## GPU Requirements

- Minimum: 12GB VRAM (RTX 3060 12GB, RTX 3070)
- Recommended: 12GB+ VRAM (RTX 3080/3090/4070 Ti/4080/4090)
- Cloud: NVIDIA L4 (24GB VRAM)
## Documentation

- Deployment Guide - Complete deployment instructions
- Setup Guide - Development environment setup
- API Reference - Interactive OpenAPI documentation
- Docker Guide - Container configuration and troubleshooting
## Deployment Commands

```bash
# Local development
docker-compose up --build

# Deploy locally via script
./scripts/deploy.ps1 -Target local

# Deploy to Cloud Run
./scripts/deploy.ps1 -Target cloudrun

# Build with Cloud Build
gcloud builds submit --config cloudbuild.yaml
```

## API Usage

```bash
# Submit a document for OCR
curl -X POST "http://localhost:7799/api/v1/jobs/submit" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"

# Check status; returns incremental results in the partial_results array
curl "http://localhost:7799/api/v1/jobs/{job_id}/status"

# Fetch the final result
curl "http://localhost:7799/api/v1/jobs/{job_id}/result"
```

Example status response:

```json
{
  "status": "processing",
  "progress": {
    "current_page": 3,
    "total_pages": 10,
    "percent": 30
  },
  "partial_results": [
    {
      "page_number": 1,
      "text": "# Document Title\nExtracted content...",
      "status": "completed",
      "confidence": 0.95
    },
    {
      "page_number": 2,
      "text": "Page 2 content...",
      "status": "completed",
      "confidence": 0.93
    }
  ]
}
```

Complete API documentation is available at the /docs endpoint.
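A client can stitch the `partial_results` array into a single document as pages complete. A minimal Python sketch (the payload shape follows the example status response above; `assemble_pages` is a hypothetical client-side helper, not part of the service):

```python
# Sketch: combine completed pages from a status payload into one Markdown string.
import json

def assemble_pages(payload: dict) -> str:
    """Concatenate completed pages in page order, separated by blank lines."""
    pages = [p for p in payload.get("partial_results", [])
             if p.get("status") == "completed"]
    pages.sort(key=lambda p: p["page_number"])
    return "\n\n".join(p["text"] for p in pages)

status = json.loads("""{
  "status": "processing",
  "partial_results": [
    {"page_number": 2, "text": "Page 2 content...", "status": "completed"},
    {"page_number": 1, "text": "# Document Title", "status": "completed"}
  ]
}""")
print(assemble_pages(status))
```

Polling the status endpoint and re-running this on each response gives the live-preview behavior described above.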
- Live Results Streaming - Real-time text extraction as pages complete
- Incremental Processing - No waiting for entire documents to finish
- Multi-Tenant Architecture - User isolation and data partitioning
- Audit Logging - Complete request/response tracking
- Auto-Scaling - Handles traffic spikes automatically
- Health Monitoring - Built-in health checks and metrics
- Security - CORS, rate limiting, and input validation
- Storage Integration - Google Cloud Storage with lifecycle management
## Contributing

We welcome contributions! Please see our Contributing Guide for details.

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Nanonets - For the excellent OCR model
- HuggingFace - For the transformers library
- FastAPI - For the robust web framework
- NVIDIA - For CUDA and GPU acceleration
Star this repository if you find it useful!

Report Bug • Request Feature • Join Discord