
🔍 Gnosis OCR

Enterprise-Grade GPU-Accelerated OCR Service

Extract text from documents with state-of-the-art AI models

License: MIT • Docker • GPU Accelerated • Cloud Run

Quick Start • Features • Documentation • Deployment • API

✨ Features

  • 🚀 GPU-Accelerated Processing - Powered by NVIDIA CUDA for blazing-fast OCR
  • 🎯 State-of-the-Art Models - Uses Nanonets OCR-s for superior accuracy
  • 📄 Large File Support - Process up to 500MB PDFs with HTTP/2 streaming (uploads work during model loading)
  • ⚡ Live Results Preview - See extracted text in real-time as pages complete
  • 🔄 Incremental Processing - No waiting - results appear as they're ready
  • ☁️ Cloud Native - Deploy to Google Cloud Run with auto-scaling
  • 🔒 Enterprise Security - User isolation, session management, and audit trails
  • 📊 Real-Time Progress - Live processing updates and detailed analytics
  • 🌐 Modern Web UI - Beautiful, responsive interface with dark mode
  • 🔌 REST API - Full programmatic access with OpenAPI docs

🚀 Quick Start

Prerequisites

  • NVIDIA GPU with 12GB+ VRAM (RTX 3060 12GB/3070/3080/3090 or better)

  • Docker Desktop with GPU support

  • 16GB+ RAM (32GB recommended for large documents)
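
Before building, you can confirm that Docker actually sees the GPU. The CUDA image tag below is only an example; any recent CUDA base image works:

# Sanity check: the nvidia-smi output should list your GPU
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi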

Local Development

# Clone the repository
git clone https://github.com/kordless/gnosis-ocr.git
cd gnosis-ocr

# Create environment files
# Copy the sample file for local development
cp .env.sample .env
# Use the same file for Cloud Run and edit PROJECT_ID if deploying
cp .env.sample .env.cloudrun

# Build and run with GPU support
docker-compose up --build

🎉 That's it! Visit http://localhost:7799 to start processing documents.
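
Once the container is up, a quick smoke test against the interactive API docs (served at /docs, as noted in the API Reference below) confirms the service is responding:

# Expect HTTP 200 once startup completes
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7799/docs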

Cloud Deployment

Deploy to Google Cloud Run with NVIDIA L4 GPU:

# One-command deployment
./scripts/deploy.ps1 -Target cloudrun

See Deployment Guide for detailed instructions.

📸 Screenshots

  • Document Upload - Upload Interface
  • Live Results Preview - Live Preview
  • OCR Results - Results Display

πŸ—οΈ Architecture

graph TD
    A[User Upload] --> B[FastAPI Backend]
    B --> C[Document Processor]
    C --> D[GPU OCR Engine]
    D --> E[Text Extraction]
    E --> F[GCS Storage]
    F --> G[Results API]
    G --> H[Web UI]
    
    subgraph "GPU Processing"
        D --> I[Nanonets OCR-s Model]
        I --> J[CUDA Acceleration]
    end
    
    subgraph "Cloud Storage"
        F --> K[Model Cache]
        F --> L[User Data]
        F --> M[Session Files]
    end

🎯 Performance

| Metric | Local (RTX 3090) | Cloud Run (L4) |
|---|---|---|
| Model Startup | ~4 minutes | ~30 seconds |
| Processing Speed | ~20 sec/page | ~5-10 sec/page |
| Live Preview | ✅ Real-time | ✅ Real-time |
| Max File Size | 500MB | 500MB |
| Concurrent Users | 1-2 | 10+ |
| Model Loading | Cached | Persistent Mount |

🔧 Configuration

Environment Variables

| Variable | Description | Default |
|---|---|---|
| DEVICE | Processing device | cuda |
| MODEL_NAME | OCR model to use | nanonets/Nanonets-OCR-s |
| MAX_FILE_SIZE | Maximum upload size | 500MB |
| SESSION_TIMEOUT | Session duration | 3600 seconds |
| LOG_LEVEL | Logging verbosity | INFO |
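
For local development these variables typically go in the .env file copied from .env.sample during Quick Start. A minimal sketch using the defaults from the table above; the exact variable names and value formats in your .env.sample may differ:

# .env - example values only (value formats such as "500MB" vs. raw bytes may differ)
DEVICE=cuda
MODEL_NAME=nanonets/Nanonets-OCR-s
MAX_FILE_SIZE=500MB
SESSION_TIMEOUT=3600
LOG_LEVEL=INFO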

GPU Requirements

  • Minimum: 12GB VRAM (e.g., RTX 3060 12GB)
  • Recommended: 12GB+ VRAM (RTX 3080/3090/4070 Ti/4080/4090)
  • Cloud: NVIDIA L4 (24GB VRAM)

📖 Documentation

🚀 Deployment

Local Development

docker-compose up --build

Local Dev Environment

./scripts/deploy.ps1 -Target local

Production (Cloud Run)

./scripts/deploy.ps1 -Target cloudrun

Cloud Build (CI/CD)

gcloud builds submit --config cloudbuild.yaml

📑 API Reference

Upload Document

curl -X POST "http://localhost:7799/api/v1/jobs/submit" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"

Check Status (with Live Results)

curl "http://localhost:7799/api/v1/jobs/{job_id}/status"
# Returns incremental results in partial_results array

Get Final Results

curl "http://localhost:7799/api/v1/jobs/{job_id}/result"

Example Status Response

{
  "status": "processing",
  "progress": {
    "current_page": 3,
    "total_pages": 10,
    "percent": 30
  },
  "partial_results": [
    {
      "page_number": 1,
      "text": "# Document Title\nExtracted content...",
      "status": "completed",
      "confidence": 0.95
    },
    {
      "page_number": 2,
      "text": "Page 2 content...",
      "status": "completed",
      "confidence": 0.93
    }
  ]
}
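
A minimal way to consume the live results is to poll the status endpoint until the job finishes, then fetch the final output. This sketch assumes the top-level status eventually becomes "completed" (the per-page statuses above suggest this, but verify the terminal value against /docs) and reuses the JOB_ID captured earlier:

# Poll every 5 seconds, print progress, then fetch the final result
# (add a failure check for production use)
while true; do
  STATUS_JSON=$(curl -s "http://localhost:7799/api/v1/jobs/$JOB_ID/status")
  echo "$STATUS_JSON" | jq -r '.progress | "page \(.current_page)/\(.total_pages) (\(.percent)%)"'
  [ "$(echo "$STATUS_JSON" | jq -r '.status')" = "completed" ] && break
  sleep 5
done
curl -s "http://localhost:7799/api/v1/jobs/$JOB_ID/result"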

Complete API documentation is available at the /docs endpoint.

🏒 Enterprise Features

  • Live Results Streaming - Real-time text extraction as pages complete
  • Incremental Processing - No waiting for entire documents to finish
  • Multi-Tenant Architecture - User isolation and data partitioning
  • Audit Logging - Complete request/response tracking
  • Auto-Scaling - Handles traffic spikes automatically
  • Health Monitoring - Built-in health checks and metrics
  • Security - CORS, rate limiting, and input validation
  • Storage Integration - Google Cloud Storage with lifecycle management

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Nanonets - For the excellent OCR model
  • HuggingFace - For the transformers library
  • FastAPI - For the robust web framework
  • NVIDIA - For CUDA and GPU acceleration



⭐ Star this repository if you find it useful! ⭐

Report Bug • Request Feature • Join Discord
