Advanced Document Layout Analysis & Structured Output Generation
A powerful AI-driven tool for archivists to digitize, structure, and interact with documents using advanced OCR and vision models.
Try InterPARES-Vision online: demos.dlnlp.ai/InterPARES/
No installation required - access the full functionality through your web browser!
InterPARES-Vision is an advanced OCR (Optical Character Recognition) and layout analysis tool designed specifically for archival documents. It combines state-of-the-art AI vision models to extract text, preserve document structure, and generate machine-readable outputs from scanned documents and images.
- Document Structure Understanding: Identifies headings, paragraphs, tables, lists, and maintains proper reading order
- Interactive AI Chat: Ask questions about parsed documents using natural language
- Metadata Extraction: Request translations, summaries, and structured metadata through conversational queries
- Multi-format Output: Generate Markdown, JSON, and annotated visualizations
- Batch Processing: Handle multi-page PDFs with consistent quality
- Layout Detection: Identifies document regions including text blocks, tables, images, headings, and captions
- OCR Text Extraction: Extracts text with high accuracy, even from degraded or complex documents
- Structure Preservation: Maintains document hierarchy and reading order
- Multi-page PDF Support: Process entire PDF documents with page-by-page analysis
- Interactive Visualization: View detected layout regions overlaid on original documents
- Natural Language Queries: Ask questions about document content in plain language
- Metadata Generation: Extract structured metadata in JSON format for archival systems
- Translation Support: Request translations of document sections or entire documents
- Summarization: Get concise summaries and key information extraction
- Classification Assistance: Identify document types and suggest archival categories
- Markdown: Formatted text with preserved structure and hierarchy
- JSON: Structured data with bounding boxes, element types, and coordinates
- Annotated Images: Visual overlay showing detected layout regions with color-coded boxes
- Downloadable Results: ZIP package with all output formats and original files
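The JSON output pairs each detected element with its type and coordinates, which makes downstream scripting straightforward. As a rough sketch (the field names below are invented for illustration; inspect a real export for the exact schema), a script might tally element types per page:

```python
import json
from collections import Counter

# Hypothetical page-level JSON: a list of layout elements, each with a
# category, bounding box, and extracted text. Field names are illustrative.
sample = json.loads("""
[
  {"category": "Title",   "bbox": [72, 40, 540, 90],   "text": "Annual Report"},
  {"category": "Text",    "bbox": [72, 110, 540, 300], "text": "..."},
  {"category": "Table",   "bbox": [72, 320, 540, 560], "text": ""},
  {"category": "Caption", "bbox": [72, 570, 540, 600], "text": "Table 1."}
]
""")

# Tally element types, e.g. to gauge how table-heavy a collection is.
counts = Counter(el["category"] for el in sample)
print(dict(counts))  # {'Title': 1, 'Text': 1, 'Table': 1, 'Caption': 1}
```

A tally like this can help decide which prompt mode to use for a batch before committing to full processing.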
| Format | Extensions | Description |
|---|---|---|
| PDF Documents | .pdf | Multi-page or single-page PDF files (processed page-by-page) |
| Images | .jpg, .jpeg, .png | Scanned images or photographs of documents |
💡 Best Results: Use high-resolution scans (200+ DPI) with good contrast and minimal skew.
- Python 3.8 or higher
- CUDA-compatible GPU (recommended for optimal performance)
- 8GB+ RAM (16GB+ recommended for large documents)
# Clone the repository
git clone https://github.com/your-org/InterPARES-vision.git
cd InterPARES-vision
# Install dependencies
pip install -r requirements.txt
# Install DotsOCR parser
pip install dots-ocr

# Start the application (default port: 7860)
python app.py 7860

# Or specify a custom port
python app.py 8080

The application will be available at http://localhost:7860 (or your specified port).
Option A: Use Example Documents
- Click on any thumbnail in the "📥 Select Example Document" gallery
- Browse through available examples using Previous/Next buttons
Option B: Upload Your Own
- Click the "Upload PDF or Image" button
- Select a file from your computer (PDF, JPG, PNG)
For PDF files:
- Use the ⬅ Previous and Next ➡ buttons to browse pages
- View current position with page counter (e.g., "2 / 10")
| Mode | Description | Best For |
|---|---|---|
| prompt_layout_all_en | Full analysis: layout + OCR + reading order | Complex documents with mixed content |
| prompt_layout_only_en | Layout detection without text extraction | Understanding document organization |
| prompt_ocr | OCR-focused with minimal layout | Simple text documents |
💡 Recommendation: Start with prompt_layout_all_en for comprehensive analysis.
Click Parse to begin processing. The system will:
- Analyze document layout
- Extract text from detected regions
- Generate structured output in multiple formats
Results appear in three tabs:
- Markdown Render Preview: Human-readable formatted view
- Markdown Raw Text: Plain Markdown with formatting codes
- Current Page JSON: Structured data with coordinates and element types
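The Current Page JSON tab exposes each element's coordinates, which downstream scripts can use directly. As a minimal sketch (the `bbox` field name and the `[x1, y1, x2, y2]` convention are assumptions for illustration), detected regions could be sorted into a simple top-to-bottom, left-to-right reading order:

```python
# Illustrative: order detected regions by their top edge, then left edge.
# Real multi-column layouts may need column detection first.
elements = [
    {"text": "right column", "bbox": [300, 100, 560, 400]},
    {"text": "header",       "bbox": [40, 20, 560, 60]},
    {"text": "left column",  "bbox": [40, 100, 280, 400]},
]

ordered = sorted(elements, key=lambda e: (e["bbox"][1], e["bbox"][0]))
print([e["text"] for e in ordered])
# ['header', 'left column', 'right column']
```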
After parsing, use the AI chat feature:
Example Questions:
- "Extract the main keywords for archival indexing"
- "What is the document type and subject matter?"
- "Extract metadata in JSON format"
- "Translate the summary section into French"
- "List all dates, names, and locations mentioned"
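Because the chat backend speaks an OpenAI-compatible API, queries like these can also be scripted rather than typed into the UI. A minimal sketch of a request payload, assuming the model name from the configuration shown later in this README (adjust to your deployment):

```python
import json

# Build an OpenAI-style chat-completions payload for a metadata query.
# The system prompt here is illustrative; the app supplies its own context.
payload = {
    "model": "Qwen3-4B-Instruct-2507-FP8",
    "temperature": 0.1,
    "messages": [
        {"role": "system",
         "content": "You answer questions about the parsed document below."},
        {"role": "user",
         "content": "Extract metadata in JSON format"},
    ],
}
print(json.dumps(payload, indent=2))
```

The same payload can be POSTed to the server's /v1/chat/completions endpoint with any HTTP client.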
Click ⬇️ Download Results to get a ZIP file containing:
- Layout images with annotations
- JSON files with structured data
- Markdown files with formatted text
- Original input file
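For automated ingest into a repository, the downloaded archive can be unpacked programmatically. A minimal sketch (member names inside the ZIP are examples, not a guaranteed layout):

```python
import zipfile
from pathlib import Path

def unpack_results(zip_path, dest):
    """Extract a results ZIP into dest and return the names of its members."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

A batch script could call this per download, then route the JSON files to a catalog pipeline and the Markdown files to full-text search.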
- Digitization Projects: Convert scanned documents to searchable, structured text
- Metadata Extraction: Automatically generate catalog records and finding aids
- Collection Assessment: Rapidly evaluate document content and significance
- Multilingual Access: Translate documents for broader accessibility
- Data Extraction: Pull structured information from historical records
- Classification Support: AI-assisted document type and subject identification
- ✅ Use consistent scan settings (200+ DPI) for optimal results
- ✅ Process similar document types together with the same prompt mode
- ✅ Review sample outputs (5-10%) from each batch for quality assurance
- ✅ Keep original scans alongside OCR outputs in your digital repository
- ✅ Document processing settings (tool version, prompt mode, date) in metadata
- ✅ Verify AI-generated metadata against professional archival standards
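The settings-documentation practice above can be sketched as a small manifest written alongside each batch's outputs (field names are illustrative, not an archival standard):

```python
import json
from datetime import date

def make_manifest(prompt_mode, tool_version, files):
    """Serialize the processing settings for a batch as a JSON manifest."""
    manifest = {
        "tool": "InterPARES-Vision",
        "tool_version": tool_version,
        "prompt_mode": prompt_mode,
        "processed_on": date.today().isoformat(),
        "files": files,
    }
    return json.dumps(manifest, indent=2)
```

Storing one such file per batch makes it possible to trace any OCR output back to the tool version and prompt mode that produced it.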
Default settings in app.py:
DEFAULT_CONFIG = {
'ip': "127.0.0.1",
'port_vllm': 8001,
'min_pixels': MIN_PIXELS,
'max_pixels': MAX_PIXELS,
'test_images_dir': "./assets/showcase_origin",
}

The chat feature uses vLLM with an OpenAI-compatible API:
from langchain_openai import ChatOpenAI

chat_client = ChatOpenAI(
base_url="http://localhost:8000/v1",
api_key="EMPTY",
model="Qwen3-4B-Instruct-2507-FP8",
temperature=0.1,
max_tokens=16000,
streaming=True
)

For detailed documentation, see:
- User Guide - Comprehensive feature walkthrough
- API Documentation - Developer reference (coming soon)
- Archival Workflows - Best practices guide (coming soon)
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install development dependencies
pip install -r requirements-dev.txt
- Live Demo: demos.dlnlp.ai/InterPARES/
- Issues: GitHub Issues