EasyRag
EasyRag is a modular Retrieval-Augmented Generation (RAG) platform that extracts structured data from PDF documents with high precision. Features a pluggable provider architecture supporting any combination of local and cloud AI services. This project can serve as a proof of concept or maybe more than that. If you are looking to understand how RAG works then this is the project for you.
Web Interface - Query & Highlight
Ask questions about your uploaded documents and get AI-powered answers with precise source highlighting. The system identifies exactly where in the PDF the answer was found, enabling quick verification and traceability.
- Overview
- Quick Start
- Provider Architecture
- Technology Stack
- Vector Search & Embeddings
- Table Detection Pipeline
- API Reference
- Configuration
- GPU Setup (Ollama)
- Testing & Deployment
- Developer Guide
- Contributing
- License
- Modular Architecture: Pluggable provider system supporting Ollama, OpenAI, Anthropic, HuggingFace
- Document Processing: Extracts structured tables from PDFs with precise coordinate mapping
- Semantic Search: Vector embeddings indexed in Qdrant for fast, accurate retrieval
- Runtime Flexibility: Switch between providers via API without restart
- Source Attribution: LLM responses include precise document locations and highlighting
Prerequisites: Python 3.11+, Node.js 18+, Docker Compose, 8GB+ RAM recommended
.\setup.batgit clone https://github.com/malkhabir/EasyRag.git
cd EasyRag
# Start infrastructure
docker compose up -d qdrant ollama
# Backend
cd rag-service
python -m venv venv
# Windows: venv\Scripts\activate | Linux/Mac: source venv/bin/activate
pip install -r requirements.txt
python -m uvicorn main:app --host 0.0.0.0 --port 8080 --reload
# Frontend (new terminal)
cd frontend
npm install
npm run devAccess Points:
- Frontend: http://localhost:5173
- API Documentation: http://localhost:8080/docs
EasyRag features a modular provider system that supports any combination of local and cloud AI services. Switch providers at runtime via API without restarting the application.
| Type | Provider | Models |
|---|---|---|
| LLM | Ollama (local) | phi3, llama2, codellama |
| OpenAI | gpt-3.5-turbo, gpt-4, gpt-4-turbo |
|
| Anthropic | claude-3-sonnet, claude-3-haiku |
|
| Azure OpenAI | Enterprise deployments | |
| Embedding | HuggingFace (local) | BAAI/bge-m3, all-MiniLM-L6-v2 |
| OpenAI | text-embedding-3-small, text-embedding-3-large |
Configure providers in config/providers.yaml:
active_llm_provider: "local"
active_embedding_provider: "huggingface"
llm_providers:
local:
provider: "ollama"
model_name: "phi3"
host: "localhost"
port: 11434
openai:
provider: "openai"
model_name: "gpt-3.5-turbo"
# api_key: ${OPENAI_API_KEY}# List available providers
curl http://localhost:8080/api/v1/providers/llm
# Switch LLM provider
curl -X POST http://localhost:8080/api/v1/providers/llm/switch \
-H "Content-Type: application/json" \
-d '{"provider_name": "openai", "model_name": "gpt-4"}'
# Check system status
curl http://localhost:8080/api/v1/providers/status| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Frontend | React | 19.1.1 | UI component library |
| Vite | 4.5.0 | Build tool and dev server | |
| react-pdf | 10.0.1 | PDF rendering in browser | |
| pdfjs-dist | 5.3.31 | Mozilla's PDF.js for parsing | |
| Backend | Python | 3.11+ | Runtime environment |
| FastAPI | 0.104.0+ | Async REST API framework | |
| Uvicorn | latest | ASGI server | |
| Pydantic | 2.0.0+ | Data validation and settings | |
| AI/ML | LlamaIndex | latest | RAG orchestration framework |
| PyTorch | latest | Deep learning framework | |
| Transformers | latest | HuggingFace model hub | |
| sentence-transformers | latest | Embedding models | |
| Detectron2 | 0.6+ | Object detection (layout) | |
| Document Detection | DIT | latest | Semantic segmentation (page layout) |
| TADetect | latest | Object detection (table regions) | |
| TATR | latest | Table structure recognition | |
| Vector DB | Qdrant | latest | Vector similarity search |
| LLM Providers | Ollama | latest | Local LLM runtime |
| OpenAI API | 1.0.0+ | Cloud LLM (optional) | |
| Anthropic API | 0.3.0+ | Claude models (optional) | |
| PDF Processing | PyMuPDF (fitz) | latest | PDF rasterization |
| pdfplumber | latest | Text extraction | |
| camelot-py | latest | Table extraction | |
| pytesseract | latest | OCR fallback | |
| Infrastructure | Docker Compose | 3.9 | Container orchestration |
EasyRag uses vector indexes to enable semantic search over documents.
Traditional Search: "revenue" → matches documents containing "revenue"
Vector Search: "how much money did we make" → matches documents about revenue, income, earnings, etc.
- Embedding Generation: Text chunks are converted to 1024-dimensional vectors using BGE-M3
- Vector Storage: Vectors are stored in Qdrant with metadata (page, coordinates, source file)
- Similarity Search: Queries are embedded and compared using cosine similarity
- Top-K Retrieval: Most similar chunks are retrieved and passed to the LLM
| Feature | Value |
|---|---|
| Index Type | HNSW (Hierarchical Navigable Small World) |
| Distance Metric | Cosine Similarity |
| Vector Dimensions | 1024 (BGE-M3) |
| Storage | Persistent on disk with in-memory indexing |
from qdrant_client.models import Distance, VectorParams
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(
size=1024, # BGE-M3 embedding dimensions
distance=Distance.COSINE
)
)| Metric | Range | Best For |
|---|---|---|
| Cosine | 0 to 2 | Text embeddings, semantic similarity |
| Dot Product | -∞ to +∞ | Normalized vectors, recommendations |
| Euclidean | 0 to +∞ | Image features, spatial data |
Why Cosine? Measures the angle between vectors, ignoring magnitude. This means "revenue report" and "income statement" will score as highly similar even if one document is longer than the other.
| Property | Value |
|---|---|
| Model | BAAI/bge-m3 |
| Dimensions | 1024 |
| Max Tokens | 8192 |
| Languages | 100+ (multilingual) |
| Features | Dense + Sparse + ColBERT retrieval |
EasyRag uses LlamaIndex as the core RAG framework:
from llama_index.core import Settings, VectorStoreIndex
# Configure embeddings and LLM
Settings.embed_model = embed_model.get() # HuggingFace BGE-M3
Settings.llm = llm_model.get() # Ollama phi3/llama2
# Create vector store and index
vector_store = QdrantVectorStore(client=qdrant_client, collection_name="documents")
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
# Query with similarity search
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("What is the total revenue?")EasyRag uses a multi-model pipeline for accurate table extraction from PDFs. The system recursively detects nested tables using a tree-based approach, treating each document page as a hierarchy of table containers and atomic tables (leaves).
The pipeline uses three primary detection models from HuggingFace. Each serves a specific purpose in the detection hierarchy.
| Model | Type | How It Works | Best For | HuggingFace Link |
|---|---|---|---|---|
| DIT | Semantic Segmentation | Classifies every pixel into categories (table, text, figure, etc.), then finds contours | Full-page layout, leaf validation | nevernever69/dit-doclaynet-segmentation |
| TADetect | Object Detection | Predicts bounding boxes with confidence scores | Fast table region detection | microsoft/table-transformer-detection |
| TATR | Object Detection | Detects table cells, rows, columns within a table | Table structure recognition | microsoft/table-transformer-structure-recognition |
| Model | Type | Purpose | Source |
|---|---|---|---|
| Detectron2 | Object Detection | Layout detection using Faster R-CNN | LayoutParser Model Zoo |
| LayoutLMv3 | Token Classification | Form understanding with text + layout | nielsr/layoutlmv3-finetuned-funsd |
| Donut | Vision Encoder-Decoder | End-to-end document understanding | naver-clova-ix/donut-base-finetuned-cord-v2 |
Documents with complex layouts often contain nested tables - tables within tables. EasyRag handles this by building a tree structure:
Page (Root)
├── Table A (Container) ──► DIT finds 2 sub-tables
│ ├── Table A.1 (Leaf) ──► DIT finds 0-1 tables, confirmed atomic
│ └── Table A.2 (Leaf) ──► DIT finds 0-1 tables, confirmed atomic
├── Table B (Leaf) ──► DIT finds 0-1 tables, already atomic
└── Text Block (ignored)
Key Concepts:
- Container: A table region that contains other tables inside it
- Leaf: An atomic table with no sub-tables (ready for text extraction)
- Depth: How many levels deep in the tree (0 = page level, 1 = first nesting, etc.)
Algorithm:
- Initial Detection: Run DIT on the full page to find all table regions
- Recursive Descent: For each detected table, run DIT again on the cropped region
- Leaf Validation: If DIT finds 2+ tables inside, it's a container → recurse deeper
- Stopping Criteria: Stop recursing when sub-tables are too small or too many are detected
- Output: Only leaf tables (atomic units) are sent for text extraction
Without stopping criteria, DIT would recurse forever, eventually detecting individual cells as "tables". These constants control when to stop:
| Constant | Default | Description |
|---|---|---|
SELF_DETECTION_RATIO |
0.95 | Skip if sub-table is ≥95% of parent (self-detection) |
MIN_TABLE_WIDTH_PX |
80px | Skip sub-tables narrower than this |
MIN_TABLE_HEIGHT_PX |
50px | Skip sub-tables shorter than this |
MIN_TABLE_AREA_PX |
4000px² | Skip tiny fragments |
MIN_SUBTABLE_AREA_RATIO |
5% | Skip if sub-table is <5% of parent area |
MAX_ASPECT_RATIO |
8.0 | Skip extreme strips (likely rows/columns, not tables) |
MAX_SUBTABLES_PER_REGION |
10 | If >10 detected, it's cells not tables → stop |
MAX_RECURSIVE_DEPTH |
5 | Maximum tree depth to prevent infinite recursion |
Tuning Guide:
- DIT finds cells instead of tables? → Increase
MIN_TABLE_WIDTH_PX,MIN_TABLE_AREA_PX - DIT misses valid sub-tables? → Decrease those values
- Rows/columns detected as tables? → Decrease
MAX_ASPECT_RATIO - Too many fragments? → Decrease
MAX_SUBTABLES_PER_REGION
| Aspect | Segmentation (DIT) | Object Detection (TADetect) |
|---|---|---|
| Output | Pixel mask (each pixel gets a class label) | Bounding boxes with confidence scores |
| Precision | Exact boundaries at pixel level | Rectangular boxes only |
| Context Needed | Requires full document layout context | Works on isolated crops |
| Speed | Slower (processes all pixels) | Faster (sparse predictions) |
| Use Case | Page-level detection, leaf validation | Fast initial scanning |
Why DIT for leaf validation? DIT consistently finds 2 tables when a region contains nested tables, and 0-1 when it's truly atomic. This binary signal is reliable for determining container vs leaf status.
PDF Page
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: Initial Detection (DIT on full page) │
│ - Finds all table regions with full document context │
│ - Output: List of candidate table boxes │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 2: Recursive Nesting Detection │
│ - For each table, crop region and run DIT again │
│ - If 2+ sub-tables found → mark as container, recurse │
│ - If 0-1 sub-tables → mark as leaf, stop │
│ - Apply stopping criteria to prevent over-segmentation │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 3: Leaf Validation │
│ - Final DIT pass on each "leaf" to confirm it's atomic │
│ - If DIT finds 2+ tables → split and re-validate │
│ - Ensures no nested tables are missed │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stage 4: Text Extraction & Embedding │
│ - Only leaf tables are processed │
│ - Extract text via Camelot/pdfplumber │
│ - Generate embeddings, store in Qdrant │
└─────────────────────────────────────────────────────────────┘
All detection parameters are in rag-service/app/document_processing/constants.py:
# Detection model selection
NESTED_DETECTION_MODEL = "dit_only" # "dit_only", "tadetect_only", or "both"
VALIDATE_LEAVES_WITH_DIT = True # Final validation pass
# Confidence thresholds
TADETECT_TABLE_CONF_THRESHOLD = 0.15 # Lower = more sensitive
TATR_TABLE_CONF_THRESHOLD = 0.3
# Stopping criteria
MIN_TABLE_WIDTH_PX = 80
MIN_TABLE_HEIGHT_PX = 50
MIN_TABLE_AREA_PX = 4000
MAX_ASPECT_RATIO = 8.0
MAX_SUBTABLES_PER_REGION = 10
MAX_RECURSIVE_DEPTH = 5| Method | Endpoint | Description |
|---|---|---|
POST |
/api/v1/upload |
Upload a PDF document |
GET |
/api/v1/files |
List all uploaded files |
GET |
/api/v1/files/{filename} |
Download/view a specific file |
DELETE |
/api/v1/files/{filename} |
Delete a file |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/query?q={query} |
Query documents with natural language |
GET |
/api/v1/query?q={query}&files={file1} |
Query specific files |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/providers/llm |
List available LLM providers |
GET |
/api/v1/providers/embedding |
List embedding providers |
POST |
/api/v1/providers/llm/switch |
Switch LLM provider |
POST |
/api/v1/providers/embedding/switch |
Switch embedding provider |
GET |
/api/v1/providers/status |
System health status |
Full API documentation: http://localhost:8080/docs
Copy .env.example to .env:
# Active Providers
EASYRAG_ACTIVE_LLM_PROVIDER=local
EASYRAG_ACTIVE_EMBEDDING_PROVIDER=huggingface
# API Keys (for cloud providers)
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Local Ollama Configuration
EASYRAG_LLM_PROVIDERS__LOCAL__HOST=localhost
EASYRAG_LLM_PROVIDERS__LOCAL__PORT=11434For GPU-accelerated local LLMs, ensure proper NVIDIA configuration.
# Verify GPU on host
nvidia-smi
# Verify Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi# Install NVIDIA driver
sudo apt install nvidia-driver-535
# Install Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update && sudo apt install -y nvidia-docker2
sudo systemctl restart dockerInstall the NVIDIA WSL driver, enable WSL2 backend in Docker Desktop, and test nvidia-smi inside WSL.
services:
ollama:
image: ollama/ollama:latest-gpu
device_requests:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- NVIDIA_VISIBLE_DEVICES=all| Error | Solution |
|---|---|
CUDA driver version is insufficient |
Update NVIDIA driver on host |
could not select device driver |
Install nvidia-docker2 or enable WSL2 GPU |
| Docker test fails but host works | Restart Docker after installing toolkit |
cd rag-service
pytest tests/ -v
# With coverage
pytest tests/ --cov=app --cov-report=html# Start all services
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -fUse built-in tasks via Ctrl+Shift+P → "Tasks: Run Task":
- Docker: Start Services
- RAG Service: Run Server
- Frontend: Run Dev Server
See LAUNCH_INSTRUCTIONS.md for detailed setup.
rag-service/
├── app/
│ ├── api/v1/ # REST endpoints
│ ├── core/ # Config, logging
│ ├── db/ # Qdrant client
│ ├── document_processing/ # PDF extraction
│ ├── models/ # LLM & embedding wrappers
│ └── services/ # Business logic
└── tests/
# rag-service/app/models/providers/my_provider.py
from .base import LLMProvider
class MyProvider(LLMProvider):
def generate(self, prompt: str, context: str) -> str:
# Implement API call
pass# rag-service/app/models/embedding.py
class MyEmbeddings(EmbeddingProvider):
def embed(self, texts: List[str]) -> List[List[float]]:
# Return embedding vectors
pass- Maintain consistent embedding dimensions (1024 for BGE-M3) across pipelines
- Use row-level granularity for table embeddings for optimal retrieval
- Store both pixel and PDF coordinates for flexible highlighting
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature - Commit your changes:
git commit -m "Add amazing feature" - Push to the branch:
git push origin feature/amazing-feature - Open a Pull Request
MIT License - see LICENSE for details.
- Ollama - Local LLM runtime
- Qdrant - Vector database
- LlamaIndex - RAG framework
- BGE-M3 - Embedding model
- Camelot - PDF table extraction
- DIT - Document layout detection
- PyMuPDF - PDF rendering
