ReRag: A Reconfigurable Retrieval-Augmented Generation Experimentation and Validation Framework

Version: 2.0.0
Author: Spiros Chatzigeorgiou

Production-ready Retrieval-Augmented Generation (RAG) system with hybrid retrieval, Self-RAG agent workflows, cross-encoder reranking, and comprehensive benchmarking.


🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • 16GB+ RAM recommended
  • API keys: Google AI, OpenAI (optional: Voyage AI)

1. Setup Environment

# Clone repository
git clone <repository-url>
cd ReRag

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env_example .env
# Edit .env and add your API keys:
#   GOOGLE_API_KEY=your_key_here
#   OPENAI_API_KEY=your_key_here

2. Start Vector Database

# Start Qdrant
docker-compose up -d

# Verify it's running
curl http://localhost:6333/healthz

# Browse the ingested collections in Qdrant's Web UI:
# http://localhost:6333/dashboard#/collections

3. Run Your First Pipeline

# First, download the dataset using the script in the scripts folder
# Ingest documents (requires dataset - see Data Ingestion section)
python bin/ingest.py ingest --config pipelines/configs/datasets/stackoverflow_hybrid.yml

# Run agent in interactive mode
python main.py

# Run agent with single query
python main.py --query "What are Python best practices?"

# Run Self-RAG mode (with iterative refinement)
python main.py --mode self-rag --query "Explain how asyncio works"

📚 User Guide

Data Ingestion

Ingest documents into the vector database:

# Basic ingestion from config
python bin/ingest.py ingest --config pipelines/configs/datasets/stackoverflow_hybrid.yml

# Test with dry run (no upload)
python bin/ingest.py ingest --config my_config.yml --dry-run --max-docs 100

# Check ingestion status
python bin/ingest.py status

# Cleanup canary collections
python bin/ingest.py cleanup

Configuration File Format (pipelines/configs/datasets/*.yml):

dataset:
  name: "my_dataset"
  adapter: "stackoverflow"  # or full path: "pipelines.adapters.custom.MyAdapter"
  path: "datasets/sosum/data"

embedding:
  strategy: "hybrid"  # or "dense" or "sparse"
  dense:
    provider: "google"
    model: "text-embedding-004"
  sparse:
    provider: "sparse"
    model: "Qdrant/bm25"

qdrant:
  collection: "my_collection"
  host: "localhost"
  port: 6333
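
After `yaml.safe_load()` turns this file into a dict, it needs shape validation. The sketch below is purely illustrative (the class and function names are assumptions, not the project's actual `config_loader.py` types), but it shows the structure the example config implies:

```python
# Illustrative sketch: validating the embedding section of the config above.
# Names here are hypothetical, not the project's actual loader types.
from dataclasses import dataclass
from typing import Optional

VALID_STRATEGIES = {"dense", "sparse", "hybrid"}

@dataclass
class ProviderConfig:
    provider: str
    model: str

@dataclass
class EmbeddingConfig:
    strategy: str
    dense: Optional[ProviderConfig] = None
    sparse: Optional[ProviderConfig] = None

def parse_embedding(raw: dict) -> EmbeddingConfig:
    # Fail fast on an unknown strategy rather than at retrieval time
    if raw["strategy"] not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {raw['strategy']!r}")
    return EmbeddingConfig(
        strategy=raw["strategy"],
        dense=ProviderConfig(**raw["dense"]) if "dense" in raw else None,
        sparse=ProviderConfig(**raw["sparse"]) if "sparse" in raw else None,
    )

cfg = parse_embedding({
    "strategy": "hybrid",
    "dense": {"provider": "google", "model": "text-embedding-004"},
    "sparse": {"provider": "sparse", "model": "Qdrant/bm25"},
})
```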

Retrieval Testing

Test retrieval pipelines before using in agents:

# Use any retrieval configuration
python bin/retrieval_pipeline.py \
  --config pipelines/configs/retrieval/basic_dense.yml \
  --query "How to handle Python exceptions?" \
  --top-k 5

Agent Workflows

Run the RAG agent with two available modes:

# Standard RAG mode (single-pass)
python main.py --query "Explain Python decorators"

# Self-RAG mode (iterative refinement with verification)
python main.py --mode self-rag --query "How does asyncio work?"

# Interactive chat
python main.py
# or
python main.py --mode self-rag

Benchmarking

Run evaluation experiments:

# Run experiment with output directory
python -m benchmarks.experiment1 --output-dir results/exp1

# Run 2D grid optimization for hybrid search parameters
python -m benchmarks.optimize_2d_grid_alpha_rrfk \
  --scenario-yaml benchmark_scenarios/your_scenario.yml \
  --dataset-path datasets/sosum/data \
  --n-folds 5 \
  --output-dir results/optimization

# Generate ground truth for evaluation
python -m benchmarks.generate_ground_truth \
  --queries-file queries.json \
  --output-file ground_truth.json

See benchmarks/README.md for detailed documentation.


📖 System Architecture

Overview

Modular RAG system with three main subsystems:

┌────────────────────────────────────────────────────────────┐
│                     RAG System                             │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  📊 INGESTION → 🔍 RETRIEVAL → 🤖 AGENT                    │
│                                                            │
│  Documents      Vector Search    LangGraph                 │
│  Chunking       Reranking        Response Gen              │
│  Embedding      Filtering        Verification              │
│  ↓                ↓                 ↓                      │
│  └───────────→ Qdrant ←───────────┘                        │
│                                                            │
│  📈 BENCHMARKS: Evaluation & Optimization                  │
└────────────────────────────────────────────────────────────┘

Core Components

Component    Purpose                                     Documentation
pipelines/   Data ingestion & processing                 README
components/  Retrieval pipeline (filters, rerankers)     README
embedding/   Multi-provider embeddings                   README
retrievers/  Dense/sparse/hybrid search                  README
agent/       LangGraph workflows (Standard + Self-RAG)   README
database/    Qdrant vector database interface            README
benchmarks/  Evaluation framework                        README
config/      Configuration system                        -

🔧 Installation

1. Python Environment

# Clone repository
git clone <repository-url>
cd ReRag

# Create virtual environment (Python 3.11+ required)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. API Keys

# Create environment file
cp .env_example .env

Edit .env and add your API keys:

# Required
GOOGLE_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here

# Optional
VOYAGE_API_KEY=your_key_here

3. Start Vector Database

# Start Qdrant using Docker
docker-compose up -d

# Verify it's running
curl http://localhost:6333/healthz

πŸ“ Project Structure

ReRag/
├── readme.md                      # This file
├── main.py                        # Agent entry point (Standard & Self-RAG modes)
├── config.yml                     # Main configuration file
├── docker-compose.yml             # Qdrant database setup
├── requirements.txt               # Python dependencies
│
├── agent/                         # LangGraph agent workflows
│   ├── graph_refined.py          # Standard RAG workflow
│   ├── graph_self_rag.py         # Self-RAG workflow (iterative refinement)
│   ├── schema.py                 # State definitions
│   └── nodes/                    # Agent nodes (retriever, generator, grader)
│
├── pipelines/                     # Data ingestion
│   ├── adapters/                 # Dataset adapters (StackOverflow, custom)
│   ├── ingest/                   # Ingestion pipeline core
│   ├── eval/                     # Retrieval evaluator
│   └── configs/                  # Dataset configurations
│       └── datasets/             # Per-dataset configs
│
├── components/                    # Retrieval pipeline components
│   ├── retrieval_pipeline.py     # Pipeline orchestration
│   ├── rerankers.py              # CrossEncoder, Semantic, ColBERT, MultiStage
│   ├── filters.py                # Tag, duplicate, relevance filters
│   └── post_processors.py        # Result enhancement & limiting
│
├── retrievers/                    # Core retrieval implementations
│   ├── dense_retriever.py        # Dense/sparse/hybrid retrieval
│   └── base.py                   # Abstract interfaces
│
├── embedding/                     # Embedding providers
│   ├── factory.py                # Provider factory
│   ├── providers/                # Google, OpenAI, Voyage, HuggingFace
│   └── base_embedder.py          # Abstract interfaces
│
├── database/                      # Vector database
│   ├── qdrant_controller.py      # Qdrant integration
│   └── base.py                   # Abstract interfaces
│
├── config/                        # Configuration system
│   ├── config_loader.py          # YAML config loader
│   └── llm_factory.py            # LLM provider factory
│
├── benchmarks/                    # Evaluation framework
│   ├── experiment1.py            # Main experiment runner
│   ├── optimize_2d_grid_alpha_rrfk.py  # Grid search optimization
│   ├── llm_as_judge_eval.py      # LLM-based evaluation
│   ├── generate_ground_truth.py  # Ground truth generation
│   ├── benchmarks_runner.py      # Core benchmark runner
│   ├── benchmarks_metrics.py     # Metrics (Recall, Precision, MRR, NDCG)
│   ├── report_generator.py       # Report generation (used by experiments)
│   └── statistical_analyzer.py   # Statistical analysis
│
├── bin/                          # CLI tools
│   ├── ingest.py                 # Ingestion CLI
│   ├── retrieval_pipeline.py     # Retrieval testing CLI
│   ├── qdrant_inspector.py       # Database inspection
│   └── switch_agent_config.py    # Config switcher
│
├── logs/                         # Application logs
│   ├── agent.log                 # Main agent log
│   ├── ingestion.log             # Ingestion log
│   └── utils/logger.py           # Custom logger
│
└── tests/                        # Test suite
    ├── test_self_rag_integration.py  # Self-RAG integration tests
    └── [other test files]

βš™οΈ Configuration

Configuration Files

Main Config (config.yml):

  • System-wide settings
  • Loaded by config/config_loader.py

Pipeline Configs (pipelines/configs/):

  • datasets/ - Dataset-specific configs (ingestion)
  • retrieval/ - Retrieval pipeline configs

Example: Ingestion Config

dataset:
  name: "stackoverflow"
  adapter: "stackoverflow"  # or full path
  path: "datasets/sosum/data"

embedding:
  strategy: "hybrid"  # dense, sparse, or hybrid
  dense:
    provider: "google"
    model: "text-embedding-004"
  sparse:
    provider: "sparse"
    model: "Qdrant/bm25"

qdrant:
  collection: "my_collection"
  host: "localhost"
  port: 6333

Environment Variables

Variable         Description        Required
GOOGLE_API_KEY   Google AI API key  Yes
OPENAI_API_KEY   OpenAI API key     Yes
VOYAGE_API_KEY   Voyage AI API key  No
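
A startup check against this table avoids opaque failures mid-pipeline. A minimal sketch, assuming the variable names above are already loaded into the process environment (e.g. via python-dotenv); the function name is hypothetical:

```python
# Sketch: fail fast when required API keys are missing.
# REQUIRED mirrors the table above; check_required_keys is a hypothetical helper.
import os

REQUIRED = ("GOOGLE_API_KEY", "OPENAI_API_KEY")

def check_required_keys(env=os.environ) -> list:
    # Return the names of required variables that are unset or empty
    return [name for name in REQUIRED if not env.get(name)]

missing = check_required_keys({"GOOGLE_API_KEY": "abc"})
if missing:
    print(f"Missing required keys: {', '.join(missing)}")
```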

🔌 Extension Points

Add Custom Dataset Adapter

  1. Create adapter class:

    # pipelines/adapters/my_adapter.py
    from typing import List

    from pipelines.contracts import BaseAdapter, Document

    class MyAdapter(BaseAdapter):
        def load_documents(self) -> List[Document]:
            # Load your data and convert it into Document objects
            return documents
  2. Use in config:

    dataset:
      adapter: "pipelines.adapters.my_adapter.MyAdapter"
      path: "path/to/data"

Add Custom Reranker

Implement in components/rerankers.py or components/advanced_rerankers.py:

from typing import List

from components.rerankers import BaseReranker

class MyReranker(BaseReranker):
    def rerank(self, query: str, results: List[SearchResult]) -> List[SearchResult]:
        # Your reranking logic
        return reranked_results

Add Custom Agent Node

  1. Create node in agent/nodes/:

    from agent.schema import AgentState
    
    def my_node(state: AgentState) -> AgentState:
        # Process state
        return state
  2. Add to graph in agent/graph_refined.py or agent/graph_self_rag.py
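
The essential contract is that every node maps state to state, so a graph is a pipeline of such functions (LangGraph adds conditional edges and cycles on top of this idea). A framework-free sketch, with stand-in names not taken from the project:

```python
# Framework-free illustration of node composition over a shared state.
# AgentState is a plain dict here, standing in for agent/schema.py.
from typing import Callable, List

AgentState = dict

def run_graph(state: AgentState,
              nodes: List[Callable[[AgentState], AgentState]]) -> AgentState:
    # Each node receives the accumulated state and returns an updated copy
    for node in nodes:
        state = node(state)
    return state

def retrieve(state: AgentState) -> AgentState:
    return {**state, "context": f"docs for {state['query']}"}

def generate(state: AgentState) -> AgentState:
    return {**state, "answer": f"based on {state['context']}"}

final = run_graph({"query": "decorators"}, [retrieve, generate])
```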


🎯 Key Features

Retrieval Strategies

  • Dense Retrieval: Semantic search using embeddings (Google, OpenAI, Voyage, HuggingFace)
  • Sparse Retrieval: BM25-style keyword matching (Qdrant/bm25, SPLADE)
  • Hybrid Retrieval: Combines dense + sparse with RRF (Reciprocal Rank Fusion)
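
In RRF, each ranking contributes 1 / (k + rank) per document and the fused score is the sum across rankings. A minimal sketch of the fusion step (illustrative, not the project's implementation; k = 60 is the commonly cited default):

```python
# Minimal Reciprocal Rank Fusion sketch: fuse dense and sparse rankings.
# score(doc) = sum over rankings of 1 / (k + rank); higher wins.
from collections import defaultdict
from typing import List

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking  = ["d1", "d2", "d3"]   # from embedding similarity
sparse_ranking = ["d2", "d3", "d1"]   # from BM25
fused = rrf_fuse([dense_ranking, sparse_ranking])
```

Note that d2 wins despite topping neither list alone: it ranks consistently high in both, which is exactly the behavior RRF rewards.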

Reranking

  • Cross-Encoder: ms-marco-MiniLM-L-6-v2 (default)
  • Semantic: Sentence transformers for semantic similarity
  • ColBERT: Token-level contextual matching
  • Multi-Stage: Cascading rerankers for efficiency
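
The multi-stage idea is to let a cheap scorer prune candidates before an expensive one scores the survivors. A toy sketch with stand-in scoring functions (the lexical-overlap and placeholder "expensive" scorers here are assumptions for illustration, not the project's rerankers):

```python
# Cascading reranker sketch: cheap stage prunes, expensive stage reorders.
# Both scoring functions are illustrative stand-ins.
from typing import List

def lexical_overlap(query: str, doc: str) -> float:
    # Cheap stage: fraction of query terms appearing in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def expensive_score(query: str, doc: str) -> float:
    # Placeholder for a cross-encoder call
    return lexical_overlap(query, doc) * (1 + 0.01 * len(doc.split()))

def cascade_rerank(query: str, docs: List[str], keep: int = 2) -> List[str]:
    stage1 = sorted(docs, key=lambda d: lexical_overlap(query, d), reverse=True)[:keep]
    return sorted(stage1, key=lambda d: expensive_score(query, d), reverse=True)

docs = ["python exceptions tutorial",
        "rust borrow checker",
        "handling python errors and exceptions"]
top = cascade_rerank("python exceptions", docs)
```

In production the expensive stage would be a cross-encoder forward pass, so pruning first keeps latency bounded.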

Agent Modes

  • Standard RAG: Single-pass retrieval → generation
  • Self-RAG: Iterative refinement with hallucination detection and context verification

Benchmarking

  • Metrics: Recall@K, Precision@K, MRR, NDCG@K
  • Optimization: Grid search for hybrid parameters (alpha, RRF-k)
  • LLM-as-Judge: Automated quality evaluation (faithfulness, relevance, helpfulness)
  • Statistical Analysis: Cross-validation, significance testing
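
For reference, the listed rank metrics have standard definitions; a compact sketch with binary relevance (illustrative, not the project's `benchmarks_metrics.py`):

```python
# Standard IR metrics with binary relevance judgments.
import math
from typing import List, Set

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    # Fraction of relevant documents appearing in the top k
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr(ranked: List[str], relevant: Set[str]) -> float:
    # Reciprocal of the rank of the first relevant hit
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    # DCG with gain 1 for relevant docs, normalized by the ideal ordering
    dcg = sum(1.0 / math.log2(i + 1)
              for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

ranked, relevant = ["a", "b", "c", "d"], {"b", "d"}
```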

📊 Testing

Run Integration Tests

# Self-RAG integration tests
pytest tests/test_self_rag_integration.py -v

# All tests
pytest tests/ -v

Verify Components

See components/LOGGING_GUIDE.md for how to verify rerankers and filters are working correctly via logs.


πŸ” CLI Tools

Tool                        Purpose           Example
bin/ingest.py               Ingest datasets   python bin/ingest.py ingest --config my_config.yml
bin/retrieval_pipeline.py   Test retrieval    python bin/retrieval_pipeline.py --config config.yml --query "test"
bin/qdrant_inspector.py     Inspect database  python bin/qdrant_inspector.py list
bin/switch_agent_config.py  Switch configs    python bin/switch_agent_config.py

📈 System Requirements

Minimum:

  • Python 3.11+
  • 8GB RAM
  • 10GB storage

Recommended:

  • 16GB+ RAM
  • SSD storage
  • 4+ CPU cores

📚 Documentation

  • Main README: This file
  • Components: components/README.md - Retrieval pipeline components
  • Pipelines: pipelines/README.md - Data ingestion system
  • Benchmarks: benchmarks/README.md - Evaluation framework
  • Agent: agent/README.md - LangGraph workflows
  • CLI Reference: CLI_REFERENCE.md - Command-line tools
  • Logging Guide: components/LOGGING_GUIDE.md - Verify components work

πŸ› οΈ Technologies

  • LangGraph: Agent workflow orchestration
  • Qdrant: Vector database
  • LangChain: Document processing
  • Sentence Transformers: Embeddings and reranking
  • Pydantic: Data validation

📧 Contact

Author: Spiros Chatzigeorgiou
Email: spyrchat@ece.auth.gr


Built for production RAG workflows with hybrid retrieval, advanced reranking, and comprehensive evaluation.
