Version: 2.0.0
Author: Spiros Chatzigeorgiou
Production-ready Retrieval-Augmented Generation (RAG) system with hybrid retrieval, Self-RAG agent workflows, cross-encoder reranking, and comprehensive benchmarking.
- Python 3.11+
- Docker & Docker Compose
- 16GB+ RAM recommended
- API keys: Google AI, OpenAI (optional: Voyage AI)
# Clone repository
git clone <repository-url>
cd ReRag
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure API keys
cp .env_example .env
# Edit .env and add your API keys:
# GOOGLE_API_KEY=your_key_here
# OPENAI_API_KEY=your_key_here

# Start Qdrant
docker-compose up -d
# Verify it's running
curl http://localhost:6333/healthz
# You can inspect the ingestion results in Qdrant's Web UI at:
http://localhost:6333/dashboard#/collections

# First, download the dataset (see the scripts folder)
# Ingest documents (requires dataset - see Data Ingestion section)
python bin/ingest.py ingest --config pipelines/configs/datasets/stackoverflow_hybrid.yml
# Run agent in interactive mode
python main.py
# Run agent with single query
python main.py --query "What are Python best practices?"
# Run Self-RAG mode (with iterative refinement)
python main.py --mode self-rag --query "Explain how asyncio works"

Ingest documents into the vector database:
# Basic ingestion from config
python bin/ingest.py ingest --config pipelines/configs/datasets/stackoverflow_hybrid.yml
# Test with dry run (no upload)
python bin/ingest.py ingest --config my_config.yml --dry-run --max-docs 100
# Check ingestion status
python bin/ingest.py status
# Cleanup canary collections
python bin/ingest.py cleanup

Configuration File Format (pipelines/configs/datasets/*.yml):

dataset:
  name: "my_dataset"
  adapter: "stackoverflow"  # or full path: "pipelines.adapters.custom.MyAdapter"
  path: "datasets/sosum/data"

embedding:
  strategy: "hybrid"  # or "dense" or "sparse"
  dense:
    provider: "google"
    model: "text-embedding-004"
  sparse:
    provider: "sparse"
    model: "Qdrant/bm25"

qdrant:
  collection: "my_collection"
  host: "localhost"
  port: 6333
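The snippet below is a minimal sketch of sanity-checking such a config with PyYAML before running an ingestion. It assumes the field names shown in the example above; the project's own loading logic lives in config/config_loader.py and the ingestion pipeline.

```python
# Minimal sanity check for an ingestion config (field names assumed from the example above).
import yaml

with open("pipelines/configs/datasets/stackoverflow_hybrid.yml") as f:
    cfg = yaml.safe_load(f)

assert cfg["embedding"]["strategy"] in {"dense", "sparse", "hybrid"}
print(cfg["dataset"]["adapter"], "->", cfg["qdrant"]["collection"])
```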
Test retrieval pipelines before using them in agents:

# Use any retrieval configuration
python bin/retrieval_pipeline.py \
    --config pipelines/configs/retrieval/basic_dense.yml \
    --query "How to handle Python exceptions?" \
    --top-k 5

Run the RAG agent with two available modes:
# Standard RAG mode (single-pass)
python main.py --query "Explain Python decorators"
# Self-RAG mode (iterative refinement with verification)
python main.py --mode self-rag --query "How does asyncio work?"
# Interactive chat
python main.py
# or
python main.py --mode self-rag

Run evaluation experiments:
# Run experiment with output directory
python -m benchmarks.experiment1 --output-dir results/exp1
# Run 2D grid optimization for hybrid search parameters
python -m benchmarks.optimize_2d_grid_alpha_rrfk \
    --scenario-yaml benchmark_scenarios/your_scenario.yml \
    --dataset-path datasets/sosum/data \
    --n-folds 5 \
    --output-dir results/optimization
# Generate ground truth for evaluation
python -m benchmarks.generate_ground_truth \
    --queries-file queries.json \
    --output-file ground_truth.json

See benchmarks/README.md for detailed documentation.
Modular RAG system with three main subsystems:
┌──────────────────────────────────────────────────────────────┐
│                          RAG System                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  INGESTION            RETRIEVAL            AGENT             │
│                                                              │
│  Documents            Vector Search        LangGraph         │
│  Chunking             Reranking            Response Gen      │
│  Embedding            Filtering            Verification      │
│      │                    │                    │             │
│      └─────────────────  Qdrant  ──────────────┘             │
│                                                              │
│  BENCHMARKS: Evaluation & Optimization                       │
└──────────────────────────────────────────────────────────────┘
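As a taste of the ingestion side shown above, the sketch below chunks a document with LangChain's recursive text splitter (LangChain is listed under Technologies). The chunk size and overlap are illustrative values, not the project's configured settings.

```python
# Generic chunking sketch; chunk_size/chunk_overlap are illustrative, not the project's defaults.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text("A long StackOverflow answer about asyncio ... " * 100)
print(len(chunks), "chunks;", chunks[0][:60])
```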
| Component | Purpose | Documentation |
|---|---|---|
| pipelines/ | Data ingestion & processing | README |
| components/ | Retrieval pipeline (filters, rerankers) | README |
| embedding/ | Multi-provider embeddings | README |
| retrievers/ | Dense/sparse/hybrid search | README |
| agent/ | LangGraph workflows (Standard + Self-RAG) | README |
| database/ | Qdrant vector database interface | README |
| benchmarks/ | Evaluation framework | README |
| config/ | Configuration system | - |
# Clone repository
git clone <repository-url>
cd Thesis
# Create virtual environment (Python 3.11+ required)
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt

# Create environment file
cp .env_example .env

Edit .env and add your API keys:
# Required
GOOGLE_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
# Optional
VOYAGE_API_KEY=your_key_here

# Start Qdrant using Docker
docker-compose up -d
# Verify it's running
curl http://localhost:6333/healthz
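If you prefer checking from Python, here is a minimal connectivity sketch with the qdrant-client package, assuming the default host and port from docker-compose.yml:

```python
# Minimal connectivity check against the local Qdrant started above.
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)
print(client.get_collections())  # lists existing collections (empty on a fresh instance)
```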
Thesis/
├── readme.md                        # This file
├── main.py                          # Agent entry point (Standard & Self-RAG modes)
├── config.yml                       # Main configuration file
├── docker-compose.yml               # Qdrant database setup
├── requirements.txt                 # Python dependencies
│
├── agent/                           # LangGraph agent workflows
│   ├── graph_refined.py             # Standard RAG workflow
│   ├── graph_self_rag.py            # Self-RAG workflow (iterative refinement)
│   ├── schema.py                    # State definitions
│   └── nodes/                       # Agent nodes (retriever, generator, grader)
│
├── pipelines/                       # Data ingestion
│   ├── adapters/                    # Dataset adapters (StackOverflow, custom)
│   ├── ingest/                      # Ingestion pipeline core
│   ├── eval/                        # Retrieval evaluator
│   └── configs/                     # Dataset configurations
│       └── datasets/                # Per-dataset configs
│
├── components/                      # Retrieval pipeline components
│   ├── retrieval_pipeline.py        # Pipeline orchestration
│   ├── rerankers.py                 # CrossEncoder, Semantic, ColBERT, MultiStage
│   ├── filters.py                   # Tag, duplicate, relevance filters
│   └── post_processors.py           # Result enhancement & limiting
│
├── retrievers/                      # Core retrieval implementations
│   ├── dense_retriever.py           # Dense/sparse/hybrid retrieval
│   └── base.py                      # Abstract interfaces
│
├── embedding/                       # Embedding providers
│   ├── factory.py                   # Provider factory
│   ├── providers/                   # Google, OpenAI, Voyage, HuggingFace
│   └── base_embedder.py             # Abstract interfaces
│
├── database/                        # Vector database
│   ├── qdrant_controller.py         # Qdrant integration
│   └── base.py                      # Abstract interfaces
│
├── config/                          # Configuration system
│   ├── config_loader.py             # YAML config loader
│   └── llm_factory.py               # LLM provider factory
│
├── benchmarks/                      # Evaluation framework
│   ├── experiment1.py               # Main experiment runner
│   ├── optimize_2d_grid_alpha_rrfk.py  # Grid search optimization
│   ├── llm_as_judge_eval.py         # LLM-based evaluation
│   ├── generate_ground_truth.py     # Ground truth generation
│   ├── benchmarks_runner.py         # Core benchmark runner
│   ├── benchmarks_metrics.py        # Metrics (Recall, Precision, MRR, NDCG)
│   ├── report_generator.py          # Report generation (used by experiments)
│   └── statistical_analyzer.py      # Statistical analysis
│
├── bin/                             # CLI tools
│   ├── ingest.py                    # Ingestion CLI
│   ├── retrieval_pipeline.py        # Retrieval testing CLI
│   ├── qdrant_inspector.py          # Database inspection
│   └── switch_agent_config.py       # Config switcher
│
├── logs/                            # Application logs
│   ├── agent.log                    # Main agent log
│   ├── ingestion.log                # Ingestion log
│   └── utils/logger.py              # Custom logger
│
└── tests/                           # Test suite
    ├── test_self_rag_integration.py # Self-RAG integration tests
    └── [other test files]
Main Config (config.yml):
- System-wide settings
- Loaded by config/config_loader.py
Pipeline Configs (pipelines/configs/):
- datasets/ - Dataset-specific configs (ingestion)
- retrieval/ - Retrieval pipeline configs
Example: Ingestion Config
dataset:
  name: "stackoverflow"
  adapter: "stackoverflow"  # or full path
  path: "datasets/sosum/data"

embedding:
  strategy: "hybrid"  # dense, sparse, or hybrid
  dense:
    provider: "google"
    model: "text-embedding-004"
  sparse:
    provider: "sparse"
    model: "Qdrant/bm25"

qdrant:
  collection: "my_collection"
  host: "localhost"
  port: 6333

| Variable | Description | Required |
|---|---|---|
| GOOGLE_API_KEY | Google AI API key | Yes |
| OPENAI_API_KEY | OpenAI API key | Yes |
| VOYAGE_API_KEY | Voyage AI API key | No |
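A quick way to confirm the required Google key is picked up is to embed a test string with the text-embedding-004 model used in the example configs. This sketch assumes the google-generativeai package is installed; the project's own wrappers live in embedding/providers/.

```python
# Smoke test for GOOGLE_API_KEY using the dense model from the example configs.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
result = genai.embed_content(model="models/text-embedding-004", content="hello world")
print(len(result["embedding"]))  # 768-dimensional vector
```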
- Create an adapter class:

# pipelines/adapters/my_adapter.py
from typing import List

from pipelines.contracts import BaseAdapter, Document

class MyAdapter(BaseAdapter):
    def load_documents(self) -> List[Document]:
        # Load your data and return it as Document objects
        documents: List[Document] = []
        return documents

- Use it in a config:

dataset:
  adapter: "pipelines.adapters.my_adapter.MyAdapter"
  path: "path/to/data"
Implement in components/rerankers.py or components/advanced_rerankers.py:
from typing import List

from components.rerankers import BaseReranker

class MyReranker(BaseReranker):
    def rerank(self, query: str, results: List[SearchResult]) -> List[SearchResult]:
        # Your reranking logic
        return reranked_results
- Create a node in agent/nodes/:

from agent.schema import AgentState

def my_node(state: AgentState) -> AgentState:
    # Process state
    return state

- Add it to the graph in agent/graph_refined.py or agent/graph_self_rag.py (a wiring sketch follows below)
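A minimal wiring sketch with LangGraph's StateGraph, assuming the my_node function from the step above. The module path agent.nodes.my_node and the one-node layout are illustrative, not the actual structure of graph_refined.py, which wires retriever, generator, and grader nodes.

```python
# Illustrative wiring of a custom node into a LangGraph workflow.
from langgraph.graph import END, StateGraph

from agent.schema import AgentState
from agent.nodes.my_node import my_node  # hypothetical module from the step above

graph = StateGraph(AgentState)
graph.add_node("my_node", my_node)
graph.set_entry_point("my_node")
graph.add_edge("my_node", END)
app = graph.compile()
```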
- Dense Retrieval: Semantic search using embeddings (Google, OpenAI, Voyage, HuggingFace)
- Sparse Retrieval: BM25-style keyword matching (Qdrant/bm25, SPLADE)
- Hybrid Retrieval: Combines dense + sparse with RRF (Reciprocal Rank Fusion)
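Since RRF is the glue of the hybrid mode, here is a minimal sketch of the fusion step in plain Python with hypothetical document ids; the constant k plays the role of the RRF-k parameter tuned by the benchmark grid search.

```python
# Reciprocal Rank Fusion over two ranked result lists (hypothetical ids).
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]    # e.g. semantic search results
sparse = ["d1", "d9", "d3"]   # e.g. BM25 results
print(rrf_fuse([dense, sparse]))  # documents found by both lists rank first
```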
- Cross-Encoder: ms-marco-MiniLM-L-6-v2 (default); see the sketch after this list
- Semantic: Sentence transformers for semantic similarity
- ColBERT: Token-level contextual matching
- Multi-Stage: Cascading rerankers for efficiency
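For intuition on the default reranker, a standalone sketch using the sentence-transformers CrossEncoder with the same ms-marco checkpoint; the query and passages are made up, and the project's wrapper in components/rerankers.py additionally handles SearchResult objects and configuration.

```python
# Score (query, passage) pairs with a cross-encoder and sort by score.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "How to handle Python exceptions?"
passages = [
    "Wrap the risky call in a try/except block and handle specific exceptions.",
    "Python lists are mutable sequences that support slicing.",
]
scores = model.predict([(query, p) for p in passages])
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```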
- Standard RAG: Single-pass retrieval β generation
- Self-RAG: Iterative refinement with hallucination detection and context verification
- Metrics: Recall@K, Precision@K, MRR, NDCG@K (see the sketch after this list)
- Optimization: Grid search for hybrid parameters (alpha, RRF-k)
- LLM-as-Judge: Automated quality evaluation (faithfulness, relevance, helpfulness)
- Statistical Analysis: Cross-validation, significance testing
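To make the ranking metrics concrete, here is a minimal single-query sketch of MRR and NDCG@K with binary relevance; benchmarks/benchmarks_metrics.py is the authoritative implementation, and the ids below are illustrative.

```python
# MRR and NDCG@K for one query with binary relevance judgments.
import math

def mrr(ranked_ids: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids: list[str], relevant: set[str], k: int) -> float:
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

print(mrr(["d2", "d5", "d1"], {"d1"}))           # 0.333... (first hit at rank 3)
print(ndcg_at_k(["d2", "d5", "d1"], {"d1"}, 3))  # 0.5
```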
# Self-RAG integration tests
pytest tests/test_self_rag_integration.py -v
# All tests
pytest tests/ -v

See components/LOGGING_GUIDE.md for how to verify rerankers and filters are working correctly via logs.
| Tool | Purpose | Example |
|---|---|---|
| bin/ingest.py | Ingest datasets | python bin/ingest.py ingest --config my_config.yml |
| bin/retrieval_pipeline.py | Test retrieval | python bin/retrieval_pipeline.py --config config.yml --query "test" |
| bin/qdrant_inspector.py | Inspect database | python bin/qdrant_inspector.py list |
| bin/switch_agent_config.py | Switch configs | python bin/switch_agent_config.py |
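If you prefer Python over the inspector CLI, qdrant-client can answer the same basic questions; a minimal sketch, with "my_collection" as a placeholder name:

```python
# Peek at a collection directly; "my_collection" is a placeholder name.
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)
print(client.count(collection_name="my_collection", exact=True))
points, _ = client.scroll(collection_name="my_collection", limit=3)
print(points)  # first few stored points with their payloads
```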
Minimum:
- Python 3.11+
- 8GB RAM
- 10GB storage
Recommended:
- 16GB+ RAM
- SSD storage
- 4+ CPU cores
- Main README: This file
- Components: components/README.md - Retrieval pipeline components
- Pipelines: pipelines/README.md - Data ingestion system
- Benchmarks: benchmarks/README.md - Evaluation framework
- Agent: agent/README.md - LangGraph workflows
- CLI Reference: CLI_REFERENCE.md - Command-line tools
- Logging Guide: components/LOGGING_GUIDE.md - Verify components work
- LangGraph: Agent workflow orchestration
- Qdrant: Vector database
- LangChain: Document processing
- Sentence Transformers: Embeddings and reranking
- Pydantic: Data validation
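As a small illustration of the kind of validation Pydantic provides here, a sketch with hypothetical field names (not the project's actual models):

```python
# Hypothetical settings model showing Pydantic-style validation of config values.
from pydantic import BaseModel, Field

class QdrantSettings(BaseModel):
    collection: str
    host: str = "localhost"
    port: int = Field(default=6333, ge=1, le=65535)

print(QdrantSettings(collection="my_collection"))
```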
Author: Spiros Chatzigeorgiou
Email: spyrchat@ece.auth.gr
Built for production RAG workflows with hybrid retrieval, advanced reranking, and comprehensive evaluation.