A modular Retrieval-Augmented Generation (RAG) experimentation framework focused on benchmarking lexical, semantic, hybrid, and reranked retrieval strategies using standard Information Retrieval metrics.
This repository is designed as a foundation for building a future Agentic RAG system, starting with rigorous retrieval evaluation.
RAG-Systems-Lab/
- data/ — PDF documents used to build the RAG knowledge base
- assets/ — Evaluation screenshots and retrieval comparisons
- main.ipynb — Retrieval pipeline and benchmarking logic
- requirements.txt — Project dependencies
- README.md — Documentation
The goal of this project is to:
- Compare multiple retrieval strategies
- Evaluate using Recall@1, Recall@5, and MRR
- Analyze ranking weaknesses
- Improve top-1 accuracy with reranking
- Prepare architecture for future agentic extensions
- BM25 — keyword-based lexical ranking
- Vector — dense embedding-based semantic search
- Hybrid — BM25 combined with vector similarity
- Hybrid + Reranker — hybrid retrieval followed by neural reranking
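A minimal sketch of how hybrid fusion can work (illustrative only, not the notebook's exact code): min-max normalize the BM25 and vector scores per query, then blend them with a mixing weight; `alpha` and the chunk ids here are hypothetical.

```python
# Hypothetical hybrid score fusion: normalize each retriever's scores,
# then take a weighted sum over the union of retrieved chunks.

def normalize(scores):
    """Min-max normalize a {chunk_id: score} dict into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_scores(bm25, vector, alpha=0.5):
    """Blend normalized lexical and semantic scores (alpha weights BM25)."""
    b, v = normalize(bm25), normalize(vector)
    docs = set(b) | set(v)
    return {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0) for d in docs}

# Toy scores: BM25 and vector search each return their own candidate set.
bm25 = {"chunk_a": 12.0, "chunk_b": 7.5, "chunk_c": 3.0}
vector = {"chunk_b": 0.91, "chunk_c": 0.88, "chunk_d": 0.42}
ranked = sorted(hybrid_scores(bm25, vector).items(), key=lambda x: -x[1])
print(ranked[0][0])  # chunk_b — strong in both retrievers
```

Blending after normalization matters because raw BM25 scores and cosine similarities live on different scales.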
Each query is mapped to a known correct document chunk.
Metrics computed:
- Recall@1 → Is the correct chunk ranked first?
- Recall@5 → Is the correct chunk within top 5?
- MRR → Mean Reciprocal Rank; rewards placing the correct chunk higher in the list
- Example retrieval test showing correct chunk detection within top results (see assets/)
- Comparison of Recall@1, Recall@5, and MRR across retrievers (see assets/)
- Harder query evaluation demonstrating ranking behavior (see assets/)
- Effect of adding reranking on Recall@1 and MRR (see assets/)
- Recall@5 alone is insufficient to judge retrieval quality.
- Vector retrieval significantly improves semantic matching.
- Hybrid search improves coverage but not always top-rank precision.
- Reranking meaningfully improves Recall@1.
- MRR reflects ranking improvements clearly.
Put your PDF files inside the data/ folder:
data/
├── document1.pdf
├── document2.pdf
└── ...
Run:
pip install -r requirements.txt
Open:
main.ipynb
Execute all cells sequentially to:
- Index PDFs
- Create embeddings
- Run BM25 / Vector / Hybrid retrieval
- Evaluate using Recall@1, Recall@5, MRR
- Compare Hybrid vs Hybrid + Reranker
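The Hybrid + Reranker step can be sketched as: retrieve a candidate pool with hybrid search, re-score the top candidates with a stronger relevance model, and sort by the new scores. Here `score_fn` stands in for the neural reranker, and `overlap_score` is a deliberately toy substitute; all names are illustrative.

```python
# Rerank step: take the top_k hybrid candidates, re-score each one against
# the query with a stronger model, and return them best-first.

def rerank(query, candidates, score_fn, top_k=10):
    """Re-score the top_k candidates with score_fn and sort descending."""
    pool = candidates[:top_k]
    return sorted(pool, key=lambda chunk: score_fn(query, chunk), reverse=True)

def overlap_score(query, chunk):
    """Toy relevance model: fraction of query terms appearing in the chunk."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

candidates = [
    "the cat sat on the mat",
    "vector databases store embeddings",
    "bm25 ranks by term frequency",
]
reranked = rerank("how do vector databases work", candidates, overlap_score)
print(reranked[0])  # vector databases store embeddings
```

In the notebook the toy scorer would be replaced by a neural cross-encoder, which scores the query and chunk jointly and is what lifts Recall@1 over plain hybrid retrieval.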
You will see:
- Retrieved document chunks
- Ranking comparisons
- Metric scores
- Performance differences across retrievers
Planned agentic extensions:
- Query rewriting module
- Multi-hop retrieval
- Tool-based reasoning
- Retriever selection agent
- Self-correcting retrieval loop
This repository is structured to evolve into a fully agentic RAG system.