
🔎 RAG-Systems-Lab


A modular Retrieval-Augmented Generation (RAG) experimentation framework focused on benchmarking lexical, semantic, hybrid, and reranked retrieval strategies using standard Information Retrieval metrics.

This repository is designed as a foundation for building a future Agentic RAG system, starting with rigorous retrieval evaluation.


📁 Project Structure

RAG-Systems-Lab/

  • data/ — PDF documents used to build the RAG knowledge base
  • assets/ — Evaluation screenshots and retrieval comparisons
  • main.ipynb — Retrieval pipeline and benchmarking logic
  • requirements.txt — Project dependencies
  • README.md — Documentation

🎯 Project Objective

The goals of this project are to:

  • Compare multiple retrieval strategies
  • Evaluate using Recall@1, Recall@5, and MRR
  • Analyze ranking weaknesses
  • Improve top-1 accuracy with reranking
  • Prepare architecture for future agentic extensions

🔍 Retrieval Methods Compared

1️⃣ BM25 (Lexical Search)

Classic keyword-based ranking using term frequency, inverse document frequency, and document-length normalization.
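As a self-contained sketch of the scoring idea (not the notebook's implementation, which may use a library such as rank_bm25), BM25 can be computed directly over tokenized documents:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()  # document frequency per term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus for illustration only.
docs = [
    "hybrid retrieval combines lexical and semantic signals".split(),
    "bm25 is a classic lexical ranking function".split(),
    "dense embeddings capture semantic similarity".split(),
]
print(bm25_scores("bm25 lexical ranking".split(), docs))
```

The second document, which matches all three query terms, receives the highest score; documents sharing no terms with the query score zero, which is exactly the lexical-gap weakness that motivates vector retrieval.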

2️⃣ Vector Retrieval

Semantic search over dense embeddings, ranked by vector similarity.
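The ranking step reduces to nearest-neighbour search by cosine similarity. A minimal sketch, using toy 3-d vectors in place of real model embeddings (the notebook's embedding model is not assumed here):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def vector_search(query_vec, doc_vecs, top_k=2):
    """Return (doc_index, similarity) pairs sorted by descending similarity."""
    sims = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(sims, key=lambda p: p[1], reverse=True)[:top_k]

# Stand-in "embeddings"; a real pipeline would produce these with an encoder.
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3], [0.0, 0.2, 0.9]]
print(vector_search([0.85, 0.15, 0.05], doc_vecs))
```

In practice the document vectors come from an embedding model and are indexed once; only the query is embedded at search time.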

3️⃣ Hybrid Retrieval

Weighted fusion of BM25 and vector-similarity scores.
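One common fusion scheme (a sketch, assuming min-max normalization and a weighted sum; the notebook may fuse differently, e.g. via reciprocal rank fusion) looks like this:

```python
def minmax(xs):
    """Rescale scores to [0, 1] so lexical and semantic scales are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0] * len(xs) if hi == lo else [(x - lo) / (hi - lo) for x in xs]

def hybrid_scores(bm25, vec, alpha=0.5):
    """Fuse per-document BM25 and vector scores with weight alpha on BM25."""
    b, v = minmax(bm25), minmax(vec)
    return [alpha * bi + (1 - alpha) * vi for bi, vi in zip(b, v)]

# Example: three documents scored by both retrievers.
print(hybrid_scores([2.1, 0.0, 5.3], [0.9, 0.4, 0.2]))
```

Normalization matters because raw BM25 scores are unbounded while cosine similarities lie in [-1, 1]; fusing them without rescaling lets one retriever dominate.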

4️⃣ Hybrid + Reranker

Hybrid retrieval followed by neural reranking.
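The reranking stage takes the hybrid top-k candidates and re-orders them with a stronger (query, passage) scorer. Sketched below with a hypothetical token-overlap stand-in for the neural scorer — in the real pipeline `score_fn` would be a cross-encoder's prediction:

```python
def rerank(query, candidates, score_fn):
    """Re-order candidate passages by a (query, passage) relevance score.

    `score_fn` stands in for a neural cross-encoder; any callable returning
    higher-is-better floats works here.
    """
    scored = [(c, score_fn(query, c)) for c in candidates]
    return [c for c, _ in sorted(scored, key=lambda p: p[1], reverse=True)]

def overlap_score(query, passage):
    """Hypothetical toy scorer: fraction of query tokens found in the passage."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

cands = [
    "reranking improves precision",
    "bm25 keyword match",
    "neural reranking improves top-1 precision",
]
print(rerank("neural reranking precision", cands, overlap_score))
```

Because the reranker scores each (query, passage) pair jointly, it is far slower than the retrievers, which is why it only runs over the small candidate set rather than the whole corpus.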


📊 Evaluation Metrics

Each query is mapped to a known correct document chunk.

Metrics computed:

  • Recall@1 → Is the correct chunk ranked first?
  • Recall@5 → Is the correct chunk within top 5?
  • MRR → Mean reciprocal rank of the correct chunk; rewards placing it higher in the list
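These metrics are straightforward to compute from the per-query rankings. A self-contained sketch over hypothetical chunk IDs:

```python
def recall_at_k(rankings, gold, k):
    """Fraction of queries whose gold chunk appears in the top-k results."""
    return sum(g in r[:k] for r, g in zip(rankings, gold)) / len(gold)

def mrr(rankings, gold):
    """Mean reciprocal rank of the gold chunk (contributes 0 if not retrieved)."""
    total = 0.0
    for r, g in zip(rankings, gold):
        if g in r:
            total += 1.0 / (r.index(g) + 1)
    return total / len(gold)

# Three example queries: ranked chunk IDs and the known-correct chunk for each.
rankings = [["c2", "c1", "c3"], ["c5", "c4", "c6"], ["c7", "c9", "c8"]]
gold = ["c1", "c5", "c8"]
print(recall_at_k(rankings, gold, 1), recall_at_k(rankings, gold, 5), mrr(rankings, gold))
```

Note how the example separates the metrics: all three gold chunks are retrieved (Recall@5 = 1.0), but only one is ranked first, and MRR captures the in-between ranking quality that Recall@5 hides.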

📈 Experimental Results

🔹 Retrieval Verification

Example retrieval test showing correct chunk detection within top results:

[Screenshot: Retrieval Test — see assets/]


🔹 BM25 vs Vector vs Hybrid Metrics

Comparison of Recall@1, Recall@5, and MRR across retrievers:

[Screenshot: Metrics Comparison 1 — see assets/]


🔹 Additional Query Evaluation

Harder query evaluation demonstrating ranking behavior:

[Screenshot: Metrics Comparison 2 — see assets/]


🔹 Hybrid vs Hybrid + Reranker

Adding reranking improves Recall@1 and MRR:

[Screenshot: Hybrid vs Reranker — see assets/]


🧠 Key Insights

  • Recall@5 alone is insufficient to judge retrieval quality.
  • Vector retrieval significantly improves semantic matching.
  • Hybrid search improves coverage but not always top-rank precision.
  • Reranking meaningfully improves Recall@1.
  • MRR reflects ranking improvements clearly.

▶️ How to Run

1️⃣ Place Your Documents

Put your PDF files inside the data/ folder:

data/
├── document1.pdf
├── document2.pdf
└── ...


2️⃣ Install Dependencies

Run:

pip install -r requirements.txt

3️⃣ Run the Pipeline

Open:

main.ipynb

Execute all cells sequentially to:

  • Index PDFs
  • Create embeddings
  • Run BM25 / Vector / Hybrid retrieval
  • Evaluate using Recall@1, Recall@5, MRR
  • Compare Hybrid vs Hybrid + Reranker

✅ Expected Output

You will see:

  • Retrieved document chunks
  • Ranking comparisons
  • Metric scores
  • Performance differences across retrievers

🚀 Roadmap (Planned Agentic Extensions)

  • Query rewriting module
  • Multi-hop retrieval
  • Tool-based reasoning
  • Retriever selection agent
  • Self-correcting retrieval loop

This repository is structured to evolve into a fully agentic RAG system.
