🚀 Production-grade Retrieval-Augmented Generation (RAG) system powered by multi-agent orchestration, hybrid search, and real-time validation
- 🤖 Multi-Agent Architecture using LangGraph for reasoning workflows
- 🔍 Hybrid Retrieval (Vector + BM25) for high recall
- 🎯 Cross-Encoder Reranking for precision improvement
- 🔁 Self-Healing Loop with validation + query refinement
- 🌐 Web Search Integration for real-time knowledge
- ⚡ Async FastAPI Backend with non-blocking execution
- 💻 Next.js Frontend with live reasoning trace
Traditional RAG systems often:
- Retrieve irrelevant chunks
- Hallucinate when context is weak
- Lack validation mechanisms
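👉 This system addresses these problems with:
- Higher retrieval accuracy (hybrid search + reranking)
- Context validation before an answer is generated
- Multi-step reasoning across cooperating agents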
User Query
↓
Orchestrator Agent
↓
Retriever Agent (Hybrid Search: Vector + BM25)
↓
Reranker Agent (Cross-Encoder)
↓
Validator Agent (LLM-based evaluation)
├── Invalid → Query Refinement Loop
└── Valid → Summarizer Agent
↓
Final Answer
(Optional)
↓
Web Search Agent (DuckDuckGo)
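
The flow above can be sketched as a plain Python loop. This is a minimal illustration, not the project's LangGraph graph: the names `run_pipeline`, `PipelineResult`, and `max_attempts` are hypothetical, and each agent is passed in as an ordinary callable so the control flow (retrieve, rerank, validate, refine on failure, summarize on success) is visible on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineResult:
    answer: str
    attempts: int
    trace: List[str] = field(default_factory=list)  # human-readable reasoning trace


def run_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    validate: Callable[[str, List[str]], bool],
    refine: Callable[[str], str],
    summarize: Callable[[str, List[str]], str],
    max_attempts: int = 3,  # assumed retry budget, not taken from the project
) -> PipelineResult:
    """Retrieve -> rerank -> validate, refining the query until the context passes."""
    trace: List[str] = []
    current = query
    for attempt in range(1, max_attempts + 1):
        chunks = rerank(current, retrieve(current))
        trace.append(f"attempt {attempt}: query={current!r}, chunks={len(chunks)}")
        if validate(current, chunks):
            trace.append("validator: valid")
            # Summarize against the user's original question, not the refined query.
            return PipelineResult(summarize(query, chunks), attempt, trace)
        trace.append("validator: invalid, refining query")
        current = refine(current)
    # Give up rather than answer from context the validator rejected.
    return PipelineResult("No sufficiently relevant context found.", max_attempts, trace)
```

Capping the number of attempts keeps the refinement loop from running indefinitely when no relevant context exists; the `trace` list is the kind of data a live reasoning view could display.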
- Vector DB: ChromaDB
- Embeddings: all-MiniLM-L6-v2
- Keyword Search: BM25
- Chunking: Recursive + hierarchical strategy
- Cross-Encoder: ms-marco-MiniLM-L-6-v2
- Scores each query-chunk pair jointly, improving relevance ordering over embedding similarity alone
- LLM-based context validation
- Detects hallucination risk
- Triggers retry loop with refined query
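
The hybrid retrieval step listed above needs some way to merge the vector ranking and the BM25 ranking into one list. This README does not say which method the project uses; the sketch below assumes reciprocal rank fusion (RRF), a common choice because it works on ranks and so avoids normalizing two incompatible score scales. The function name and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

The fused list would then be passed to the cross-encoder, which only has to rescore a short candidate set.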
| Agent | Responsibility |
|---|---|
| Orchestrator | Query planning + routing |
| Retriever | Hybrid document retrieval |
| Reranker | Context scoring |
| Validator | Quality + hallucination check |
| Web Search | External knowledge retrieval |
| Summarizer | Final answer generation |
| Layer | Technology |
|---|---|
| Orchestration | LangGraph |
| Backend | FastAPI (Async) |
| Frontend | Next.js + Tailwind |
| Vector DB | ChromaDB |
| Embeddings | HuggingFace |
| Reranker | Cross-Encoder |
| Search | BM25 + DuckDuckGo |
- `POST /query` → Execute RAG pipeline
- `GET /status/{query_id}` → Real-time progress
- `GET /history/{user_id}` → Query history
👉 Uses async background tasks for scalability
- Real-time Reasoning Trace
- Agent-level visibility
- Dynamic loading states
- Toggle for web search
- Hybrid search improves recall
- Reranking improves precision
- Validation reduces hallucination
- Async execution avoids blocking
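
To make the reranking point concrete, here is a minimal sketch of the step with the scorer injected as a function. The names `rerank`, `overlap_scorer`, and `top_k` are hypothetical. The toy word-overlap scorer exists only so the example runs without downloading a model; with the `sentence-transformers` package installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` could presumably be passed as `score_pairs`, since it takes a list of (query, passage) pairs and returns one score per pair.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, chunk) pairs to one relevance score per pair.
ScoreFn = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, chunks: Sequence[str], score_pairs: ScoreFn, top_k: int = 3) -> List[str]:
    """Rescore retrieved chunks against the query and keep the best top_k."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)
    ranked = sorted(zip(scores, chunks), key=lambda sc: sc[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy stand-in for a cross-encoder: fraction of query words found in the chunk."""
    out = []
    for query, chunk in pairs:
        q_words = set(query.lower().split())
        c_words = set(chunk.lower().split())
        out.append(len(q_words & c_words) / max(len(q_words), 1))
    return out


candidates = [
    "password policy review",
    "oauth token leakage in logs",
    "token rotation",
]
top = rerank("oauth token leakage", candidates, overlap_scorer, top_k=2)
```

Keeping only the top few chunks is what turns the high-recall candidate set from hybrid search into a small, high-precision context for the validator and summarizer.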
Summarize all security audit findings related to OAuth token leakage and suggest mitigations
- Add vector DB scaling (Pinecone / Weaviate)
- Implement semantic caching (Redis)
- Add evaluation metrics (RAGAS / TruLens)
- Streaming responses
- Multi-user session memory
project/
├── backend/
│ ├── agents/
│ ├── scripts/
│ ├── api/
├── frontend/
│ ├── app/
│ ├── components/
Raj Kalash Tiwari
GitHub: https://github.com/rjkalash

✅ Advanced RAG system with multi-agent reasoning
⚡ Designed for scalable AI applications
⭐ Star this repo if you found it useful!