A production-grade Retrieval-Augmented Generation (RAG) runtime for private, offline querying of local documents with open-source LLMs.
The system ingests PDFs, text files, and research documents, builds vector embeddings, and returns grounded answers through both a CLI and a web UI.
• 100% Local — No cloud dependencies
• Private document ingestion
• Vector similarity retrieval
• Citation-aware responses
• Grounded LLM inference
• Web-based chat interface
• Windows / Linux / macOS compatible
User Query
↓
Retriever (Chroma Vector DB)
↓
Relevant Context Chunks
↓
Prompt Grounding Layer
↓
LLM Inference (Ollama / Mistral)
↓
Answer + Sources
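The grounding step above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code; the function name, chunk format, and prompt wording are hypothetical:

```python
def build_grounded_prompt(question, chunks):
    """Assemble a grounded prompt: retrieved chunks become numbered context
    blocks, and the model is instructed to answer only from that context."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk['text']} (source: {chunk['source']})"
        for i, chunk in enumerate(chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example: two chunks returned by the retriever (contents invented here)
chunks = [
    {"text": "UVM pages migrate on fault.", "source": "uvm_notes.pdf"},
    {"text": "Snapshots capture GPU state.", "source": "snapshot_design.pdf"},
]
prompt = build_grounded_prompt("How does UVM paging work?", chunks)
print(prompt)
```

Because the answer is constrained to the numbered context, the model can cite `[1]`, `[2]`, … and the runtime can map those back to source files.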
local-rag-runtime/
│
├── ingest.py # Document ingestion pipeline
├── chat.py # CLI chat interface
├── rag_engine.py # Retrieval + grounding logic
├── webui.py # Gradio browser UI
│
├── vector_db/ # Embedding storage
├── data/ # Source documents
│
├── requirements.txt
└── README.md
git clone https://github.com/manishklach/local-rag-runtime.git
cd local-rag-runtime
Windows:
python -m venv venv
venv\Scripts\activate
Linux / Mac:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Manual install (if needed):
pip install langchain langchain-chroma langchain-huggingface sentence-transformers gradio chromadb requests
Download: https://ollama.com
Pull model:
ollama pull mistral
Start runtime:
ollama serve
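Once `ollama serve` is running, it exposes a local HTTP API (default `http://localhost:11434`). A minimal, stdlib-only sketch of a non-streaming generation request; the helper names are ours, not part of this project:

```python
import json
import urllib.request

def build_request(model, prompt):
    """Build a non-streaming payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, host="http://localhost:11434"):
    """POST the prompt to the local Ollama server and return the text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response JSON carries the generated text under "response".
        return json.loads(resp.read())["response"]

# generate("mistral", "Say hello")  # requires `ollama serve` to be running
```

No API key, no network egress: the request never leaves localhost.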
Place files inside:
data/
Run ingestion:
python ingest.py
This will:
• Split documents into chunks
• Generate embeddings
• Store vectors in Chroma DB
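The splitting step can be illustrated with a simple fixed-size chunker with overlap (the real pipeline most likely uses a LangChain text splitter; this standalone sketch just shows the idea):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks with overlap, so that
    content cut at a boundary still appears whole in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "A" * 1200
print(len(chunk_text(doc)))  # chunks start at offsets 0, 450, 900 → 3 chunks
```

Overlap trades a little storage for retrieval quality: a sentence straddling a chunk boundary still lands intact in at least one chunk.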
python chat.py
Example queries:
Explain the snapshot pipeline.
How does UVM paging work?
Describe GPU suspend lifecycle.
Launch browser interface:
python webui.py
Open the local URL printed in the console (Gradio defaults to http://127.0.0.1:7860).
Features:
• Chat interface
• Grounded responses
• Source attribution
Current retrieval stack includes:
• Sentence‑Transformer embeddings
• Top‑K similarity search
• Context concatenation
• Prompt grounding
• Source tracking
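Top-K similarity search boils down to ranking document vectors by cosine similarity to the query embedding. A self-contained sketch (Chroma does this internally; names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], docs, k=2))  # → [0, 2]
```

The indices returned here are what the engine maps back to chunk text and source filenames for the grounding step.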
All processing occurs locally:
• Documents never leave the machine
• No external LLM APIs
• Air‑gapped capable
| Version | Features |
|---|---|
| v0.1 | CLI RAG pipeline |
| v0.2 | Web UI + citation retrieval |
| v0.3 | Streaming + chat memory (planned) |
| v1.0 | Enterprise runtime (planned) |
Planned upgrades:
• Inline citation highlighting
• Chunk scoring visualization
• Streaming token responses
• Multi-model switching
• Desktop packaging
• Kubernetes deployment
Use cases:
• Patent querying
• Research summarization
• Architecture review
• Codebase knowledge search
• Offline enterprise AI
Manish Keshav Lachwani
AI Infrastructure • GPU Runtime Systems • Memory Orchestration • RAG Architectures
GitHub: https://github.com/manishklach
Built on:
• LangChain
• ChromaDB
• Sentence Transformers
• Ollama
• Mistral LLM
• Gradio
Quick start:
ollama serve
ollama pull mistral
python ingest.py
python webui.py
Open browser → Ask questions → Get grounded answers.
Private AI. Local Intelligence. Zero Cloud.