Where Justice Meets Awareness
An AI-powered legal assistant for Indian citizens, practising lawyers, law firms, and corporate legal advisors. Neethi provides accurate, grounded guidance across Indian criminal law (BNS, BNSS, BSA), nineteen civil and special acts, constitutional provisions, and ~450,000 Supreme Court judgments — with deterministic safeguards against the section-number errors introduced by India's July 2024 criminal law transition.
| Milestone | Status |
|---|---|
| Data pipeline (1,059 sections + 1,220 mappings) | COMPLETE |
| Core tools (StatuteMapper, Guardrails, RepealHandler) | COMPLETE |
| RAG pipeline — criminal statutes (3,143 vectors) | COMPLETE |
| RAG pipeline — 19 civil acts (4,552 vectors) | COMPLETE |
| RAG pipeline — SC judgments (~452K vectors) | COMPLETE |
| RAG pipeline — bail judgments (3,600 vectors) | COMPLETE |
| Hybrid search (dense + BM25 RRF) | COMPLETE |
| Agentic LLM (two-turn tool-calling via GROQ) | COMPLETE |
| Multi-domain routing (criminal / civil / constitutional / overlap) | COMPLETE |
| FastAPI + JWT auth + rate limiting | COMPLETE |
| Safety guard (PII masking + injection check) | COMPLETE |
| Intent classifier + query clarification | COMPLETE |
| React frontend | COMPLETE |
| Test suite | 147 / 147 passing |
| Document Drafter agent (FIR/RTI) | TODO — Phase 4 |
| Resource Locator agent (courts/police) | TODO — Phase 4 |
| Supabase integration (persistent user DB) | TODO — Phase 4 |
User Query (POST /api/chat)
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 1: API (FastAPI + JWT + CORS + slowapi) │
│ Rate limit: 30 req/min per client │
└───────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2: PRE-PROCESSING (Deterministic, minimal LLM) │
│ │
│ ① SafetyGuard │
│ • PII masking (Aadhaar, phone, email) │
│ • Prompt injection detection │
│ │
│ ② IntentClassifier │
│ • Classify: legal_provision / punishment / case_search │
│ • document_draft / resource_locate / general │
│ • Vague query detection → ASK_CLARIFICATION │
│ • Short-circuit on clarification (zero LLM cost) │
│ │
│ ③ Guardrails.validate_query() │
│ • Domain routing: CRIMINAL / CIVIL / CONSTITUTIONAL / │
│ OVERLAP / STATIC_REFERRAL │
│ • Entity extraction: section numbers, law codes │
│ • On REJECT → return rejection message immediately │
│ │
│ ④ StatuteMapper.map_query() │
│ • Detects old-law references in query │
│ • check_false_friend() FIRST (e.g. IPC 302 ≠ BNS 302) │
│ • Maps: IPC→BNS | CrPC→BNSS | IEA→BSA │
│ • Returns: [(original_text, law_code, MappingResult)] │
│ │
│ ⑤ RepealHandler (triggered if mapping = REPEALED) │
└───────────────────────┬─────────────────────────────────────┘
│ ValidationResult + MappingResults
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 3: ORCHESTRATOR (Routes by intent + domain) │
│ │
│ Criminal domain → LegalAdvisorAgent (criminal prompt) │
│ Civil domain → LegalAdvisorAgent (civil advisor prompt)│
│ Constitutional → LegalAdvisorAgent (constitutional) │
│ Overlap domain → LegalAdvisorAgent (combined context) │
│ case_search → PrecedentResearcherAgent │
└───────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 4: AGENTIC TOOL-CALLING (Two-turn LLM loop) │
│ │
│ Turn 1: GROQ (llama-3.3-70b-versatile) + function calling │
│ • LLM decides which tools to call │
│ • Tools: search_statutes | search_civil_acts | │
│ search_case_law | search_bail_judgments │
│ • civil_law_hint: filters civil acts search to act │
│ │
│ Tool execution (Hybrid RRF Search): │
│ Dense (all-MiniLM-L6-v2 384d) + BM25 text merged via RRF │
│ search_statutes → neethi_statutes (3,143 vectors) │
│ search_civil_acts → neethi_civil_acts (4,552 vectors) │
│ search_case_law → indian_kanoon_sc_judgments (~452K)│
│ search_bail_judgments→ neethi_bail_judgments (3,600) │
│ │
│ Turn 2: GROQ synthesises answer from tool results │
│ • Only cites sections present in retrieved chunks │
│ • Appends standard legal disclaimer │
└───────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 5: VERIFICATION │
│ │
│ ChunkVerifier (llama-3.1-8b-instant) │
│ • Grades each retrieved chunk for relevance │
│ GroundingCheck (llama-3.1-8b-instant) │
│ • Validates answer is grounded in retrieved context │
│ GatekeeperNode (llama-3.1-8b-instant) │
│ • Catches LLM answer that addresses wrong topic │
└───────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 6: RESPONSE │
│ { │
│ "response": "...", │
│ "metadata": { │
│ "domain": "CRIMINAL", │
│ "intent": "punishment", │
│ "agent_used": "legal_advisor", │
│ "section_mappings": [...], │
│ "rag_hits": {"statutes": 7, "civil_acts": 3}, │
│ "warnings": ["False friend: IPC 302 → BNS 103"], │
│ "source_citations": [...], │
│ "response_time_ms": 2340 │
│ }, │
│ "disclaimer": "..." │
│ } │
└─────────────────────────────────────────────────────────────┘
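The PII masking step in SafetyGuard (Aadhaar, phone, email) can be illustrated with a small regex pass. This is only a sketch: the patterns, the placeholder tokens, and the function name `mask_pii` are assumptions for illustration, not the rules in `safety_guard.py`.

```python
import re

# Hypothetical patterns; the project's real SafetyGuard rules are not shown in this README.
# Order matters: 12-digit Aadhaar numbers are masked before 10-digit phone numbers.
_PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[AADHAAR]", re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")),
    ("[PHONE]", re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)")),
]


def mask_pii(text: str) -> str:
    """Replace each detected identifier with a fixed placeholder token."""
    for placeholder, pattern in _PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(mask_pii("Call 9876543210 or mail ravi@example.com, Aadhaar 1234 5678 9012"))
```

Masking before any LLM call keeps identifiers out of prompts and logs; a production rule set would likely need more formats than these three patterns cover.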
| Domain | Supported Acts | Route |
|---|---|---|
| Criminal | BNS, BNSS, BSA (+ IPC/CrPC/IEA pre-2024) | CRIMINAL_PIPELINE |
| Civil | Consumer Protection, Contract Act, Transfer of Property, CPC, Specific Relief, Limitation, Arbitration | CIVIL_PIPELINE |
| IT / Data | IT Act 2000, DPDP Act 2023 | CIVIL_PIPELINE |
| Family / Gender | PWDVA 2005, HMA 1955, SC/ST PoA 1989, POSH 2013 | CIVIL_PIPELINE |
| Financial | Negotiable Instruments Act, IBC 2016 | CIVIL_PIPELINE |
| Public Law | RTI Act 2005 | CIVIL_PIPELINE |
| Constitutional | Constitution of India (Fundamental Rights, Writs) | CONSTITUTIONAL |
| Overlap | Cyber fraud (BNS + IT Act), Domestic violence (BNS + PWDVA), Workplace harassment (BNS + POSH) | OVERLAP_PIPELINE |
Out-of-scope (graceful redirect): Pure company incorporation, SEBI filings, GST/income tax compliance.
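The routing table above can be pictured as a keyword-based router. The sketch below is a simplification under stated assumptions: the term sets are illustrative and far from complete, `route_domain` is a hypothetical name, the `UNKNOWN` fallback is invented for the example, and using `STATIC_REFERRAL` for out-of-scope topics is a guess at how that route is used.

```python
import re

# Illustrative term sets only; the real Guardrails vocabulary is not shown in this README.
CRIMINAL_TERMS = {"bns", "bnss", "bsa", "ipc", "crpc", "iea"}
CIVIL_TERMS = {"it act", "pwdva", "posh", "rti", "consumer protection", "contract act"}
CONSTITUTIONAL_TERMS = {"constitution", "fundamental right", "writ"}
OUT_OF_SCOPE_TERMS = {"sebi", "gst", "income tax", "company incorporation"}


def _mentions(query: str, terms: set) -> bool:
    """True if any term occurs in the (already lowercased) query as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(term)}\b", query) for term in terms)


def route_domain(query: str) -> str:
    q = query.lower()
    if _mentions(q, OUT_OF_SCOPE_TERMS):
        return "STATIC_REFERRAL"  # assumption: out-of-scope topics get a static redirect
    criminal = _mentions(q, CRIMINAL_TERMS)
    civil = _mentions(q, CIVIL_TERMS)
    if criminal and civil:
        return "OVERLAP_PIPELINE"
    if criminal:
        return "CRIMINAL_PIPELINE"
    if civil:
        return "CIVIL_PIPELINE"
    if _mentions(q, CONSTITUTIONAL_TERMS):
        return "CONSTITUTIONAL"
    return "UNKNOWN"  # hypothetical fallback label


if __name__ == "__main__":
    print(route_domain("Cyber fraud under BNS and the IT Act"))
```

Checking for overlap before the single-domain cases is what lets a query that names both a criminal code and a special act reach the combined-context prompt.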
| Layer | Technology | Notes |
|---|---|---|
| API Framework | FastAPI 0.115 + uvicorn | Pydantic v2 models |
| Auth | PyJWT + bcrypt | In-memory store; swap Supabase in Phase 4 |
| Rate Limiting | slowapi | 30 req/min on chat; per-user-id |
| LLM (primary) | GROQ llama-3.3-70b-versatile | Free tier, 100K TPD, function calling |
| LLM (grading) | GROQ llama-3.1-8b-instant | ChunkVerifier, GroundingCheck, Gatekeeper |
| LLM (fallback) | Anthropic Claude | When GROQ not configured |
| Embeddings | all-MiniLM-L6-v2 | 384d, CPU-only, ~90 MB |
| Vector DB | Qdrant Cloud (free tier) | 1 GB RAM, 4 GB disk |
| Search | Hybrid RRF (dense + BM25) | HYBRID_SEARCH=true by default |
| PDF Parsing | PyMuPDF (fitz) | BPRD handbook + 19 civil act PDFs |
| Testing | pytest 7.4 | 147/147 tests passing |
| Database | Supabase PostgreSQL | Schema designed; not yet wired |
| Frontend | React.js + Vite + Tailwind | Phase 5 complete |
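To make the 30 requests/minute limit concrete, here is a standard-library sliding-window limiter. It is not slowapi and not the project's middleware; the class name, method name, and the choice of a sliding window are assumptions made purely to show the behaviour.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    decisions = [limiter.allow("user-1", now=float(i)) for i in range(31)]
    print(decisions.count(True), decisions[-1])
```

The optional `now` argument exists only so the behaviour can be checked without waiting for real time to pass.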
| Collection | Vectors | Content |
|---|---|---|
| neethi_statutes | 3,143 | BNS/BNSS/BSA + Constitution of India |
| neethi_civil_acts | 4,552 | 19 civil & special acts |
| indian_kanoon_sc_judgments | ~452,000 | Supreme Court judgments 1950–2024 |
| neethi_bail_judgments | 3,600 | 1,200 annotated bail orders × 3 chunks |
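Each search tool exposed to the LLM targets exactly one of these collections. A minimal sketch of that dispatch follows; the tool and collection names come from this README, while the shape of a tool call (`name`, `arguments`, `query`) and the `act_filter` key are assumptions for illustration.

```python
# Tool name -> Qdrant collection, as listed in the architecture and the table above.
TOOL_COLLECTIONS = {
    "search_statutes": "neethi_statutes",
    "search_civil_acts": "neethi_civil_acts",
    "search_case_law": "indian_kanoon_sc_judgments",
    "search_bail_judgments": "neethi_bail_judgments",
}


def resolve_collection(tool_name: str) -> str:
    try:
        return TOOL_COLLECTIONS[tool_name]
    except KeyError:
        raise ValueError(f"Unknown tool: {tool_name!r}") from None


def plan_search(tool_call: dict) -> dict:
    """Turn one tool call into a search plan; no network access happens here."""
    args = tool_call.get("arguments", {})
    plan = {"collection": resolve_collection(tool_call["name"]), "query": args["query"]}
    # civil_law_hint narrows a civil-acts search to one act (hypothetical filter key).
    hint = args.get("civil_law_hint")
    if hint and tool_call["name"] == "search_civil_acts":
        plan["act_filter"] = hint
    return plan


if __name__ == "__main__":
    call = {
        "name": "search_civil_acts",
        "arguments": {"query": "refund for defective product", "civil_law_hint": "CPA2019"},
    }
    print(plan_search(call))
```

Rejecting unknown tool names outright, rather than defaulting to some collection, keeps a malformed tool call from silently searching the wrong corpus.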
| Code | Act | Vectors |
|---|---|---|
| ITA2000 | Information Technology Act 2000 | 205 |
| DPDP2023 | Digital Personal Data Protection Act 2023 | 107 |
| PWDVA2005 | Protection of Women from Domestic Violence Act | 59 |
| HMA1955 | Hindu Marriage Act 1955 | 72 |
| HSA1956 | Hindu Succession Act 1956 | 66 |
| SMA1954 | Special Marriage Act 1954 | 105 |
| TPA1882 | Transfer of Property Act 1882 | 254 |
| ICA1872 | Indian Contract Act 1872 | 276 |
| CPA2019 | Consumer Protection Act 2019 | 190 |
| SCST_POA1989 | SC/ST Prevention of Atrocities Act | 73 |
| POSH2013 | Sexual Harassment at Workplace Act | 60 |
| NIA1881 | Negotiable Instruments Act 1881 | 159 |
| IBC2016 | Insolvency and Bankruptcy Code 2016 | 606 |
| RTI2005 | Right to Information Act 2005 | 112 |
| CPC1908 | Code of Civil Procedure 1908 | 1,719 |
| LIMA1963 | Limitation Act 1963 | 103 |
| SRA1963 | Specific Relief Act 1963 | 76 |
| ARBI1996 | Arbitration and Conciliation Act 1996 | 286 |
| DPA1961 | Dowry Prohibition Act 1961 | 24 |
| File | Sections | Size |
|---|---|---|
| bns_complete.json | 358 | 416 KB |
| bnss_complete.json | 531 | 577 KB |
| bsa_complete.json | 170 | 178 KB |
| File | Mappings |
|---|---|
| ipc_to_bns.json | 525 |
| crpc_to_bnss.json | 513 |
| iea_to_bsa.json | 182 |
Total: 1,220 bidirectional mappings.
The same section number can mean completely different things in the old and the new codes. Carrying a number over unchanged causes catastrophic legal errors.
| Old Law | Old Section | Meaning | WRONG New | CORRECT New |
|---|---|---|---|---|
| CrPC | 438 | Anticipatory Bail | BNSS 438 (Revision) | BNSS 482 |
| CrPC | 439 | Bail non-bailable | BNSS 439 (Transfer) | BNSS 483 |
| IPC | 302 | Murder | BNS 302 (Stolen property) | BNS 103 |
| IPC | 420 | Cheating | BNS 420 (Personation) | BNS 318(4) |
| IPC | 379 | Theft | BNS 379 (Misappropriation) | BNS 303 |
StatuteMapper detects these false friends before any LLM call and injects a warning into the prompt context. It is called on every query that contains a section reference — no exceptions.
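The check can be pictured as a lookup that runs before any model is involved. The sketch below hard-codes the five pairs from the table; the function signatures, the regex, and the warning format (modelled on the sample response metadata) are assumptions, not the code in `statute_mapper.py`.

```python
import re
from typing import List, NamedTuple, Optional


class FalseFriend(NamedTuple):
    meaning: str
    wrong: str
    correct: str


# The five false-friend pairs listed in the table above.
FALSE_FRIENDS = {
    ("CrPC", "438"): FalseFriend("Anticipatory bail", "BNSS 438 (Revision)", "BNSS 482"),
    ("CrPC", "439"): FalseFriend("Bail, non-bailable", "BNSS 439 (Transfer)", "BNSS 483"),
    ("IPC", "302"): FalseFriend("Murder", "BNS 302 (Stolen property)", "BNS 103"),
    ("IPC", "420"): FalseFriend("Cheating", "BNS 420 (Personation)", "BNS 318(4)"),
    ("IPC", "379"): FalseFriend("Theft", "BNS 379 (Misappropriation)", "BNS 303"),
}

_CANONICAL = {"ipc": "IPC", "crpc": "CrPC", "iea": "IEA"}
# Matches "IPC 302" and "IPC section 302"; other phrasings are out of scope for this sketch.
_REFERENCE = re.compile(r"\b(ipc|crpc|iea)\s+(?:section\s+)?(\d+[a-z]?)\b", re.IGNORECASE)


def check_false_friend(law_code: str, section: str) -> Optional[str]:
    """Return a warning string if reusing the old number would be wrong, else None."""
    entry = FALSE_FRIENDS.get((law_code, section))
    if entry is None:
        return None
    return f"False friend: {law_code} {section} → {entry.correct}"


def warnings_for_query(query: str) -> List[str]:
    """Collect false-friend warnings for every old-law reference in the query."""
    found = []
    for law, section in _REFERENCE.findall(query):
        warning = check_false_friend(_CANONICAL[law.lower()], section.upper())
        if warning:
            found.append(warning)
    return found


if __name__ == "__main__":
    print(warnings_for_query("Can I get bail under CrPC 438 for an IPC section 420 case?"))
```

Because the lookup is a plain dictionary, the result is deterministic and testable, which is the property the section-number trap tests rely on.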
POST /api/chat: Process a legal query through the full pipeline.
Request:
{
  "message": "What is the punishment for murder under BNS?",
  "session_id": "optional-uuid",
  "language": "en"
}
Response:
{
  "response": "Under BNS Section 103, murder is punishable by...",
  "metadata": {
    "domain": "CRIMINAL",
    "intent": "punishment",
    "confidence": 0.95,
    "agent_used": "legal_advisor",
    "section_mappings": [],
    "warnings": [],
    "entities": { "sections": ["103"], "laws": ["BNS"] },
    "source_citations": ["BNS Section 103 - Murder"],
    "rag_hits": { "statutes": 7, "cases": 2 },
    "response_time_ms": 2310
  },
  "disclaimer": "This information is for general awareness only..."
}
Rate limit: 30 requests/minute per client IP.
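A client can assemble this request with the standard library alone. The sketch below only builds the request object; the base URL assumes the local uvicorn port used in this README, the helper name `build_chat_request` is hypothetical, and whether the chat endpoint requires a bearer token is an assumption (the header is added only when a token is supplied).

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://localhost:8002"  # assumption: local dev server port from the setup steps


def build_chat_request(
    message: str,
    session_id: Optional[str] = None,
    language: str = "en",
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/chat request."""
    payload = {"message": message, "language": language}
    if session_id:
        payload["session_id"] = session_id
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{API_BASE}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("What is the punishment for murder under BNS?")
    print(request.get_method(), request.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["metadata"]["warnings"])
```

Reading `metadata.warnings` first is a reasonable habit for callers, since that is where false-friend notices are reported.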
GET /api/health: Returns {"status": "healthy"} with uptime info.
GET /api/ready: Pings all Qdrant collections. Returns {"ready": true} only when Qdrant is reachable.
/api/auth/register: Register a new user. Returns JWT token.
/api/auth/login: Authenticate and receive JWT token.
/api/auth/me: Return current user info (requires Authorization: Bearer <token>).
Neethi/
├── CLAUDE.md ← Development instructions + agent protocol
├── README.md
├── requirements.txt
├── .env.example
│
├── src/ ← Application source
│ ├── agents/
│ │ └── orchestrator.py ← Intent router + agentic loop + GROQ function calling
│ ├── api/
│ │ ├── app.py ← FastAPI factory, CORS, lifespan
│ │ ├── middleware/
│ │ │ ├── auth.py ← PyJWT + bcrypt, inject_user_state()
│ │ │ └── rate_limit.py ← slowapi singleton limiter
│ │ └── routes/
│ │ ├── auth.py ← /api/auth/register, login, me
│ │ ├── chat.py ← POST /api/chat (main endpoint)
│ │ └── health.py ← GET /api/health, /api/ready
│ ├── config/
│ │ └── settings.py ← 30+ env vars (dotenv + pydantic-settings)
│ ├── intent/
│ │ └── classifier.py ← Intent classification + vague query detection
│ ├── rag/
│ │ ├── embedder.py ← all-MiniLM-L6-v2 singleton, batch support
│ │ └── chunking_strategy.py ← Semantic legal text chunking
│ ├── safety/
│ │ └── safety_guard.py ← PII masking + prompt injection detection
│ ├── services/
│ │ └── thesys_service.py ← Thesys UI streaming service
│ ├── tools/
│ │ ├── statute_mapper.py ← CRITICAL: false-friend detection + mapping
│ │ ├── guardrails.py ← Multi-domain routing + intent classification
│ │ ├── rag_tool.py ← Quad-collection hybrid Qdrant search
│ │ ├── repeal_handler.py ← Repealed/omitted provision handling
│ │ └── find_legal_resources.py ← Resource locator tool
│ ├── utils/
│ │ └── logger.py
│ └── verification/
│ └── chunk_verifier.py ← Stage 2 grading (llama-3.1-8b)
│
├── data/
│ ├── statutes/ ← 1,059 sections with full legal text
│ │ ├── bns_complete.json
│ │ ├── bnss_complete.json
│ │ └── bsa_complete.json
│ ├── mappings/ ← 1,220 bidirectional mappings
│ │ ├── ipc_to_bns.json
│ │ ├── crpc_to_bnss.json ← Contains section_438_trap (DO NOT REMOVE)
│ │ └── iea_to_bsa.json
│ └── raw/
│ ├── bprd_bns_handbook.pdf
│ ├── bprd_bnss_handbook.pdf
│ ├── bprd_bsa_handbook.pdf
│ ├── constitution_of_india.json
│ ├── indian_bail_judgments.json
│ └── indiacode_pdfs/ ← 19 civil act PDFs (source data)
│
├── docs/
│ ├── architecture/
│ │ ├── system_architecture.md
│ │ ├── agentic_workflow.md
│ │ └── database_schema.md
│ ├── agents/ ← 7 development companion agent prompts
│ ├── guides/ ← Developer how-to guides
│ └── reports/ ← Audit and phase reports
│
├── tests/
│ ├── conftest.py
│ ├── test_statute_mapper.py ← 34 tests (section number trap guards)
│ ├── test_guardrails.py ← 24 tests
│ ├── test_guardrails_routing.py ← 42 tests (multi-domain routing)
│ ├── test_intent_classifier.py ← 22 tests
│ ├── test_safety_guard.py ← 21 tests
│ ├── test_auth.py ← 13 JWT auth tests
│ ├── test_rate_limit.py ← 4 rate limit tests
│ └── test_resource_tool.py
│
├── scripts/
│ ├── ingest_to_qdrant_production.py ← Ingest BNS/BNSS/BSA → Qdrant
│ ├── ingest_legal_acts.py ← Ingest 19 civil acts → Qdrant
│ ├── ingest_bail_judgments.py ← Ingest bail orders → Qdrant
│ ├── ingest_constitution.py ← Ingest Constitution → Qdrant
│ ├── setup_fulltext_indexes.py ← Create BM25 + keyword indexes (run once)
│ ├── KAGGLE_PRODUCTION_FINAL.py ← Indian Kanoon judgment ingestion
│ ├── enhance_bns_legal_text.py
│ ├── enhance_bnss_legal_text.py
│ ├── enhance_bsa_legal_text.py
│ ├── parse_bprd_handbooks.py
│ ├── evaluate_indiclegalqa.py
│ └── validate_data_quality.py
│
├── neethi-frontend/ ← React frontend (Phase 5)
│ ├── src/
│ │ ├── components/ ← ChatWindow, ChatInput, MessageBubble, etc.
│ │ ├── pages/ ← ChatPage, UsageDashboard
│ │ ├── hooks/
│ │ ├── services/
│ │ └── store/
│ ├── index.html
│ ├── vite.config.js
│ └── package.json
│
└── Documents/ ← Academic papers and design documents
- Python 3.9+
- A GROQ API key (free at console.groq.com)
- Qdrant Cloud account (free at cloud.qdrant.io)
# 1. Clone and create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/macOS
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure environment
copy .env.example .env # Windows
# cp .env.example .env # Linux/macOS
# Edit .env — required variables:
# GROQ_API_KEY=gsk_...
# QDRANT_URL=https://xxx.qdrant.io
# QDRANT_API_KEY=...
# JWT_SECRET_KEY=<any-long-random-string>
# Optional:
# ANTHROPIC_API_KEY=sk-ant-... (LLM fallback)
# HYBRID_SEARCH=true (BM25 + dense, default: true)
# SUPABASE_URL=... (persistent DB — Phase 4)
# SUPABASE_KEY=...
# 4. Run tests
pytest tests/ -v
# Expected: 147 passed
# 5. Start the server
python -m uvicorn src.api.app:app --port 8002 --reload
# API docs: http://localhost:8002/docs
# Criminal law statutes (BNS/BNSS/BSA + Constitution)
python scripts/ingest_to_qdrant_production.py
# 19 civil and special acts
python scripts/ingest_legal_acts.py
# Annotated bail judgments
python scripts/ingest_bail_judgments.py
# Create BM25 full-text + keyword indexes (required for hybrid search)
python scripts/setup_fulltext_indexes.py
# Indian Kanoon Supreme Court judgments (requires Kaggle credentials)
# See docs/guides/kaggle_setup.md
python scripts/KAGGLE_PRODUCTION_FINAL.py
cd neethi-frontend
npm install
npm run dev # dev server at http://localhost:5173
npm run build # production build
# Full suite
pytest tests/ -v
# Specific suites
pytest tests/test_statute_mapper.py -v # 34 tests — section number trap
pytest tests/test_guardrails.py -v # 24 tests
pytest tests/test_guardrails_routing.py -v # 42 tests — multi-domain routing
pytest tests/test_intent_classifier.py -v # 22 tests
pytest tests/test_safety_guard.py -v # 21 tests
pytest tests/test_auth.py -v # 13 JWT auth tests
pytest tests/test_rate_limit.py -v # 4 rate limit tests
# With coverage report
pytest tests/ --cov=src --cov-report=term-missing
| Period | Applicable Law |
|---|---|
| Before July 1, 2024 | IPC + CrPC + Indian Evidence Act |
| On or after July 1, 2024 | BNS + BNSS + BSA |
| Pending cases | Mixed (depends on date of offence) |
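The first two rows reduce to a date comparison. The helper below is a sketch with a hypothetical name; it treats July 1, 2024 itself as falling under the new codes and deliberately ignores the "pending cases" row, where the applicable law depends on more than one date.

```python
from datetime import date
from typing import Tuple

TRANSITION_DATE = date(2024, 7, 1)  # commencement date of BNS, BNSS and BSA


def applicable_codes(date_of_offence: date) -> Tuple[str, str, str]:
    """Return (substantive, procedural, evidence) codes for an offence date.

    Simplification: pending or mixed-period cases need case-specific analysis.
    """
    if date_of_offence < TRANSITION_DATE:
        return ("IPC", "CrPC", "IEA")
    return ("BNS", "BNSS", "BSA")


if __name__ == "__main__":
    print(applicable_codes(date(2024, 6, 30)))
    print(applicable_codes(date(2024, 7, 1)))
```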
Dense vector similarity (all-MiniLM-L6-v2) and BM25 keyword matching are merged via Reciprocal Rank Fusion (RRF). An RRF score of 1.0 means a chunk ranked #1 in both the dense and the text search; such retrievals are highly reliable. If the text indexes are absent, search falls back gracefully to dense-only.
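As a rough illustration of the fusion step, the sketch below sums reciprocal-rank contributions across result lists. The constant `k = 2` with 0-based positions is an assumption, chosen because it makes a chunk ranked first in both lists score exactly 1.0 as described; the constant used by the actual search backend may differ, and the document ids are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rrf_merge(rankings: List[List[str]], k: float = 2.0) -> List[Tuple[str, float]]:
    """Fuse several ranked id lists: score(d) = sum over lists of 1 / (k + position)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):  # position is 0-based
            scores[doc_id] += 1.0 / (k + position)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["bnss_187", "bnss_482", "bns_103"]  # hypothetical ids from vector search
    bm25 = ["bnss_187", "bns_103"]               # hypothetical ids from keyword search
    for doc_id, score in rrf_merge([dense, bm25]):
        print(doc_id, round(score, 3))
```

Passing a single ranking to the same function degrades naturally to that list's own order, which mirrors the dense-only fallback.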
all-MiniLM-L6-v2 can semantically drift on procedural queries (e.g., "maximum remand period" matches limitation period, not remand). The _CONCEPT_HINTS dictionary in orchestrator.py pins natural-language concepts to canonical sections, bypassing the threshold and prepending them to retrieval results.
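Concept pinning can be sketched as a phrase lookup whose hits are placed ahead of the retrieved results. Only the dictionary name `_CONCEPT_HINTS` comes from this README; the entries (taken from the CrPC to BNSS reference table), the helper name `pin_concepts`, and the use of plain section labels as results are assumptions.

```python
import re
from typing import List

# Illustrative entries only; the real dictionary in orchestrator.py is not shown here.
_CONCEPT_HINTS = {
    "remand": ["BNSS 187"],
    "anticipatory bail": ["BNSS 482"],
    "fir": ["BNSS 173"],
}


def pin_concepts(query: str, retrieved: List[str]) -> List[str]:
    """Prepend canonical sections for any hinted concept found in the query."""
    q = query.lower()
    pinned: List[str] = []
    for phrase, sections in _CONCEPT_HINTS.items():
        if re.search(rf"\b{re.escape(phrase)}\b", q):
            for section in sections:
                if section not in pinned:
                    pinned.append(section)
    # Pinned sections come first; retrieved results keep their order, minus duplicates.
    return pinned + [section for section in retrieved if section not in pinned]


if __name__ == "__main__":
    print(pin_concepts("What is the maximum remand period?", ["Limitation Act 3", "BNSS 187"]))
```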
| Component | Status | Detail |
|---|---|---|
| StatuteMapper | Production | 5 false-friend pairs, 1,220 mappings |
| Guardrails | Production | Multi-domain routing + intent + entity extraction |
| RepealHandler | Production | REPEALED / OMITTED / DELETED status handling |
| SafetyGuard | Production | PII masking, prompt injection detection |
| IntentClassifier | Production | 7 intents + ASK_CLARIFICATION, vague query detection |
| Embedder | Production | all-MiniLM-L6-v2, singleton, batch, CPU-only |
| Chunking | Production | Semantic legal boundaries, sub-section aware |
| RAG Tool | Production | Quad-collection hybrid search (RRF) |
| LegalAdvisorAgent | Production | Multi-domain prompts, concept pinning |
| PrecedentResearcher | Production | SC case law + bail judgment search |
| ChunkVerifier | Production | Stage 2 grading via llama-3.1-8b |
| GroundingCheck | Production | Answer grounding validation |
| GatekeeperNode | Production | Topic alignment guard |
| Orchestrator | Production | Intent routing, GROQ function calling, Anthropic fallback |
| FastAPI + routes | Production | /chat, /health, /ready, /auth/* |
| JWT Auth | Production | Register / login / me |
| Rate Limiting | Production | slowapi, 30 req/min on chat |
| Criminal law data | Production | 1,059 sections, full legal_text |
| Mapping JSONs | Production | 1,220 IPC/CrPC/IEA → BNS/BNSS/BSA |
| Civil acts data | Production | 19 acts, 4,552 vectors in Qdrant |
| SC Judgments | Production | ~452K judgment chunks in Qdrant |
| Bail Judgments | Production | 1,200 annotated orders, 3,600 vectors |
| React Frontend | Production | Chat UI, history sidebar, usage dashboard |
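The rule that an answer may cite only sections present in the retrieved chunks also lends itself to a deterministic check. The sketch below is not the LLM-based GroundingCheck listed above; it is a hypothetical complement that compares section citations by regex, with the citation pattern and function names assumed for illustration.

```python
import re
from typing import List, Set

# Matches citations such as "BNS 103", "BNS Section 103" or "BNS 318(4)".
_CITATION = re.compile(r"\b(BNSS|BNS|BSA)\s+(?:Section\s+)?(\d+(?:\(\d+\))?)", re.IGNORECASE)


def cited_sections(text: str) -> Set[str]:
    """Extract normalised citations like 'BNS 103' from free text."""
    return {f"{law.upper()} {number}" for law, number in _CITATION.findall(text)}


def ungrounded_citations(answer: str, chunks: List[str]) -> List[str]:
    """Return citations in the answer that appear in none of the retrieved chunks."""
    available: Set[str] = set()
    for chunk in chunks:
        available |= cited_sections(chunk)
    return sorted(cited_sections(answer) - available)


if __name__ == "__main__":
    answer = "Under BNS Section 103, murder is punishable... See also BNSS 482."
    chunks = ["BNS Section 103 - Murder: whoever commits murder shall be punished..."]
    print(ungrounded_citations(answer, chunks))
```

A non-empty result is a cheap signal that the answer should be regenerated or the stray citation removed before the response is returned.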
| Task | Priority |
|---|---|
| Supabase integration (persistent users/sessions) | High |
| Document Drafter agent (FIR, RTI, bail applications) | High |
| Resource Locator agent (courts, police stations by location) | Medium |
| Session persistence (multi-turn memory) | Medium |
| Multilingual support (Hindi) | Medium |
| Task | Description |
|---|---|
| Cloudflare CDN + WAF | Production hosting |
| Query analytics dashboard | Section-level usage tracking |
| Gazette tracking | Detect future BNS/BNSS/BSA amendments |
| CI/CD pipeline | Automated test + deploy |
| IPC | BNS | Offence |
|---|---|---|
| 302 | 103 | Murder |
| 304 | 105 | Culpable Homicide |
| 376 | 63–69 | Sexual Offences |
| 420 | 318(4) | Cheating |
| 379 | 303 | Theft |
| 384 | 308 | Extortion |
| 392 | 309 | Robbery |
| 498A | 85 | Cruelty by Husband |
| CrPC | BNSS | Procedure |
|---|---|---|
| 41 | 35 | Arrest without warrant |
| 154 | 173 | FIR registration |
| 161 | 180 | Examination of witnesses |
| 164 | 183 | Recording confessions |
| 167 | 187 | Remand |
| 438 | 482 | Anticipatory bail |
| 439 | 483 | Bail in non-bailable offences |
| Document | Path |
|---|---|
| System Architecture | docs/architecture/system_architecture.md |
| Agentic Workflow | docs/architecture/agentic_workflow.md |
| Database Schema | docs/architecture/database_schema.md |
| RAG Ingestion Guide | docs/guides/rag_ingestion_guide.md |
| Indian Legal Knowledge | docs/guides/indian-legal-knowledge.md |
| Qdrant Troubleshooting | docs/guides/qdrant_troubleshooting.md |
| Kaggle Setup | docs/guides/kaggle_setup.md |
| Dev Companion Agents (7) | docs/agents/*.md |
| Multi-Domain Audit | docs/reports/multi_domain_expansion_audit_2026-02-15.md |
| Name | Roll No. |
|---|---|
| Rakshit Sudheer Nair | ASI22CS149 |
| Sachin Eldho | ASI22CS159 |
| Sreeraj Rajeev | ASI22CS181 |
Guide: Dr. Ramani Bai V
Institution: Adi Shankara Institute of Engineering and Technology, Kalady
Academic Project — All Rights Reserved
In legal AI, a wrong answer is worse than no answer.