AI-powered Indian legal assistant (BNS/BNSS/BSA + 19 civil acts + SC judgments)

sachineldho24/NeethiAI

Neethi AI

Where Justice Meets Awareness

An AI-powered legal assistant for Indian citizens, practising lawyers, law firms, and corporate legal advisors. Neethi provides accurate, grounded guidance across Indian criminal law (BNS, BNSS, BSA), nineteen civil and special acts, constitutional provisions, and ~450,000 Supreme Court judgments — with deterministic safeguards against the section-number errors introduced by India's July 2024 criminal law transition.

Python 3.9+ FastAPI Tests License


Current Status

| Milestone | Status |
|---|---|
| Data pipeline (1,059 sections + 1,220 mappings) | COMPLETE |
| Core tools (StatuteMapper, Guardrails, RepealHandler) | COMPLETE |
| RAG pipeline: criminal statutes (3,143 vectors) | COMPLETE |
| RAG pipeline: 19 civil acts (4,552 vectors) | COMPLETE |
| RAG pipeline: SC judgments (~452K vectors) | COMPLETE |
| RAG pipeline: bail judgments (3,600 vectors) | COMPLETE |
| Hybrid search (dense + BM25 RRF) | COMPLETE |
| Agentic LLM (two-turn tool-calling via GROQ) | COMPLETE |
| Multi-domain routing (criminal / civil / constitutional / overlap) | COMPLETE |
| FastAPI + JWT auth + rate limiting | COMPLETE |
| Safety guard (PII masking + injection check) | COMPLETE |
| Intent classifier + query clarification | COMPLETE |
| React frontend | COMPLETE |
| Test suite | 147 / 147 passing |
| Document Drafter agent (FIR/RTI) | TODO (Phase 4) |
| Resource Locator agent (courts/police) | TODO (Phase 4) |
| Supabase integration (persistent user DB) | TODO (Phase 4) |

How It Works — Full Pipeline

User Query (POST /api/chat)
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 1: API  (FastAPI + JWT + CORS + slowapi)             │
│  Rate limit: 30 req/min per client                          │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 2: PRE-PROCESSING  (Deterministic, minimal LLM)      │
│                                                             │
│  ① SafetyGuard                                              │
│     • PII masking (Aadhaar, phone, email)                   │
│     • Prompt injection detection                            │
│                                                             │
│  ② IntentClassifier                                         │
│     • Classify: legal_provision / punishment / case_search  │
│     •           document_draft / resource_locate / general  │
│     • Vague query detection → ASK_CLARIFICATION             │
│     • Short-circuit on clarification (zero LLM cost)        │
│                                                             │
│  ③ Guardrails.validate_query()                              │
│     • Domain routing: CRIMINAL / CIVIL / CONSTITUTIONAL /   │
│       OVERLAP / STATIC_REFERRAL                             │
│     • Entity extraction: section numbers, law codes         │
│     • On REJECT → return rejection message immediately      │
│                                                             │
│  ④ StatuteMapper.map_query()                                │
│     • Detects old-law references in query                   │
│     • check_false_friend() FIRST (e.g. IPC 302 ≠ BNS 302)  │
│     • Maps: IPC→BNS | CrPC→BNSS | IEA→BSA                  │
│     • Returns: [(original_text, law_code, MappingResult)]   │
│                                                             │
│  ⑤ RepealHandler (triggered if mapping = REPEALED)          │
└───────────────────────┬─────────────────────────────────────┘
                        │  ValidationResult + MappingResults
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 3: ORCHESTRATOR  (Routes by intent + domain)         │
│                                                             │
│  Criminal domain   → LegalAdvisorAgent (criminal prompt)    │
│  Civil domain      → LegalAdvisorAgent (civil advisor prompt)│
│  Constitutional    → LegalAdvisorAgent (constitutional)     │
│  Overlap domain    → LegalAdvisorAgent (combined context)   │
│  case_search       → PrecedentResearcherAgent               │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 4: AGENTIC TOOL-CALLING  (Two-turn LLM loop)         │
│                                                             │
│  Turn 1: GROQ (llama-3.3-70b-versatile) + function calling  │
│    • LLM decides which tools to call                        │
│    • Tools: search_statutes | search_civil_acts |           │
│             search_case_law | search_bail_judgments         │
│    • civil_law_hint: filters civil acts search to act       │
│                                                             │
│  Tool execution (Hybrid RRF Search):                        │
│    Dense (all-MiniLM-L6-v2 384d) + BM25 text merged via RRF │
│    search_statutes      → neethi_statutes (3,143 vectors)   │
│    search_civil_acts    → neethi_civil_acts (4,552 vectors) │
│    search_case_law      → indian_kanoon_sc_judgments (~452K)│
│    search_bail_judgments→ neethi_bail_judgments (3,600)     │
│                                                             │
│  Turn 2: GROQ synthesises answer from tool results          │
│    • Only cites sections present in retrieved chunks        │
│    • Appends standard legal disclaimer                      │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 5: VERIFICATION                                      │
│                                                             │
│  ChunkVerifier  (llama-3.1-8b-instant)                      │
│    • Grades each retrieved chunk for relevance              │
│  GroundingCheck (llama-3.1-8b-instant)                      │
│    • Validates answer is grounded in retrieved context      │
│  GatekeeperNode (llama-3.1-8b-instant)                      │
│    • Catches LLM answer that addresses wrong topic          │
└───────────────────────┬─────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│  LAYER 6: RESPONSE                                          │
│  {                                                          │
│    "response": "...",                                       │
│    "metadata": {                                            │
│      "domain": "CRIMINAL",                                  │
│      "intent": "punishment",                                │
│      "agent_used": "legal_advisor",                         │
│      "section_mappings": [...],                             │
│      "rag_hits": {"statutes": 7, "civil_acts": 3},          │
│      "warnings": ["False friend: IPC 302 → BNS 103"],       │
│      "source_citations": [...],                             │
│      "response_time_ms": 2340                               │
│    },                                                       │
│    "disclaimer": "..."                                      │
│  }                                                          │
└─────────────────────────────────────────────────────────────┘
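
The PII-masking step of SafetyGuard in Layer 2 can be illustrated with a minimal regex sketch. The patterns and the `mask_pii` name are assumptions made for illustration; the project's actual rules live in `src/safety/safety_guard.py` and may differ.

```python
import re

# Hypothetical patterns, not taken from the project's source.
_AADHAAR = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")           # 12 digits, optionally grouped 4-4-4
_PHONE = re.compile(r"(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d)")      # 10-digit Indian mobile, optional +91
_EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_pii(text: str) -> str:
    """Replace Aadhaar numbers, phone numbers and e-mail addresses with placeholders."""
    # Patterns are applied in a fixed order so the output is deterministic.
    text = _AADHAAR.sub("[AADHAAR]", text)
    text = _PHONE.sub("[PHONE]", text)
    text = _EMAIL.sub("[EMAIL]", text)
    return text


print(mask_pii("My Aadhaar is 1234 5678 9012, call +91 9876543210 or mail a.b@example.com"))
```

Because the step is purely regex-based, it runs before any LLM call and costs nothing per query.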

Multi-Domain Coverage

| Domain | Supported Acts | Route |
|---|---|---|
| Criminal | BNS, BNSS, BSA (+ IPC/CrPC/IEA pre-2024) | CRIMINAL_PIPELINE |
| Civil | Consumer Protection, Contract Act, Transfer of Property, CPC, Specific Relief, Limitation, Arbitration | CIVIL_PIPELINE |
| IT / Data | IT Act 2000, DPDP Act 2023 | CIVIL_PIPELINE |
| Family / Gender | PWDVA 2005, HMA 1955, SC/ST PoA 1989, POSH 2013 | CIVIL_PIPELINE |
| Financial | Negotiable Instruments Act, IBC 2016 | CIVIL_PIPELINE |
| Public Law | RTI Act 2005 | CIVIL_PIPELINE |
| Constitutional | Constitution of India (Fundamental Rights, Writs) | CONSTITUTIONAL |
| Overlap | Cyber fraud (BNS + IT Act), Domestic violence (BNS + PWDVA), Workplace harassment (BNS + POSH) | OVERLAP_PIPELINE |

Out-of-scope (graceful redirect): Pure company incorporation, SEBI filings, GST/income tax compliance.
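
The routing in the coverage table can be sketched as a lookup over the law codes detected in a query. This is a simplified illustration, not the logic in `guardrails.py`: the `route` function, the `CONSTITUTION` code, the order of the checks, and the use of `STATIC_REFERRAL` for unrecognised input are all assumptions.

```python
CRIMINAL_CODES = {"BNS", "BNSS", "BSA", "IPC", "CrPC", "IEA"}
# A subset of the civil act codes, enough for the example.
CIVIL_CODES = {"ITA2000", "DPDP2023", "PWDVA2005", "POSH2013", "CPA2019", "RTI2005"}


def route(law_codes: set[str]) -> str:
    """Pick a pipeline from the set of law codes detected in the query."""
    has_criminal = bool(law_codes & CRIMINAL_CODES)
    has_civil = bool(law_codes & CIVIL_CODES)
    if has_criminal and has_civil:
        return "OVERLAP_PIPELINE"      # e.g. cyber fraud: BNS + IT Act
    if has_criminal:
        return "CRIMINAL_PIPELINE"
    if has_civil:
        return "CIVIL_PIPELINE"
    if "CONSTITUTION" in law_codes:    # hypothetical code for constitutional queries
        return "CONSTITUTIONAL"
    return "STATIC_REFERRAL"           # assumed fallback for out-of-scope queries


print(route({"BNS", "ITA2000"}))
```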


Tech Stack

| Layer | Technology | Notes |
|---|---|---|
| API Framework | FastAPI 0.115 + uvicorn | Pydantic v2 models |
| Auth | PyJWT + bcrypt | In-memory store; swap in Supabase in Phase 4 |
| Rate Limiting | slowapi | 30 req/min on chat; per-user-id |
| LLM (primary) | GROQ llama-3.3-70b-versatile | Free tier, 100K TPD, function calling |
| LLM (grading) | GROQ llama-3.1-8b-instant | ChunkVerifier, GroundingCheck, Gatekeeper |
| LLM (fallback) | Anthropic Claude | When GROQ not configured |
| Embeddings | all-MiniLM-L6-v2 | 384d, CPU-only, ~90 MB |
| Vector DB | Qdrant Cloud (free tier) | 1 GB RAM, 4 GB disk |
| Search | Hybrid RRF (dense + BM25) | HYBRID_SEARCH=true by default |
| PDF Parsing | PyMuPDF (fitz) | BPRD handbook + 19 civil act PDFs |
| Testing | pytest 7.4 | 147/147 tests passing |
| Database | Supabase PostgreSQL | Schema designed; not yet wired |
| Frontend | React.js + Vite + Tailwind | Phase 5 complete |

Data Architecture

Vector Collections (Qdrant)

| Collection | Vectors | Content |
|---|---|---|
| neethi_statutes | 3,143 | BNS/BNSS/BSA + Constitution of India |
| neethi_civil_acts | 4,552 | 19 civil & special acts |
| indian_kanoon_sc_judgments | ~452,000 | Supreme Court judgments 1950–2024 |
| neethi_bail_judgments | 3,600 | 1,200 annotated bail orders × 3 chunks |

Civil Acts Ingested (neethi_civil_acts)

| Code | Act | Vectors |
|---|---|---|
| ITA2000 | Information Technology Act 2000 | 205 |
| DPDP2023 | Digital Personal Data Protection Act 2023 | 107 |
| PWDVA2005 | Protection of Women from Domestic Violence Act | 59 |
| HMA1955 | Hindu Marriage Act 1955 | 72 |
| HSA1956 | Hindu Succession Act 1956 | 66 |
| SMA1954 | Special Marriage Act 1954 | 105 |
| TPA1882 | Transfer of Property Act 1882 | 254 |
| ICA1872 | Indian Contract Act 1872 | 276 |
| CPA2019 | Consumer Protection Act 2019 | 190 |
| SCST_POA1989 | SC/ST Prevention of Atrocities Act | 73 |
| POSH2013 | Sexual Harassment at Workplace Act | 60 |
| NIA1881 | Negotiable Instruments Act 1881 | 159 |
| IBC2016 | Insolvency and Bankruptcy Code 2016 | 606 |
| RTI2005 | Right to Information Act 2005 | 112 |
| CPC1908 | Code of Civil Procedure 1908 | 1,719 |
| LIMA1963 | Limitation Act 1963 | 103 |
| SRA1963 | Specific Relief Act 1963 | 76 |
| ARBI1996 | Arbitration and Conciliation Act 1996 | 286 |
| DPA1961 | Dowry Prohibition Act 1961 | 24 |

Statute JSONs (data/statutes/)

File Sections Size
bns_complete.json 358 416 KB
bnss_complete.json 531 577 KB
bsa_complete.json 170 178 KB

Mapping JSONs (data/mappings/)

File Mappings
ipc_to_bns.json 525
crpc_to_bnss.json 513
iea_to_bsa.json 182

Total: 1,220 bidirectional mappings.
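
A bidirectional mapping can be served by two indexes built from the same pairs. The sketch below assumes the mappings have been flattened to `(old_section, new_section)` tuples; the JSON schema of the mapping files is not shown here, so that shape and the `build_indexes` name are hypothetical.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Build old->new and new->old lookups from (old_section, new_section) pairs."""
    forward = {}
    reverse = defaultdict(list)
    for old, new in pairs:
        forward[old] = new
        # Several old sections may consolidate into one new section,
        # so the reverse index keeps a list.
        reverse[new].append(old)
    return forward, dict(reverse)


# A few CrPC -> BNSS pairs from the Key Mappings Reference.
crpc_to_bnss, bnss_to_crpc = build_indexes([("438", "482"), ("439", "483"), ("154", "173")])
print(crpc_to_bnss["438"], bnss_to_crpc["482"])
```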


Critical Safety: The Section Number Trap

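The same section number can mean something completely different in the new codes, or not exist there at all (the BNS ends at Section 358). Carrying an old number over unchanged causes catastrophic legal errors.

| Old Law | Old Section | Meaning | WRONG New | CORRECT New |
|---|---|---|---|---|
| CrPC | 438 | Anticipatory bail | BNSS 438 (Revision) | BNSS 482 |
| CrPC | 439 | Bail in non-bailable offences | BNSS 439 (Power to order inquiry) | BNSS 483 |
| IPC | 302 | Murder | BNS 302 (Uttering words to wound religious feelings) | BNS 103 |
| IPC | 420 | Cheating | BNS 420 (no such section) | BNS 318(4) |
| IPC | 379 | Theft | BNS 379 (no such section) | BNS 303 |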

StatuteMapper detects these false friends before any LLM call and injects a warning into the prompt context. It is called on every query that contains a section reference — no exceptions.
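
A deterministic false-friend check can be as simple as a dictionary lookup that runs before any LLM call. The sketch below uses the five pairs from the table; the function name comes from the pipeline diagram, but its signature, the data layout, and the warning text are assumptions rather than the code in `statute_mapper.py`.

```python
from typing import Optional

# (old_law, old_section) -> (meaning, new_law, correct_new_section)
_FALSE_FRIENDS = {
    ("CrPC", "438"): ("Anticipatory bail", "BNSS", "482"),
    ("CrPC", "439"): ("Bail in non-bailable offences", "BNSS", "483"),
    ("IPC", "302"): ("Murder", "BNS", "103"),
    ("IPC", "420"): ("Cheating", "BNS", "318(4)"),
    ("IPC", "379"): ("Theft", "BNS", "303"),
}


def check_false_friend(old_law: str, section: str) -> Optional[str]:
    """Return a warning string if the old section number must not be reused as-is."""
    hit = _FALSE_FRIENDS.get((old_law, section))
    if hit is None:
        return None
    meaning, new_law, correct = hit
    return (f"False friend: {old_law} {section} ({meaning}) maps to "
            f"{new_law} {correct}, not {new_law} {section}")


print(check_false_friend("CrPC", "438"))
```

Keeping this as plain data rather than model output means the warning is identical on every run and can be covered by unit tests.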


API Reference

POST /api/chat

Process a legal query through the full pipeline.

Request:

{
  "message": "What is the punishment for murder under BNS?",
  "session_id": "optional-uuid",
  "language": "en"
}

Response:

{
  "response": "Under BNS Section 103, murder is punishable by...",
  "metadata": {
    "domain": "CRIMINAL",
    "intent": "punishment",
    "confidence": 0.95,
    "agent_used": "legal_advisor",
    "section_mappings": [],
    "warnings": [],
    "entities": { "sections": ["103"], "laws": ["BNS"] },
    "source_citations": ["BNS Section 103 - Murder"],
    "rag_hits": { "statutes": 7, "cases": 2 },
    "response_time_ms": 2310
  },
  "disclaimer": "This information is for general awareness only..."
}

Rate limit: 30 requests/minute per client IP.
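
To make the limit's semantics concrete, here is a standard-library sliding-window limiter. It is a stand-in for illustration only: the project uses slowapi, and this sketch does not reproduce slowapi's implementation or its configured strategy. The class name and the injectable clock are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client key."""

    def __init__(self, limit=30, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock                  # injectable so tests can control time
        self._hits = defaultdict(deque)     # client key -> timestamps of accepted requests

    def allow(self, client_key: str) -> bool:
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
print(limiter.allow("203.0.113.7"))
```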

GET /api/health

Returns {"status": "healthy"} with uptime info.

GET /api/ready

Pings all Qdrant collections. Returns {"ready": true} only when Qdrant is reachable.

POST /api/auth/register

Register a new user. Returns JWT token.

POST /api/auth/login

Authenticate and receive JWT token.

GET /api/auth/me

Return current user info (requires Authorization: Bearer <token>).
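
A client can attach the token as a bearer header when calling the chat endpoint. The sketch below only builds the request with the standard library. Whether `/api/chat` itself requires the header is an assumption, `build_chat_request` is a hypothetical helper, and the token is passed in because the field name in the login response is not documented here.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, message: str,
                       language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST /api/chat request with a bearer token."""
    body = json.dumps({"message": message, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_chat_request("http://localhost:8002", "<token>", "What is the punishment for murder under BNS?")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` sends it and returns the JSON response described under POST /api/chat.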


Project Structure

Neethi/
├── CLAUDE.md                    ← Development instructions + agent protocol
├── README.md
├── requirements.txt
├── .env.example
│
├── src/                         ← Application source
│   ├── agents/
│   │   └── orchestrator.py      ← Intent router + agentic loop + GROQ function calling
│   ├── api/
│   │   ├── app.py               ← FastAPI factory, CORS, lifespan
│   │   ├── middleware/
│   │   │   ├── auth.py          ← PyJWT + bcrypt, inject_user_state()
│   │   │   └── rate_limit.py    ← slowapi singleton limiter
│   │   └── routes/
│   │       ├── auth.py          ← /api/auth/register, login, me
│   │       ├── chat.py          ← POST /api/chat  (main endpoint)
│   │       └── health.py        ← GET /api/health, /api/ready
│   ├── config/
│   │   └── settings.py          ← 30+ env vars (dotenv + pydantic-settings)
│   ├── intent/
│   │   └── classifier.py        ← Intent classification + vague query detection
│   ├── rag/
│   │   ├── embedder.py          ← all-MiniLM-L6-v2 singleton, batch support
│   │   └── chunking_strategy.py ← Semantic legal text chunking
│   ├── safety/
│   │   └── safety_guard.py      ← PII masking + prompt injection detection
│   ├── services/
│   │   └── thesys_service.py    ← Thesys UI streaming service
│   ├── tools/
│   │   ├── statute_mapper.py    ← CRITICAL: false-friend detection + mapping
│   │   ├── guardrails.py        ← Multi-domain routing + intent classification
│   │   ├── rag_tool.py          ← Quad-collection hybrid Qdrant search
│   │   ├── repeal_handler.py    ← Repealed/omitted provision handling
│   │   └── find_legal_resources.py ← Resource locator tool
│   ├── utils/
│   │   └── logger.py
│   └── verification/
│       └── chunk_verifier.py    ← Stage 2 grading (llama-3.1-8b)
│
├── data/
│   ├── statutes/                ← 1,059 sections with full legal text
│   │   ├── bns_complete.json
│   │   ├── bnss_complete.json
│   │   └── bsa_complete.json
│   ├── mappings/                ← 1,220 bidirectional mappings
│   │   ├── ipc_to_bns.json
│   │   ├── crpc_to_bnss.json    ← Contains section_438_trap (DO NOT REMOVE)
│   │   └── iea_to_bsa.json
│   └── raw/
│       ├── bprd_bns_handbook.pdf
│       ├── bprd_bnss_handbook.pdf
│       ├── bprd_bsa_handbook.pdf
│       ├── constitution_of_india.json
│       ├── indian_bail_judgments.json
│       └── indiacode_pdfs/      ← 19 civil act PDFs (source data)
│
├── docs/
│   ├── architecture/
│   │   ├── system_architecture.md
│   │   ├── agentic_workflow.md
│   │   └── database_schema.md
│   ├── agents/                  ← 7 development companion agent prompts
│   ├── guides/                  ← Developer how-to guides
│   └── reports/                 ← Audit and phase reports
│
├── tests/
│   ├── conftest.py
│   ├── test_statute_mapper.py   ← 34 tests (section number trap guards)
│   ├── test_guardrails.py       ← 24 tests
│   ├── test_guardrails_routing.py ← 42 tests (multi-domain routing)
│   ├── test_intent_classifier.py  ← 22 tests
│   ├── test_safety_guard.py     ← 21 tests
│   ├── test_auth.py             ← 13 JWT auth tests
│   ├── test_rate_limit.py       ← 4 rate limit tests
│   └── test_resource_tool.py
│
├── scripts/
│   ├── ingest_to_qdrant_production.py   ← Ingest BNS/BNSS/BSA → Qdrant
│   ├── ingest_legal_acts.py             ← Ingest 19 civil acts → Qdrant
│   ├── ingest_bail_judgments.py         ← Ingest bail orders → Qdrant
│   ├── ingest_constitution.py           ← Ingest Constitution → Qdrant
│   ├── setup_fulltext_indexes.py        ← Create BM25 + keyword indexes (run once)
│   ├── KAGGLE_PRODUCTION_FINAL.py       ← Indian Kanoon judgment ingestion
│   ├── enhance_bns_legal_text.py
│   ├── enhance_bnss_legal_text.py
│   ├── enhance_bsa_legal_text.py
│   ├── parse_bprd_handbooks.py
│   ├── evaluate_indiclegalqa.py
│   └── validate_data_quality.py
│
├── neethi-frontend/             ← React frontend (Phase 5)
│   ├── src/
│   │   ├── components/          ← ChatWindow, ChatInput, MessageBubble, etc.
│   │   ├── pages/               ← ChatPage, UsageDashboard
│   │   ├── hooks/
│   │   ├── services/
│   │   └── store/
│   ├── index.html
│   ├── vite.config.js
│   └── package.json
│
└── Documents/                   ← Academic papers and design documents

Setup

Prerequisites

Backend Installation

# 1. Create a virtual environment (run from the root of the cloned repository)
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/macOS

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure environment
copy .env.example .env       # Windows
# cp .env.example .env       # Linux/macOS
# Edit .env — required variables:
#   GROQ_API_KEY=gsk_...
#   QDRANT_URL=https://xxx.qdrant.io
#   QDRANT_API_KEY=...
#   JWT_SECRET_KEY=<any-long-random-string>
# Optional:
#   ANTHROPIC_API_KEY=sk-ant-...  (LLM fallback)
#   HYBRID_SEARCH=true            (BM25 + dense, default: true)
#   SUPABASE_URL=...              (persistent DB — Phase 4)
#   SUPABASE_KEY=...

# 4. Run tests
pytest tests/ -v
# Expected: 147 passed

# 5. Start the server
python -m uvicorn src.api.app:app --port 8002 --reload
# API docs: http://localhost:8002/docs
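
Before starting the server it can help to confirm that the required variables from step 3 are set. The helper below is a hypothetical convenience, not part of the repository; the project's own configuration is loaded in `src/config/settings.py`.

```python
import os

REQUIRED = ("GROQ_API_KEY", "QDRANT_URL", "QDRANT_API_KEY", "JWT_SECRET_KEY")


def missing_env(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    print("Missing: " + ", ".join(missing) if missing else "All required variables are set.")
```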

Ingest Data into Qdrant (first-time setup)

# Criminal law statutes (BNS/BNSS/BSA + Constitution)
python scripts/ingest_to_qdrant_production.py

# 19 civil and special acts
python scripts/ingest_legal_acts.py

# Annotated bail judgments
python scripts/ingest_bail_judgments.py

# Create BM25 full-text + keyword indexes (required for hybrid search)
python scripts/setup_fulltext_indexes.py

# Indian Kanoon Supreme Court judgments (requires Kaggle credentials)
# See docs/guides/kaggle_setup.md
python scripts/KAGGLE_PRODUCTION_FINAL.py

Frontend Setup

cd neethi-frontend
npm install
npm run dev       # dev server at http://localhost:5173
npm run build     # production build

Running Tests

# Full suite
pytest tests/ -v

# Specific suites
pytest tests/test_statute_mapper.py -v      # 34 tests — section number trap
pytest tests/test_guardrails.py -v          # 24 tests
pytest tests/test_guardrails_routing.py -v  # 42 tests — multi-domain routing
pytest tests/test_intent_classifier.py -v   # 22 tests
pytest tests/test_safety_guard.py -v        # 21 tests
pytest tests/test_auth.py -v                # 13 JWT auth tests
pytest tests/test_rate_limit.py -v          # 4 rate limit tests

# With coverage report
pytest tests/ --cov=src --cov-report=term-missing

Key Concepts

Temporal Accuracy

| Period | Applicable Law |
|---|---|
| Before July 1, 2024 | IPC + CrPC + Indian Evidence Act |
| On or after July 1, 2024 | BNS + BNSS + BSA |
| Pending cases | Mixed (depends on date of offence) |
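
The date cut-over can be expressed as a single comparison. The sketch treats 1 July 2024 itself as falling under the new codes, since that is their commencement date. It mirrors only the first two rows of the table and does not model pending cases; the `applicable_codes` helper is hypothetical.

```python
from datetime import date

TRANSITION = date(2024, 7, 1)  # commencement of BNS, BNSS and BSA


def applicable_codes(offence_date: date) -> tuple[str, str, str]:
    """Return (penal code, procedure code, evidence law) for a date of offence."""
    if offence_date < TRANSITION:
        return ("IPC", "CrPC", "IEA")
    return ("BNS", "BNSS", "BSA")


print(applicable_codes(date(2024, 6, 30)))
print(applicable_codes(date(2024, 7, 1)))
```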

Hybrid Search (RRF)

Dense vector similarity (all-MiniLM-L6-v2) and BM25 keyword matching are merged via Reciprocal Rank Fusion (RRF). An RRF score of 1.0 means a chunk ranked #1 in both the dense and the text search; such retrievals are highly reliable. If the text indexes are absent, search falls back gracefully to dense-only retrieval.
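
Reciprocal Rank Fusion sums a reciprocal-rank term from each ranking. The sketch below assumes the term is `1 / (rank + k)` with zero-based ranks and `k = 2`, chosen because it is the variant under which first place in both lists gives exactly 1.0. The constant used by the project's search backend is not stated here, and the commonly cited `k = 60` yields much smaller scores.

```python
def rrf_merge(dense_ids, text_ids, k=2):
    """Fuse two ranked lists of chunk ids; returns (id, score) pairs, best first."""
    scores = {}
    for ranking in (dense_ids, text_ids):
        for rank, doc_id in enumerate(ranking):      # rank is zero-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


merged = rrf_merge(["a", "b", "c"], ["a", "c", "d"])
print(merged[0])  # ('a', 1.0): ranked first by both searches
```

A chunk that appears in only one list still receives a score, which is why dense-only retrieval remains a workable fallback.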

Concept Keyword Pinning

all-MiniLM-L6-v2 can drift semantically on procedural queries (e.g., "maximum remand period" retrieves limitation-period provisions rather than remand provisions). The _CONCEPT_HINTS dictionary in orchestrator.py pins natural-language concepts to canonical sections; pinned sections bypass the threshold and are prepended to the retrieval results.
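
Concept pinning can be sketched as a phrase lookup that runs alongside retrieval. The dictionary name comes from the description above, but its contents, the `pin_concepts` helper, and the whole-phrase matching rule are assumptions; the section numbers are taken from the Key Mappings Reference.

```python
import re

# Hypothetical contents; the real dictionary lives in orchestrator.py.
_CONCEPT_HINTS = {
    "remand": ["BNSS 187"],
    "anticipatory bail": ["BNSS 482"],
}


def pin_concepts(query: str, retrieved: list[str]) -> list[str]:
    """Prepend canonical sections for any pinned concept mentioned in the query."""
    q = query.lower()
    pinned = []
    for phrase, sections in _CONCEPT_HINTS.items():
        # Whole-phrase match so short keys do not fire inside longer words.
        if re.search(rf"\b{re.escape(phrase)}\b", q):
            pinned.extend(s for s in sections if s not in pinned)
    # Pinned sections come first; retrieved results follow without duplicates.
    return pinned + [r for r in retrieved if r not in pinned]


print(pin_concepts("What is the maximum remand period?", ["Limitation Act 3", "BNSS 187"]))
```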


What's Done

| Component | Status | Detail |
|---|---|---|
| StatuteMapper | Production | 5 false-friend pairs, 1,220 mappings |
| Guardrails | Production | Multi-domain routing + intent + entity extraction |
| RepealHandler | Production | REPEALED / OMITTED / DELETED status handling |
| SafetyGuard | Production | PII masking, prompt injection detection |
| IntentClassifier | Production | 7 intents + ASK_CLARIFICATION, vague query detection |
| Embedder | Production | all-MiniLM-L6-v2, singleton, batch, CPU-only |
| Chunking | Production | Semantic legal boundaries, sub-section aware |
| RAG Tool | Production | Quad-collection hybrid search (RRF) |
| LegalAdvisorAgent | Production | Multi-domain prompts, concept pinning |
| PrecedentResearcher | Production | SC case law + bail judgment search |
| ChunkVerifier | Production | Stage 2 grading via llama-3.1-8b |
| GroundingCheck | Production | Answer grounding validation |
| GatekeeperNode | Production | Topic alignment guard |
| Orchestrator | Production | Intent routing, GROQ function calling, Anthropic fallback |
| FastAPI + routes | Production | /chat, /health, /ready, /auth/* |
| JWT Auth | Production | Register / login / me |
| Rate Limiting | Production | slowapi, 30 req/min on chat |
| Criminal law data | Production | 1,059 sections, full legal_text |
| Mapping JSONs | Production | 1,220 IPC/CrPC/IEA → BNS/BNSS/BSA |
| Civil acts data | Production | 19 acts, 4,552 vectors in Qdrant |
| SC Judgments | Production | ~452K judgment chunks in Qdrant |
| Bail Judgments | Production | 1,200 annotated orders, 3,600 vectors |
| React Frontend | Production | Chat UI, history sidebar, usage dashboard |

Roadmap

Phase 4 — In Progress

Task Priority
Supabase integration (persistent users/sessions) High
Document Drafter agent (FIR, RTI, bail applications) High
Resource Locator agent (courts, police stations by location) Medium
Session persistence (multi-turn memory) Medium
Multilingual support (Hindi) Medium

Phase 6 — Scale & Monitor

Task Description
Cloudflare CDN + WAF Production hosting
Query analytics dashboard Section-level usage tracking
Gazette tracking Detect future BNS/BNSS/BSA amendments
CI/CD pipeline Automated test + deploy

Key Mappings Reference

Crimes — IPC → BNS

IPC BNS Offence
302 103 Murder
304 105 Culpable Homicide
376 63–69 Sexual Offences
420 318(4) Cheating
379 303 Theft
384 308 Extortion
392 309 Robbery
498A 85 Cruelty by Husband

Procedures — CrPC → BNSS

CrPC BNSS Procedure
41 35 Arrest without warrant
154 173 FIR registration
161 180 Examination of witnesses
164 183 Recording confessions
167 187 Remand
438 482 Anticipatory bail
439 483 Bail in non-bailable offences

Documentation Index

Document Path
System Architecture docs/architecture/system_architecture.md
Agentic Workflow docs/architecture/agentic_workflow.md
Database Schema docs/architecture/database_schema.md
RAG Ingestion Guide docs/guides/rag_ingestion_guide.md
Indian Legal Knowledge docs/guides/indian-legal-knowledge.md
Qdrant Troubleshooting docs/guides/qdrant_troubleshooting.md
Kaggle Setup docs/guides/kaggle_setup.md
Dev Companion Agents (7) docs/agents/*.md
Multi-Domain Audit docs/reports/multi_domain_expansion_audit_2026-02-15.md

Team

Name Roll No.
Rakshit Sudheer Nair ASI22CS149
Sachin Eldho ASI22CS159
Sreeraj Rajeev ASI22CS181

Guide: Dr. Ramani Bai V Institution: Adi Shankara Institute of Engineering and Technology, Kalady


License

Academic Project — All Rights Reserved


In legal AI, a wrong answer is worse than no answer.
