DocuMindRAG is a multi-tenant (org-scoped) PDF RAG backend with a built-in HITL (Human-in-the-Loop) correction layer.
It lets users upload PDFs, ask questions against their organization’s document corpus, and (for privileged roles) submit “corrections” as patches that override incorrect chunks so the system self-heals over time.
This repository contains a FastAPI backend + Postgres (pgvector) persistence + LangChain/LangGraph RAG pipeline.
- Upload PDFs to an organization workspace.
- PDFs are split into text chunks and stored in Postgres alongside vector embeddings (pgvector).
- Users ask questions via a chat endpoint.
- The RAG pipeline:
  - contextualizes the user question using chat history
  - retrieves relevant context from the org’s vector store
  - generates an answer with an LLM
- Responses include source objects (chunks and/or patches) with IDs for traceability.
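The split-and-embed step can be illustrated with a minimal character-window splitter. The repo actually uses LangChain's loader/splitter, so the function name and size defaults below are illustrative assumptions, not the project's real code:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive character-window splitter illustrating the chunking step.

    chunk_size/overlap values are assumed defaults; each resulting chunk
    would then be embedded and stored as a document_chunks row.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap  # overlap preserves context across boundaries
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

In practice a structure-aware splitter (like LangChain's recursive splitter) is preferable, since it tries to break on paragraph and sentence boundaries rather than mid-word.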
Privileged users (Admin/Senior) can create patches which act as corrected versions of content:
- Chunk-specific patch: correct a specific PDF chunk
- Patch-of-a-patch: create a new patch that supersedes a previous patch (old patch is deactivated)
- Org-global patch: add general corrections not attached to a single chunk (still retrieved by similarity)
During retrieval:
- Active patches are retrieved by similarity for the org.
- Document chunks that already have an active patch are excluded from chunk retrieval.
- This is the core “self-healing” mechanism: the system prefers corrected content and avoids returning the known-bad chunk.
There is no real auth (JWT/OAuth/password login) implemented yet. Instead, the API identifies the “current user” via:
X-Test-Email: <email>
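A minimal sketch of how this header can be resolved to a role (hypothetical names and an in-memory stand-in for the DB; the real logic is a FastAPI dependency in `app/api/deps.py` backed by the `users` table):

```python
# Stand-in for the seeded users table; illustrative only.
SEEDED_USERS = {
    "admin@documind.com": "admin",
    "senior@documind.com": "senior",
    "junior@documind.com": "viewer",
}

def resolve_user(headers: dict) -> str:
    """Map the X-Test-Email header to a role, as the deps layer does."""
    email = headers.get("X-Test-Email")
    if email not in SEEDED_USERS:
        raise PermissionError("Unknown test user")
    return SEEDED_USERS[email]
```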
On startup, the app seeds a default organization and 3 test users (if they don’t already exist):
- `admin@documind.com` (role `admin`)
- `senior@documind.com` (role `senior`)
- `junior@documind.com` (role `viewer`)
The data model supports organizations (organizations table), but this repo does not currently expose endpoints for users to create orgs or invite members. The org is created in startup seeding and users are assigned via the DB.
Roles are defined as:
- `admin`
- `senior`
- `viewer` (used for “junior” in the seeded test accounts)
Enforced permissions:
- Any role (viewer/senior/admin):
  - upload PDFs
  - list org documents
  - chat/query (RAG)
  - view own chat sessions and messages
- Admin only:
  - delete documents
- Admin or Senior:
  - create patches (HITL corrections)
  - list patches
  - activate/deactivate patches (rollback)
All document/chunk/patch queries are org-scoped by org_id.
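The role checks above can be pictured as a simple permission table (hypothetical helper; the actual guards are FastAPI dependencies in `app/api/deps.py`):

```python
# Illustrative RBAC table; action names are assumptions, not the repo's.
PERMISSIONS = {
    "upload_pdf": {"viewer", "senior", "admin"},
    "list_documents": {"viewer", "senior", "admin"},
    "chat": {"viewer", "senior", "admin"},
    "delete_document": {"admin"},
    "create_patch": {"senior", "admin"},
    "toggle_patch": {"senior", "admin"},
}

def require(role: str, action: str) -> None:
    """Raise if the role is not allowed to perform the action."""
    if role not in PERMISSIONS.get(action, set()):
        raise PermissionError(f"role {role!r} may not {action}")
```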
- FastAPI: HTTP API layer (`app/main.py`, `app/api/routes/*`)
- SQLAlchemy: ORM and sessions (`app/db/*`)
- Postgres + pgvector: storage + ANN-style similarity ordering
- LangChain + LangGraph:
  - question contextualization
  - retrieval
  - answer generation
- Upload: `POST /api/v1/upload`
  - PDF is loaded (PyPDFLoader) → split into chunks → each chunk embedded → stored as `document_chunks`
- Ask a question (chat): `POST /api/v1/chat`
  - LangGraph flow:
    - contextualize the question (uses the last 10 messages from the chat session)
    - retrieve top-k patches + top-k chunks (org-scoped)
    - generate an answer with the LLM (`gpt-4o` as currently coded)
- Correct the knowledge (HITL): `POST /api/v1/patches` (admin/senior)
  - Store corrected text as a `chunk_patches` row with an embedding
  - Retrieval will now:
    - return patch content (preferred)
    - exclude the patched chunk from chunk retrieval results
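The chat flow can be sketched as three plain functions (the repo implements it as a LangGraph state machine in `app/rag/chain.py`; all names here are illustrative):

```python
def contextualize(question: str, history: list[str]) -> str:
    """Stage 1: rewrite the question into a standalone form.

    In the real pipeline an LLM rewrites it using up to the last 10 chat
    messages; this placeholder only tags the question for illustration.
    """
    if not history:
        return question
    return f"(in context of {len(history)} prior messages) {question}"

def answer(question: str, history: list[str], retrieve, generate) -> str:
    """Stages 2-3: retrieve org-scoped context, then generate an answer."""
    standalone = contextualize(question, history)
    context = retrieve(standalone)        # top-k patches + top-k chunks
    return generate(standalone, context)  # LLM call (gpt-4o as coded)
```

Passing `retrieve` and `generate` as callables mirrors how the graph nodes are composable and testable in isolation.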
At retrieval time:
- Patch retrieval:
  - selects `chunk_patches` where `org_id == <org>` and `is_active == true`
  - orders by cosine distance to the query embedding
- Chunk retrieval:
  - selects `document_chunks` where `org_id == <org>`
  - excludes chunks that have any active patch (`exists` subquery)
  - orders by cosine distance to the query embedding
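The chunk-exclusion rule can be illustrated in pure Python (in the repo this is an `exists` subquery with pgvector cosine ordering; the tuple shapes here are only a sketch of the logic):

```python
def retrieve_chunks(chunks, patches, k=4):
    """Return the k nearest chunks that do NOT have an active patch.

    chunks:  list of (chunk_id, distance_to_query)
    patches: list of (patch_id, chunk_id, is_active)
    """
    patched = {chunk_id for _, chunk_id, active in patches if active}
    eligible = [c for c in chunks if c[0] not in patched]
    # lower cosine distance = more similar, so sort ascending
    return sorted(eligible, key=lambda c: c[1])[:k]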
The final retrieved context is a concatenation of:
- top-k patches (if any)
- top-k unpatched chunks
- `app/main.py`: FastAPI app, router registration, DB table creation, and seed users/org
- `app/api/deps.py`: header-based auth + RBAC guards
- `app/api/schemas.py`: Pydantic request/response models (chat + patches)
- `app/api/routes/docs.py`: PDF upload endpoint
- `app/api/routes/documents.py`: list/delete documents
- `app/api/routes/chat.py`: chat sessions + messages + RAG chat endpoint
- `app/api/routes/admin.py`: patch/HITL endpoints
- `app/core/config.py`: settings + env var loading
- `app/core/security.py`: currently an empty placeholder
- `app/db/models.py`: SQLAlchemy models (orgs/users/docs/chunks/patches/chats/messages)
- `app/db/session.py`: engine + session factory
- `app/rag/ingestion.py`: PDF loading + chunking + embedding + DB persistence
- `app/rag/retrieval.py`: patch + chunk retrieval (pgvector cosine ordering)
- `app/rag/chain.py`: LangGraph state machine used by the chat endpoint
Main tables:
- `organizations`
- `users` (belongs to an org)
- `documents` (belongs to an org)
- `document_chunks` (belongs to an org; stores `Vector(1536)` embeddings)
- `chunk_patches` (belongs to an org; stores `Vector(1536)` embeddings; supports rollback via `is_active`)
- `chat_sessions` (belongs to a user)
- `messages` (belongs to a chat session)
Notes:
- Embedding columns are `Vector(1536)`. Ensure your embedding model outputs 1536-dimensional vectors.
- Similarity ordering uses pgvector cosine distance.
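For reference, pgvector's cosine distance (the `<=>` operator) is 1 minus cosine similarity, so lower means more similar; a pure-Python equivalent:

```python
import math

def cosine_distance(a, b):
    """Equivalent of pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)
```

Identical directions give distance 0; orthogonal vectors give 1, which is why retrieval orders ascending by this value.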
Settings are defined in `app/core/config.py` and can be provided via environment variables or a `.env` file.
- `OPENAI_API_KEY`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `POSTGRES_SERVER`
- `POSTGRES_PORT`
- `POSTGRES_DB`
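These Postgres variables are typically assembled into a SQLAlchemy connection URL; a hypothetical sketch (the exact attribute names and driver string in `app/core/config.py` may differ):

```python
import os

def database_url() -> str:
    """Build a Postgres connection URL from the environment variables above."""
    return (
        f"postgresql://{os.environ['POSTGRES_USER']}:"
        f"{os.environ['POSTGRES_PASSWORD']}@"
        f"{os.environ['POSTGRES_SERVER']}:"
        f"{os.environ['POSTGRES_PORT']}/"
        f"{os.environ['POSTGRES_DB']}"
    )
```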
Example `.env`:

```env
OPENAI_API_KEY=your_openai_key_here
POSTGRES_USER=user
POSTGRES_PASSWORD=password
POSTGRES_SERVER=localhost
POSTGRES_PORT=5432
POSTGRES_DB=documind
```

- Python 3.11+
- Postgres with pgvector enabled
- An OpenAI API key
```bash
python -m venv venv
# Windows PowerShell:
.\venv\Scripts\Activate.ps1

pip install -r requirements.txt
```

You need a Postgres instance with the pgvector extension available.
One simple approach is a pgvector-enabled Postgres container:
```bash
docker run --name documind-postgres -d \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=documind \
  -p 5432:5432 \
  ankane/pgvector
```

Windows PowerShell variant:
```powershell
docker run --name documind-postgres -d `
  -e POSTGRES_USER=user `
  -e POSTGRES_PASSWORD=password `
  -e POSTGRES_DB=documind `
  -p 5432:5432 `
  ankane/pgvector
```

Then, enable the extension (run once):

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Run the API:

```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Open:
- API root: `http://localhost:8000/`
- Health: `http://localhost:8000/health`
This repo includes a Dockerfile for the API, but does not include a docker-compose.yml. You must provide Postgres separately.
```bash
docker build -t documind-rag .

docker run --rm -p 8000:8000 \
  -e OPENAI_API_KEY=your_openai_key_here \
  -e POSTGRES_USER=user \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_SERVER=host.docker.internal \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=documind \
  documind-rag
```

Windows PowerShell variant:
```powershell
docker run --rm -p 8000:8000 `
  -e OPENAI_API_KEY=your_openai_key_here `
  -e POSTGRES_USER=user `
  -e POSTGRES_PASSWORD=password `
  -e POSTGRES_SERVER=host.docker.internal `
  -e POSTGRES_PORT=5432 `
  -e POSTGRES_DB=documind `
  documind-rag
```

If Postgres is running in a container, set `POSTGRES_SERVER` to that container’s network name on a shared Docker network.
All routes are mounted under /api/v1.
Add the header:

`X-Test-Email: admin@documind.com` (or the seeded senior/viewer emails)
Windows note:
- In PowerShell, `curl` may be an alias for `Invoke-WebRequest`. Use `curl.exe` explicitly, or use `Invoke-RestMethod`/`Invoke-WebRequest`.
`POST /api/v1/upload`

- Form field: `file` (must end with `.pdf`)
Example:
```bash
curl -X POST "http://localhost:8000/api/v1/upload" \
  -H "X-Test-Email: admin@documind.com" \
  -F "file=@./some.pdf"
```

`GET /api/v1/documents`

```bash
curl "http://localhost:8000/api/v1/documents" \
  -H "X-Test-Email: admin@documind.com"
```

`DELETE /api/v1/documents/{document_id}`
`POST /api/v1/chat`

Body:

- `message`: string
- `chat_id`: optional int (continue an existing chat)
```bash
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -H "X-Test-Email: junior@documind.com" \
  -d "{\"message\":\"What does this PDF say about pricing?\"}"
```

Response includes:

- `response`: the answer
- `sources`: list of context items (patches and/or chunks) with IDs
- `source_summary`: a short UX-oriented string like “From Patch #17”
`GET /api/v1/chats`

`GET /api/v1/chats/{chat_id}/messages`

`POST /api/v1/patches`
Rules (as implemented):

- Provide either `original_chunk_id` or `patch_id` (not both).
- If `patch_id` is provided, the referenced patch is deactivated and the new patch applies to the same original chunk.
- If neither is provided, an org-global patch is created.
```bash
curl -X POST "http://localhost:8000/api/v1/patches" \
  -H "Content-Type: application/json" \
  -H "X-Test-Email: senior@documind.com" \
  -d "{\"content\":\"Corrected value is 12.5%, not 15%.\", \"original_chunk_id\": 42}"
```

`GET /api/v1/patches?chunk_id=<optional>&active_only=<true|false>`
`PATCH /api/v1/patches/{patch_id}/deactivate`

`PATCH /api/v1/patches/{patch_id}/activate`
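The patch-of-a-patch rule can be sketched with an in-memory model (hypothetical; the repo operates on `chunk_patches` rows via SQLAlchemy):

```python
def supersede(patches: dict, old_patch_id: int, new_patch_id: int, content: str) -> None:
    """Deactivate an old patch and create a replacement for the same chunk.

    patches maps patch_id -> {"content", "original_chunk_id", "is_active"};
    the new patch inherits the old patch's target chunk.
    """
    old = patches[old_patch_id]
    old["is_active"] = False  # rollback remains possible by re-activating
    patches[new_patch_id] = {
        "content": content,
        "original_chunk_id": old["original_chunk_id"],
        "is_active": True,
    }
```

Because deactivation only flips `is_active`, the full correction history stays queryable, which is what makes the activate/deactivate endpoints a cheap rollback mechanism.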
- PDFs are ingested into `document_chunks` with embeddings.
- Humans submit corrections as `chunk_patches` with embeddings.
- Retrieval combines:
  - similarity search over active patches (preferred, corrected truth)
  - similarity search over chunks, excluding “patched” chunks (avoids known-bad content)
- The chat endpoint returns sources including patch IDs and chunk IDs, enabling UI patterns like:
  - “show me the evidence”
  - “correct this chunk” → create a patch for that chunk
  - “rollback correction” → deactivate the patch
This repo is an MVP backend. For production:
- Replace `X-Test-Email` header auth with real authentication (JWT/OAuth2) and persistent user management.
- Add org creation/invite flows and enforce org membership on signup.
- Add migrations (Alembic is listed but not currently used).
- Add background ingestion (queue) for large PDFs.
- Add rate limiting, request logging, and secrets management.
- Add tests around RBAC and retrieval “patch overrides chunk” invariants.