RAG System with Anthropic API (Claude), FastAPI Web Interface, Paragraph-based Chunking, and PDF Upload
- ✅ Paragraph-based chunking
- ✅ PDF upload support
- ✅ Prompt caching (up to 90% cost savings on cached input tokens)
- ✅ Smart relevance filtering
- ✅ Simple Web UI
- ✅ Qdrant as vector database
- ✅ Type-safe FastAPI routes
- ✅ Marked.js for Markdown rendering
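As a rough illustration of the paragraph-based chunking listed above, here is a minimal sketch. It assumes paragraphs are separated by blank lines and packs neighbouring paragraphs together up to a size limit. The function name `chunk_by_paragraph` and the `max_chars` parameter are hypothetical and not taken from the app code, which may chunk differently.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int = 1000) -> list[str]:
    """Split text on blank lines and pack paragraphs into chunks of up to max_chars."""
    # A paragraph is a run of text separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            # The paragraph still fits, so keep growing the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A paragraph longer than max_chars is kept whole, not cut mid-sentence.
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
chunks = chunk_by_paragraph(sample, max_chars=40)
```

With `max_chars=40` the first two paragraphs share one chunk and the third gets its own. Keeping paragraphs intact means each embedded chunk stays a coherent unit of text, at the cost of uneven chunk sizes.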
python -m venv rag-app
source rag-app/bin/activate
pip install fastapi uvicorn python-multipart anthropic sentence-transformers qdrant-client pypdf
pip install fastapi uvicorn python-multipart anthropic transformers torch qdrant-client pypdf
pip install fastapi uvicorn python-multipart anthropic transformers torch qdrant-client pypdf tqdm
export ANTHROPIC_API_KEY='your-api-key'
uvicorn app_qdrant_fastapi:app --reload
uvicorn app_qdrant_fastapi_tf:app --reload
uvicorn app_qdrant_fastapi_tf_prog:app --reload
Use whichever of these three commands matches the version you want to start.
Note
When you enable the progress bar, you will notice a warning like this on shutdown:
/home/stahlhe2/.local/share/pypoetry/python/cpython@3.12.9/lib/python3.12/multiprocessing/resource_tracker.py:255: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
The semaphore leak warning is a known issue with the sentence-transformers and transformers libraries when they use multiprocessing. You can safely ignore it: the OS still frees the resources when the process exits.
Note
Instead of a local on-disk Qdrant database, you can use an in-memory Qdrant instance for development. Change
self.qdrant_client = QdrantClient(path="./qdrant_db")
to
self.qdrant_client = QdrantClient(":memory:")
Alternatively, run Qdrant in a Docker container for persistence (see https://qdrant.tech/documentation/quick_start/) and connect to it by URL:
docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
self.qdrant_client = QdrantClient(url="http://localhost:6333")
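If you switch between these three setups often, one option is to pick the client arguments from an environment variable instead of editing the source each time. The sketch below is an assumption and not part of the app: the `QDRANT_MODE` variable and the `qdrant_client_kwargs` helper are hypothetical names. The paths and URL are the ones shown above.

```python
import os


def qdrant_client_kwargs(mode: str) -> dict[str, str]:
    """Map a storage mode to keyword arguments for QdrantClient."""
    if mode == "memory":
        # In-memory instance: nothing is persisted, handy for development.
        return {"location": ":memory:"}
    if mode == "local":
        # Embedded on-disk database in the working directory.
        return {"path": "./qdrant_db"}
    if mode == "server":
        # Qdrant running in Docker, as started by the docker command above.
        return {"url": "http://localhost:6333"}
    raise ValueError(f"unknown Qdrant mode: {mode!r}")


# QDRANT_MODE is a hypothetical variable name; "local" mirrors the default setup.
kwargs = qdrant_client_kwargs(os.environ.get("QDRANT_MODE", "local"))

# Inside the RAG class, this would replace the hard-coded constructor call:
#     from qdrant_client import QdrantClient
#     self.qdrant_client = QdrantClient(**kwargs)
```

Starting the server with `QDRANT_MODE=memory uvicorn app_qdrant_fastapi:app --reload` would then select the in-memory instance without touching the code.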
Open http://localhost:8000 in your browser
You will find the Swagger/OpenAPI docs under
You can optimize the startup time by skipping the example documents. Just comment out this line:
rag.add_documents(example_docs)