RAG PDF Assistant is a Streamlit app that lets you upload PDFs and ask natural-language questions about their contents using Retrieval-Augmented Generation (RAG). It suits research papers, reports, contracts, and other documents where you need to find specific information quickly.
- Upload any PDF and extract text automatically.
- Ask questions in natural language about your document.
- Get answers grounded in the document's own text through RAG (text chunking + vector search + LLM generation).
- Interactive Streamlit interface for uploading documents and asking questions.
- Frontend/UI: Streamlit
- Backend: Python
- PDF Extraction: PyPDF2 or pdfplumber
- Embeddings & RAG: LangChain, SentenceTransformers
- Vector Database: FAISS / Chroma / Pinecone
- LLM: OpenAI API (or any compatible LLM)
- PDF Upload: Users upload a PDF file via the Streamlit interface.
- Text Extraction & Chunking: The PDF content is extracted and split into smaller chunks.
- Vectorization: Each chunk is converted into embeddings for semantic search.
- Retrieval: When a user asks a question, the question is embedded the same way and the most relevant chunks are retrieved from the vector store.
- Answer Generation: The LLM generates a response using the retrieved chunks as context.
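
The chunking, vectorization, and retrieval steps above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) are hypothetical, a bag-of-words counter stands in for SentenceTransformers embeddings, and a sorted list stands in for a vector store such as FAISS or Chroma. The LLM call itself is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping windows of `chunk_size` words."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase word counts.

    A real setup would call a sentence-embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=3):
    """Return the k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Stand-in for text extracted from a PDF.
    document = (
        "The warranty lasts two years from purchase. "
        "Shipping takes five business days. "
        "Returns are accepted within thirty days."
    )
    question = "How long does the warranty last?"
    chunks = chunk_text(document, chunk_size=8, overlap=2)
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

The overlap between neighbouring chunks is a common design choice: it lowers the chance that a sentence relevant to the question is cut in half at a chunk boundary. Swapping `embed` for a real embedding model and `retrieve` for a vector-store query would keep the same overall shape.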
- Clone this repository:

      git clone https://github.com/your-username/rag-pdf-assistant.git
      cd rag-pdf-assistant