A RAG (Retrieval-Augmented Generation) system that lets you upload documents, ingest content from URLs, and ask questions about your knowledge base, with answers generated by an LLM from the retrieved content.
- PDF Files: Extract and process PDF documents
- Text Files: Plain text document processing
- Markdown Files: Structured markdown with proper parsing
- URL Ingestion: Fetch and process content from web URLs
- Smart Search: Vector-based semantic search across your documents
- AI-Powered Q&A: Get intelligent answers based on your content
- Conversational Memory: Maintains context across multiple questions
- File Upload: Drag-and-drop interface for document ingestion
- URL Ingestion: Process web content with progress indicators
- Delete Operations: Remove files, URLs, and their embeddings
- Bulk Clear: Reset entire knowledge base with one click
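The conversational memory feature above can be pictured as a bounded buffer of recent turns that is folded into each new prompt. The following is only a minimal sketch under that assumption: the class name `ConversationMemory`, the `max_turns` parameter, and the prompt layout are hypothetical and are not taken from this project's code.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical helper: keep the most recent question/answer turns."""

    def __init__(self, max_turns: int = 3) -> None:
        # A deque with maxlen drops the oldest turn automatically when full.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def build_prompt(self, context: str, question: str) -> str:
        """Combine retrieved context, prior turns, and the new question."""
        history = "\n".join(
            f"User: {q}\nAssistant: {a}" for q, a in self._turns
        )
        return (
            f"Context:\n{context}\n\n"
            f"Conversation so far:\n{history}\n\n"
            f"User: {question}\nAssistant:"
        )
```

Capping the number of stored turns keeps the prompt size predictable; the actual application may well rely on a LangChain memory component instead.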
- Python 3.10+
- pip package manager
- Clone the repository
    git clone https://github.com/GovindKurapati/dev_docs_chat.git
    cd dev_docs_chat
- Create virtual environment
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
    pip install -r requirements.txt
- Set up environment variables
  Create a .env file in the project root:
    GROQ_API_KEY=your_groq_api_key_here
    GROQ_API_BASE=https://api.groq.com/openai/v1
- Get API Key
  - Sign up at Groq
  - Generate an API key
  - Add it to your .env file
python app.py
The application will be available at http://127.0.0.1:7860
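Before launching, it can help to confirm that the two variables from the .env file are actually readable. The sketch below uses only the standard library and a deliberately simple KEY=VALUE parser; the function names `load_env_file` and `get_groq_settings` are hypothetical, and the application itself may load its settings differently (for example with a dedicated dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_settings(path: str = ".env") -> tuple:
    """Return (api_key, api_base); real environment variables win over the file."""
    file_values = load_env_file(path)
    api_key = os.environ.get("GROQ_API_KEY") or file_values.get("GROQ_API_KEY")
    api_base = os.environ.get("GROQ_API_BASE") or file_values.get("GROQ_API_BASE")
    if not api_key or api_key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is missing or still set to the placeholder")
    return api_key, api_base
```

Calling `get_groq_settings()` from the project root raises early if the placeholder key was never replaced, which is usually easier to diagnose than a failed API request later.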
Upload and process PDF, TXT, and Markdown files with drag-and-drop interface.
Fetch and process content from web URLs with progress indicators.
Manage your uploaded files and ingested URLs with easy deletion options.
Ask questions about your documents and get AI-powered answers with markdown formatting.
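For the URL workflow described above, the fetched page has to be reduced to readable text before it can be embedded. The following sketch shows one way to do that with the standard library's `html.parser`; the names `VisibleTextExtractor` and `html_to_text` are hypothetical, and the project may use a different loader or parser for this step.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._parts)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The HTML string itself could come from `urllib.request.urlopen(url).read().decode()`; that network call is left out here so the example stays runnable offline.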
dev_docs_chat/
├── app.py # Main Gradio application
├── qa_pipeline.py # Question-answering logic
├── ingestion.py # Document ingestion logic
├── requirements.txt # Python dependencies
├── .env # Environment variables (create this)
├── chroma_db/ # Vector database storage
├── uploads/ # Uploaded file storage
├── ingested_urls.txt # List of ingested URLs
└── README.md # This file
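Ingestion in a RAG pipeline typically splits each document into overlapping chunks before embedding them. The sketch below illustrates that idea with plain character-based splitting; the function `chunk_text` and its default sizes are assumptions for illustration and do not describe what ingestion.py actually does.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Real splitters usually prefer paragraph or sentence boundaries over fixed character offsets; the fixed-size version is shown only because it is the easiest to reason about.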
- Vector Database: ChromaDB for efficient similarity search
- Embeddings: HuggingFace sentence-transformers
- LLM: Groq-hosted LLM for low-latency responses
- Framework: Gradio for web interface
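The similarity search that the vector database performs can be illustrated with a small pure-Python sketch: each chunk is stored with an embedding vector, and a query returns the chunks whose vectors have the highest cosine similarity to the query vector. The functions `cosine_similarity` and `top_k` and the toy two-dimensional vectors are hypothetical; in the application, ChromaDB and a sentence-transformers model handle this at scale.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as matching nothing.
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k: int = 2) -> list:
    """Return the texts of the k chunks most similar to query_vec.

    indexed_chunks is a list of (text, vector) pairs.
    """
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

The retrieved texts would then be passed to the LLM as context for the answer. A linear scan like this is fine for a handful of chunks, while a vector database adds indexing and persistence for larger collections.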
- Upload project documentation and README files
- Ask questions about implementation details
- Get instant answers about your codebase
- Ingest research papers and technical articles
- Ask questions about new technologies
- Stay updated with industry trends
- Upload tutorials and educational content
- Ask questions about complex topics
- Get personalized explanations
- Streaming Responses: Real-time answer generation
- File Type Support: Excel, Word, PowerPoint documents
- Advanced Search: Filters and date-based search
- Export Features: Save conversations and answers
- User Authentication: Multi-user support
- API Endpoints: REST API for integration
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain: For the RAG framework
- ChromaDB: For vector storage
- Gradio: For the web interface
- Groq: For fast LLM inference
- HuggingFace: For embedding models
Made with ❤️ by Govind Kurapati



