An AI-powered study companion that uses Retrieval-Augmented Generation (RAG) to help students learn from their documents. Upload your study materials and get intelligent answers with source citations and personalized study tips.
- Document Processing: Upload and process PDF, TXT, and Markdown files
- AI-Powered Q&A: Ask questions about your study materials and get contextual answers
- Source Citations: Get references to specific documents and content snippets
- Study Tips: Receive study recommendations based on your questions
- FastAPI Backend: RESTful API built with FastAPI
- Web Frontend: Simple HTML frontend
- Vector Search: Document retrieval using ChromaDB
```
studybuddy-rag-assistant/
├── src/studybuddy/     # Main package
│   ├── main.py         # FastAPI application
│   ├── config.py       # Configuration settings
│   ├── models/         # Pydantic models
│   ├── core/           # RAG engine logic
│   ├── api/            # API routes and dependencies
│   └── utils/          # Utility functions
├── documents/          # Upload your study materials here
├── vector_db/          # ChromaDB vector storage
├── frontend.html       # Web interface
└── pyproject.toml      # Python package configuration
```
- Python 3.9+
- Poetry (for dependency management)
- OpenAI API key
1. Clone the repository

   ```bash
   git clone <your-repo-url>
   cd studybuddy-rag-assistant
   ```

2. Install dependencies

   ```bash
   poetry install
   ```

3. Set up environment variables

   ```bash
   # Create the .env file
   echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
   ```

4. Add your study materials

   ```bash
   # Place your PDF, TXT, or MD files in the documents/ folder
   cp your_study_materials.pdf documents/
   ```

5. Run the application

   ```bash
   # Start the development server
   poetry run dev

   # Or run directly
   poetry run uvicorn studybuddy.main:app --reload --host 0.0.0.0 --port 8000
   ```

6. Access the application

   - API Documentation: http://localhost:8000/docs
   - Web Interface: Open `frontend.html` in your browser
   - Health Check: http://localhost:8000/api/v1/health
- Open `frontend.html` in your web browser
- Type your question in the chat interface
- Get AI-powered answers with source citations and study tips
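The chat endpoint can also be called directly from Python. The sketch below uses only the standard library and assumes the server is running locally on port 8000; the endpoint and JSON fields match the curl example in this README, but the `ask` helper itself is illustrative, not part of the codebase.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1/chat"

def ask(question: str, include_sources: bool = True, max_sources: int = 3) -> dict:
    """POST a question to the StudyBuddy chat endpoint and return the parsed JSON."""
    payload = json.dumps({
        "question": question,
        "include_sources": include_sources,
        "max_sources": max_sources,
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (with the server running):
# ask("What are the main concepts in machine learning?")
```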
Ask a question:

```bash
curl -X POST "http://localhost:8000/api/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the main concepts in machine learning?",
    "include_sources": true,
    "max_sources": 3
  }'
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/api/v1/upload" \
  -F "file=@your_document.pdf"
```

Check service health:

```bash
curl "http://localhost:8000/api/v1/health"
```

Configure the application by setting environment variables or modifying `src/studybuddy/config.py`:
| Variable | Default | Description |
|---|---|---|
| `STUDYBUDDY_OPENAI_API_KEY` | - | Your OpenAI API key (required) |
| `STUDYBUDDY_OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model to use |
| `STUDYBUDDY_CHUNK_SIZE` | `1000` | Document chunk size for processing |
| `STUDYBUDDY_MAX_SOURCES` | `3` | Maximum source documents to return |
| `STUDYBUDDY_DEBUG` | `false` | Enable debug mode |
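As a rough illustration of how these variables are consumed, the sketch below reads them with `os.getenv` and applies the defaults from the table. The real `config.py` uses Pydantic settings, so treat this as a behavioral stand-in, not the actual implementation.

```python
import os

def load_settings() -> dict:
    """Read StudyBuddy settings from the environment, falling back to the
    documented defaults. Illustrative only: the real config.py uses
    Pydantic settings with the same STUDYBUDDY_ prefix."""
    return {
        "openai_api_key": os.getenv("STUDYBUDDY_OPENAI_API_KEY"),  # required, no default
        "openai_model": os.getenv("STUDYBUDDY_OPENAI_MODEL", "gpt-4o-mini"),
        "chunk_size": int(os.getenv("STUDYBUDDY_CHUNK_SIZE", "1000")),
        "max_sources": int(os.getenv("STUDYBUDDY_MAX_SOURCES", "3")),
        "debug": os.getenv("STUDYBUDDY_DEBUG", "false").lower() == "true",
    }

# Example: override the model for one run
os.environ["STUDYBUDDY_OPENAI_MODEL"] = "gpt-4o"
settings = load_settings()
```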
- PDF (`.pdf`) - Research papers, textbooks, lecture notes
- Text (`.txt`) - Plain text documents
- Markdown (`.md`) - Formatted notes and documentation
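A document processor typically dispatches on file extension. This is a minimal sketch of that pattern for the three supported formats; the function name and structure are assumptions, not the actual `document_processor.py`, and real PDF parsing would need a dedicated library.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md"}

def load_document(path: str) -> str:
    """Return the raw text of a supported document, dispatching on extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {suffix}")
    if suffix == ".pdf":
        # A real implementation would use a PDF parser (e.g. pypdf);
        # out of scope for this sketch.
        raise NotImplementedError("PDF parsing requires a dedicated library")
    # .txt and .md are read as plain text
    return Path(path).read_text(encoding="utf-8")
```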
- Document Processing: Your documents are split into chunks and converted into vector embeddings
- Vector Storage: Embeddings are stored in ChromaDB for efficient similarity search
- Question Processing: When you ask a question, the system finds the most relevant document chunks
- Answer Generation: OpenAI's GPT model generates contextual answers based on retrieved content
- Study Tips: Additional AI-generated study recommendations are provided
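The retrieval steps above can be sketched in miniature. This toy version uses a bag-of-words count as a stand-in for OpenAI embeddings and a sorted cosine-similarity scan as a stand-in for ChromaDB; every function here is illustrative, not the project's `StudyBuddyRAG` code.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 100) -> list[str]:
    """Step 1: split a document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str) -> Counter:
    """Toy embedding: word counts (the real system uses OpenAI embeddings)."""
    return Counter(w.strip(".,?!") for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity between two toy embeddings."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], max_sources: int = 3) -> list[str]:
    """Step 3: return the chunks most similar to the question (ChromaDB's role)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:max_sources]

# Step 4 would pass the retrieved chunks to the LLM prompt for answer generation.
doc = "Gradient descent minimizes loss. Transformers use attention. SQL joins tables."
chunks = chunk_text(doc, chunk_size=40)
top = retrieve("How does attention work in transformers?", chunks, max_sources=1)
```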
```
src/studybuddy/
├── __init__.py
├── main.py                  # FastAPI app with lifespan management
├── config.py                # Pydantic settings with environment variables
├── models/
│   ├── requests.py          # ChatRequest, DocumentUploadRequest
│   └── responses.py         # ChatResponse, SourceDocument, etc.
├── core/
│   └── rag_engine.py        # StudyBuddyRAG class with core logic
├── api/
│   ├── dependencies.py      # FastAPI dependency injection
│   └── routes/
│       ├── health.py        # Health check endpoints
│       ├── chat.py          # Chat endpoints
│       └── documents.py     # Document upload endpoints
└── utils/
    └── document_processor.py
```

- StudyBuddyRAG: Core RAG engine handling document processing and question answering
- FastAPI App: REST API with automatic OpenAPI documentation
- Pydantic Models: Type-safe request/response models
- ChromaDB: Vector database for document embeddings
- LangChain: Framework for building the RAG pipeline
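To make the model shapes concrete, here is a rough sketch using plain dataclasses as stand-ins for the actual Pydantic models. The `ChatRequest` fields come from the curl example in this README; the `SourceDocument` and `ChatResponse` fields are assumptions based on the described behavior.

```python
from dataclasses import dataclass, field

@dataclass
class ChatRequest:
    """Mirrors the chat endpoint's JSON body (see the curl example)."""
    question: str
    include_sources: bool = True
    max_sources: int = 3

@dataclass
class SourceDocument:
    """Assumed fields: a cited document and the snippet that matched."""
    filename: str
    snippet: str

@dataclass
class ChatResponse:
    """Assumed fields: the answer plus citations and study tips."""
    answer: str
    sources: list[SourceDocument] = field(default_factory=list)
    study_tips: list[str] = field(default_factory=list)

req = ChatRequest(question="What is backpropagation?")
```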
```bash
# Run tests (when implemented)
poetry run pytest

# Code formatting
poetry run black src/
poetry run isort src/

# Linting
poetry run flake8 src/
```
- OpenAI API Key Error

  ```
  ValueError: OPENAI_API_KEY environment variable is required
  ```

  Solution: Set your OpenAI API key in the `.env` file or as an environment variable (in this repo the key is set globally in the environment, so no `.env` file is needed).

- Document Processing Fails

  ```
  Error processing document.pdf: [Errno 2] No such file or directory
  ```

  Solution: Ensure the document is in the `documents/` directory and has a supported file extension.

- Vector Database Issues

  ```
  ChromaDB connection error
  ```

  Solution: Clear the `vector_db/` directory and restart the application.
- Chunk Size: Adjust `chunk_size` in the config for your document types (larger for academic papers, smaller for notes)
- Model Selection: Use `gpt-4o-mini` for cost efficiency or `gpt-4` for better quality
- Document Organization: Group related documents by subject for better retrieval
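The chunk-size trade-off is easy to see in code. The sketch below splits the same text at two sizes; the `overlap` parameter is purely illustrative (the documented setting is `STUDYBUDDY_CHUNK_SIZE` alone), but overlapping chunks are a common way to avoid cutting ideas in half at chunk boundaries.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of chunk_size characters, each starting
    `chunk_size - overlap` characters after the previous one."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "x" * 3000
coarse = chunk_with_overlap(text, chunk_size=1000)            # fewer, broader chunks
fine = chunk_with_overlap(text, chunk_size=250, overlap=50)   # more, narrower chunks
```

Larger chunks keep more context per retrieved source (good for dense academic prose); smaller chunks give more precise matches for short, note-style documents.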
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI for the web framework
- LangChain for RAG implementation
- ChromaDB for vector storage
- OpenAI for language model capabilities
Happy Studying!