Ferdo is a full-stack AI assistant designed to answer student questions using official course materials and student forum posts. It uses Retrieval-Augmented Generation (RAG) to ground its answers in retrieved course content rather than inventing information. Ferdo can run entirely locally or with cloud LLMs such as Gemini or ChatGPT; cloud LLMs are recommended for better performance and accuracy.
- Finds answers from course PDFs, HTML pages, and forum discussions.
- Understands course subjects automatically, or asks you to choose.
- Handles follow-up questions with full conversation context.
- Runs locally or with cloud LLMs for improved results.
Ferdo is structured as a monorepo containing three main components (engine, server, client) plus shared resources and documentation.
```
ferdo
├── engine/   # Retrieval, LLM, pipelines, data ingestion
├── server/   # FastAPI WebSocket/HTTP server
├── client/   # React + Vite front-end
├── shared/   # Common models and utilities
└── docs/     # Master's thesis and related documentation
```
```mermaid
---
config:
theme: default
---
flowchart LR
subgraph UI["Client - React/Vite"]
A["Chat UI"]
B["WebSocket client"]
end
subgraph S["Server - FastAPI"]
C[/"WS: /ws"/]
D[/"REST: /api/subjects"/]
end
subgraph E["Engine - Python"]
E1["Subject Classifier"]
E2["Query Refiner"]
E3["Context Retriever"]
E4["Answer Generator"]
E5["Answer Refiner / Fitness Check"]
end
subgraph Data["Data Layer"]
V[("Chroma DB")]
F[("Course PDFs/HTML/TXT")]
end
A -- question --> B
B -- answer --> A
B -- JSON --> C
C -- JSON --> B
A -- getSubjects --> D
D -- subjects --> A
C -- EngineRequest --> E
E -- Answer/Errors --> C
E3 -- similarity search (question) --> V
V -- context --> E3
E4 -- generate(question, context) --> LLM[("LLM")]
LLM -- answer --> E4
V -. populated by .-> F
style E fill:#FFCDD2
style UI fill:#C8E6C9
style S fill:#BBDEFB
style Data fill:#FFF9C4
```
| Component | Technologies |
|---|---|
| Client | React 19, TypeScript, Vite, MUI, Emotion, Framer Motion, Lucide Icons, react-router-dom, react-use-websocket, ESLint 9, Prettier 3 |
| Server | FastAPI, Starlette WebSockets, Uvicorn, Pydantic, CORS middleware |
| Engine | LangChain + ChromaDB, HuggingFace paraphrase-multilingual-MiniLM-L12-v2 embeddings, pymupdf4llm, BeautifulSoup, TXT loaders, query refinement & answer pipelines, LLMs via Ollama/Gemini/ChatGPT |
Engine (`engine/`)
Purpose: Given a question (and optional subject/history), produce a grounded answer using vector retrieval + LLM generation.
Pipeline:
- Subject detection → `pipeline/subject_classifier.py`
- Query refinement → `pipeline/query_refiner.py`
- Context retrieval → `pipeline/context_retriever.py`
- Answer generation → `pipeline/answer_generator.py`
- Fitness check + refinement → `pipeline/answer_fitness_check.py` / `pipeline/answer_refiner.py`
- Fallback mini-explanation if refinement fails → `pipeline/fallback_answer_generator.py`
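To make the flow concrete, here is a minimal sketch of how these stages could compose end to end. The function names and stub bodies are illustrative only; the real implementations live in the `engine/pipeline/` modules listed above.

```python
# Illustrative composition of the pipeline stages (stubs stand in for the real modules).

def classify_subject(question: str) -> str | None:
    """Stand-in for pipeline/subject_classifier.py."""
    return "OOUP" if "ooup" in question.lower() else None

def refine_query(question: str, history: list[str]) -> str:
    """Stand-in for pipeline/query_refiner.py: rewrite follow-ups as standalone queries."""
    return question

def retrieve_context(query: str, subject: str | None) -> list[str]:
    """Stand-in for pipeline/context_retriever.py: Chroma similarity search."""
    return ["<retrieved chunk>"]

def generate_answer(query: str, context: list[str]) -> str:
    """Stand-in for pipeline/answer_generator.py: LLM call grounded in the context."""
    return f"Answer grounded in {len(context)} retrieved chunks."

def answer_is_fit(answer: str, context: list[str]) -> bool:
    """Stand-in for pipeline/answer_fitness_check.py."""
    return bool(answer and context)

def answer_question(question: str, subject: str | None = None,
                    history: list[str] | None = None) -> str:
    """End-to-end flow: classify -> refine -> retrieve -> generate -> check/refine -> fallback."""
    subject = subject or classify_subject(question)
    query = refine_query(question, history or [])
    context = retrieve_context(query, subject)
    answer = generate_answer(query, context)
    if not answer_is_fit(answer, context):
        answer = "Fallback mini-explanation"  # pipeline/fallback_answer_generator.py
    return answer

print(answer_question("Kakvi su labosi iz OOUP?"))
```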
Data ingestion:
- PDFs/HTML/TXT → chunker → cleaner → persisted to Chroma.
- Runs via:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m engine.data_ingestion.db_init
```
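For orientation, here is a minimal ingestion sketch of the load → chunk → clean → embed → persist flow, assuming the `pymupdf4llm`, `langchain-text-splitters`, `langchain-huggingface`, and `langchain-chroma` packages. Paths, collection names, and chunking parameters are examples, not the exact values used in `engine/data_ingestion/`.

```python
# Illustrative ingestion flow: load -> chunk -> clean -> embed -> persist to Chroma.
# Not the exact engine/data_ingestion code; paths and parameters are examples.
import pymupdf4llm
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# 1. Load: convert a course PDF to markdown text.
text = pymupdf4llm.to_markdown("materials/OOUP/labs.pdf")  # example path

# 2. Chunk: split into overlapping pieces sized for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_text(text)

# 3. Clean: trivial whitespace normalisation stands in for the real cleaner.
chunks = [" ".join(c.split()) for c in chunks if c.strip()]

# 4. Embed + persist: multilingual MiniLM embeddings into a persisted Chroma collection.
embeddings = HuggingFaceEmbeddings(model_name="paraphrase-multilingual-MiniLM-L12-v2")
store = Chroma(
    collection_name="ooup",  # example collection name
    embedding_function=embeddings,
    persist_directory="engine/data_ingestion/chroma_db",
)
store.add_texts(chunks, metadatas=[{"subject": "OOUP"}] * len(chunks))
```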
LLM Providers:
- Local: Ollama (gemma3:12B, gpt-oss:20B, deepseek-r1:14B)
- Cloud: Gemini 2.5 Flash, GPT-5-mini, Claude Haiku 3.5
Server (`server/`)
Responsibilities:
- WebSocket endpoint `/ws` for real-time chat.
- REST endpoint `/api/subjects` for the subject list.
- CORS configuration for local dev.
- Per-socket conversation state.
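A condensed FastAPI sketch of these responsibilities; the route bodies, payload shapes, and the hard-coded subject entry are placeholders rather than the actual `server/` code.

```python
# Minimal FastAPI sketch of the server's responsibilities (illustrative, not the real server/ code).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# CORS for local development (Vite dev server origins).
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://127.0.0.1:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/api/subjects")
async def get_subjects():
    # The real server returns the configured subject list; one hard-coded entry here.
    return [{
        "name": "Oblikovni obrasci u programiranju",
        "abbreviation": "OOUP",
        "aliases": ["oblikovni", "obrasci", "design patterns"],
    }]

@app.websocket("/ws")
async def chat(ws: WebSocket):
    await ws.accept()
    history: list[dict] = []  # per-socket conversation state
    try:
        while True:
            msg = await ws.receive_json()  # {"type": "question", "question": ..., optional "subject"}
            history.append(msg)
            # The real server builds an EngineRequest and calls the engine here.
            await ws.send_json({"type": "answer", "answer": "..."})
    except WebSocketDisconnect:
        pass

# Run with: uvicorn <module>:app --reload
```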
Message Protocol:
| Direction | Type | Purpose |
|---|---|---|
| Client → Server | `QUESTION` | Send user's question (with optional selected subject) |
| Server → Client | `ANSWER` | Answer payload |
| Server → Client | `ERROR` | Typed error (`NO_CONTEXT`, `UNKNOWN_SUBJECT`, etc.) |
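To make the protocol concrete, here is a small Python client example using the third-party `websockets` package. The server address matches the default `.env` below, and the error payload field (`error`) is an assumption about the exact shape of ERROR messages.

```python
# Example client for the message protocol above (uses the `websockets` package).
import asyncio
import json
import websockets

async def ask(question: str) -> None:
    async with websockets.connect("ws://127.0.0.1:8000/ws") as ws:
        # Client -> Server: QUESTION
        await ws.send(json.dumps({"type": "question", "question": question}))
        # Server -> Client: ANSWER or ERROR
        reply = json.loads(await ws.recv())
        if reply["type"] == "answer":
            print(reply["answer"])
        elif reply["type"] == "error":
            # Error detail field name is illustrative (e.g. NO_CONTEXT, UNKNOWN_SUBJECT).
            print("Error:", reply.get("error", "unknown"))

asyncio.run(ask("Kakvi su labosi iz OOUP?"))
```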
Client (`client/`)
UI & State:
- Chat interface with subject detection & selection dialog.
- Typing animations (Framer Motion).
- Theming via `AppThemeProvider` (light/dark).
- Manual subject override.
Special Features:
- Loading placeholders.
- Friendly error handling.
- Smooth transitions between chat states.
`GET /api/subjects` → Returns:

```json
[
  {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": [
      "oblikovni",
      "obrasci",
      "design patterns"
    ]
  },
  {
    "name": "Digitalna logika",
    "abbreviation": "DIGLOG",
    "aliases": [
      "digitalna",
      "logika"
    ]
  },
  {
    "name": "Uvod u umjetnu inteligenciju",
    "abbreviation": "UUUI",
    "aliases": [
      "ai",
      "umjetna",
      "inteligencija",
      "umjetna inteligencija"
    ]
  }
]
```
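For illustration, here is a hedged Python snippet (using the `requests` package) that fetches this list and resolves a user-typed subject via abbreviation or aliases; the real client does this in TypeScript, and the matching logic below is deliberately naive.

```python
# Fetch the subject list and resolve a user-typed name via abbreviation/aliases.
import requests

subjects = requests.get("http://127.0.0.1:8000/api/subjects", timeout=10).json()

def resolve_subject(user_text: str) -> dict | None:
    """Naive alias matching; the real subject classifier is more involved."""
    needle = user_text.strip().lower()
    for s in subjects:
        if needle == s["abbreviation"].lower() or needle in (a.lower() for a in s["aliases"]):
            return s
    return None

print(resolve_subject("ooup"))  # -> the OOUP subject entry
```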
Client → Server

```json
{
  "type": "question",
  "question": "Kakvi su labosi iz OOUP?"
}
```

```json
{
  "type": "question",
  "question": "Kakvi su labosi iz OOUP?",
  "subject": {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": ["oblikovni", "obrasci", "design patterns"]
  }
}
```

Server → Client
```json
{
  "type": "answer",
  "answer": "Laboratorijske vježbe uključuju...",
  "subject": {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": ["oblikovni", "obrasci", "design patterns"]
  }
}
```

Prerequisites:

- Python 3.11
- Node.js 20+
- Optional: Ollama installed for local LLM use.
⚠️ Python Version Requirement: 3.11 only
Due to dependency wheel availability and packaging quirks, Ferdo currently works only with Python 3.11.
Please use Python 3.11 for everything: creating the virtualenv, running `pip`, and installing requirements.
Use 3.11 everywhere
```bash
# Verify
python3.11 --version

# Create venv with 3.11
python3.11 -m venv venv
source venv/bin/activate

# Always bind pip to the venv's Python
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

# (If multiple pythons are installed) avoid calling system binaries directly
# BAD:  pip install -r requirements.txt  # may be tied to wrong Python
# BAD:  python3.12 -m pip ...            # wrong interpreter
# GOOD: python -m pip ...                # uses the venv's Python 3.11
```

Troubleshooting
- If you see `ERROR: No matching distribution found` for packages like `aiohttp`, double-check that your active interpreter is 3.11: `python -c "import sys; print(sys.executable, sys.version)"`
- If your venv was created with a different Python version, recreate it with 3.11:

  ```bash
  rm -rf venv
  python3.11 -m venv venv
  source venv/bin/activate
  python -m pip install -r requirements.txt
  ```
Configuration is split between the project root and the client directory. Use .env files in the respective
directories.
`.env` inside the project root directory

```
LOCAL_LLM_MODEL=gemma3:12b
GOOGLE_GEMINI_API_KEY=...
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
LOGGING_ENABLED=True
SERVER_HOST=127.0.0.1
SERVER_PORT=8000
```

`.env` inside the client directory

```
VITE_API_URL=http://127.0.0.1:8000
VITE_WS_URL=ws://127.0.0.1:8000
```

- Vector store persists at `engine/data_ingestion/chroma_db/` (delete it to rebuild).
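For reference, here is a sketch of reading the root `.env` in Python with the `python-dotenv` package; the variable names match the list above, but the actual loading code in the engine/server may differ.

```python
# Illustrative loading of the root .env (requires the python-dotenv package).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory (project root)

LOCAL_LLM_MODEL = os.getenv("LOCAL_LLM_MODEL")            # e.g. "gemma3:12b"
GOOGLE_GEMINI_API_KEY = os.getenv("GOOGLE_GEMINI_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
LOGGING_ENABLED = os.getenv("LOGGING_ENABLED", "False").lower() == "true"
SERVER_HOST = os.getenv("SERVER_HOST", "127.0.0.1")
SERVER_PORT = int(os.getenv("SERVER_PORT", "8000"))
```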
⚠️ Important: Run the following commands only after setting up your virtual environment and installing all dependencies. Skipping these steps or running them too early may cause errors or incomplete setup.
Before running the system for the first time, you must initialize the vector database with the required course materials and documents.
```bash
python -m engine.data_ingestion.db_init
```

Heads-up: This step can take a while.
The ingestion pipeline needs to load, chunk, clean, embed/vectorize, and store all documents into ChromaDB.
Duration depends on dataset size, CPU/GPU, and the embedding model. Seeing logs like these is normal:

```
Processing folder <subject>...
Loading PDF...
Chunking...
Cleaning...
Saving...
Finished processing <subject>
```

You can safely rerun the command to ingest new or changed files; existing items will be updated/merged as needed.
Start the server:

```bash
python -m server.api
```

Start the client:

```bash
cd client
npm install
npm run dev
```

Visit http://127.0.0.1:5173
To run the application with local models on your device:
- You must have Ollama installed.
- You must have downloaded the models you want to use.
- Set the `LOCAL_LLM_MODEL` variable in your `.env` file to the desired model name, for example: `LOCAL_LLM_MODEL=gemma3:12b`
- If no `LOCAL_LLM_MODEL` is set, the application will try to use a cloud LLM provider if an API key is found (`GOOGLE_GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). If no API key is found, it stops execution (see the sketch below).
- Hardware note: make sure the models you choose fit within your device's available memory, otherwise performance issues or crashes may occur.
If you do not have Ollama installed or the models downloaded, you can still use cloud LLM providers such as Gemini or ChatGPT (recommended for best performance).
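Here is a sketch of the provider-selection behaviour described above, using LangChain chat-model classes; the exact class choices and model-name strings are assumptions, not necessarily what the engine uses.

```python
# Illustrative provider selection: prefer a local Ollama model if configured,
# otherwise fall back to whichever cloud API key is present.
import os

def build_llm():
    if os.getenv("LOCAL_LLM_MODEL"):  # e.g. "gemma3:12b"
        from langchain_ollama import ChatOllama
        return ChatOllama(model=os.environ["LOCAL_LLM_MODEL"])
    if os.getenv("GOOGLE_GEMINI_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model="gemini-2.5-flash",
                                      google_api_key=os.environ["GOOGLE_GEMINI_API_KEY"])
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model="gpt-5-mini")  # reads OPENAI_API_KEY from the environment
    if os.getenv("ANTHROPIC_API_KEY"):
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model="claude-3-5-haiku-latest")
    raise SystemExit("No LOCAL_LLM_MODEL and no cloud API key found; stopping.")
```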
The entire application can be run using Docker Compose, which sets up all required services in isolated containers.
- Install Docker and Docker Compose.
- Create and configure your `.env` file in the project root with the necessary settings (see the `.env` example above).
- From the project root, run:
```bash
docker compose up --build
```
- This will:
  - Start the backend
  - Start the frontend
  - (Optionally) connect to Ollama for local LLM usage if `LOCAL_LLM_MODEL` is set
- Access the web application via the URL shown in the terminal output (typically `http://localhost:5173`).
To stop the containers:
```bash
docker compose down
```