🎓 Ferdo – Student Chatbot

Ferdo is a full-stack AI assistant designed to answer student questions using official course materials and student forum posts. It uses Retrieval-Augmented Generation (RAG) to provide grounded, accurate answers while never making things up. Ferdo can run entirely locally or with cloud LLMs such as Gemini or ChatGPT; cloud LLMs are recommended for better performance and accuracy.


πŸ“ What Ferdo Does

  • 📚 Finds answers from course PDFs, HTML pages, and forum discussions.
  • 🧠 Understands course subjects automatically, or asks you to choose.
  • 🔄 Handles follow-up questions with full conversation context.
  • 💻 Runs locally or with cloud LLMs for improved results.

πŸ— Architecture Overview

Ferdo is structured as a monorepo containing three main components (engine, server, client) plus shared resources and documentation.

ferdo
├─ engine/        # Retrieval, LLM, pipelines, data ingestion
├─ server/        # FastAPI WebSocket/HTTP server
├─ client/        # React + Vite front-end
├─ shared/        # Common models and utilities
└─ docs/          # Master's thesis and related documentation

🔄 System Flow

---
config:
  theme: default
---
flowchart LR
 subgraph UI["Client - React/Vite"]
        A["Chat UI"]
        B["WebSocket client"]
  end
 subgraph S["Server - FastAPI"]
        C[/"WS: /ws"/]
        D[/"REST: /api/subjects"/]
  end
 subgraph E["Engine - Python"]
        E1["Subject Classifier"]
        E2["Query Refiner"]
        E3["Context Retriever"]
        E4["Answer Generator"]
        E5["Answer Refiner / Fitness Check"]
  end
 subgraph Data["Data Layer"]
        V[("Chroma DB")]
        F[("Course PDFs/HTML/TXT")]
  end
    A -- question --> B
    B -- answer --> A
    
    B -- JSON --> C
    C -- JSON --> B
    
    A -- getSubjects --> D
    D -- subjects --> A
    
    C -- EngineRequest --> E
    E -- Answer/Errors --> C
    
    E3 -- similarity search (question) --> V
    V -- context --> E3
    
    E4 -- generate(question, context) --> LLM[("LLM")]
    LLM -- answer --> E4
    
    V -. populated by .-> F
    
    style E fill:#FFCDD2
    style UI fill:#C8E6C9
    style S fill:#BBDEFB
    style Data fill:#FFF9C4

🛠 Technologies

Component | Technologies
Client    | React 19, TypeScript, Vite, MUI, Emotion, Framer Motion, Lucide Icons, react-router-dom, react-use-websocket, ESLint 9, Prettier 3
Server    | FastAPI, Starlette WebSockets, Uvicorn, Pydantic, CORS middleware
Engine    | LangChain + ChromaDB, HuggingFace paraphrase-multilingual-MiniLM-L12-v2 embeddings, pymupdf4llm, BeautifulSoup, TXT loaders, query refinement & answer pipelines, LLMs via Ollama/Gemini/ChatGPT

📦 Component Details

🧠 Engine (engine/)

Purpose: Given a question (and optional subject/history), produce a grounded answer using vector retrieval + LLM generation.

Pipeline (a rough orchestration sketch follows the list):

  1. Subject detection → pipeline/subject_classifier.py
  2. Query refinement → pipeline/query_refiner.py
  3. Context retrieval → pipeline/context_retriever.py
  4. Answer generation → pipeline/answer_generator.py
  5. Fitness check + refinement → pipeline/answer_fitness_check.py / pipeline/answer_refiner.py
  6. Fallback mini-explanation if refinement fails → pipeline/fallback_answer_generator.py
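
The stages are chained sequentially. As a rough sketch of how that chaining could look (the module paths above are real, but the function names and signatures below are illustrative assumptions, not the engine's actual API):

def answer_question(question: str, subject=None, history=None) -> str:
    # 1. Determine the subject unless the user already picked one (assumed helper names throughout).
    subject = subject or classify_subject(question, history)
    # 2. Rewrite the question into a standalone, retrieval-friendly query using the conversation history.
    query = refine_query(question, history)
    # 3. Similarity-search the vector store for the most relevant chunks of that subject's materials.
    context = retrieve_context(query, subject)
    # 4. Generate an answer grounded in the retrieved context.
    answer = generate_answer(query, context)
    # 5. Check that the answer is supported by the context; refine it, or fall back to a mini-explanation.
    if not answer_is_fit(answer, context):
        answer = refine_answer(answer, context) or generate_fallback_answer(query)
    return answer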

Data ingestion:

  • PDFs/HTML/TXT → chunker → cleaner → persisted to Chroma (sketched below).
  • Runs via:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m engine.data_ingestion.db_init
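
Conceptually, the PDF branch of that ingestion step does something like the following (a minimal sketch using the libraries from the Technologies table and recent LangChain partner packages; the file path, collection name, and chunk sizes are assumptions, and the real code under engine/data_ingestion/ may be organized differently):

import pymupdf4llm
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Load a PDF as Markdown-like text, then split it into overlapping chunks.
text = pymupdf4llm.to_markdown("data/OOUP/lectures.pdf")   # assumed input path
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_text(text)

# Embed the chunks with the multilingual MiniLM model and persist them to Chroma.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
store = Chroma(
    collection_name="OOUP",                                 # assumed: one collection per subject
    embedding_function=embeddings,
    persist_directory="engine/data_ingestion/chroma_db",
)
store.add_texts(chunks, metadatas=[{"subject": "OOUP"} for _ in chunks])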

LLM Providers:

  • Local: Ollama (gemma3:12B, gpt-oss:20B, deepseek-r1:14B)
  • Cloud: Gemini 2.5 Flash, GPT-5-mini, Claude Haiku 3.5

🌐 Server (server/)

Responsibilities:

  • WebSocket endpoint /ws for real-time chat.
  • REST endpoint /api/subjects for subject list.
  • CORS configuration for local dev.
  • Per-socket conversation state.

Message Protocol:

Direction       | Type     | Purpose
Client → Server | QUESTION | Send the user's question (with optional selected subject)
Server → Client | ANSWER   | Answer payload
Server → Client | ERROR    | Typed error (NO_CONTEXT, UNKNOWN_SUBJECT, etc.)
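
A minimal FastAPI sketch of these responsibilities and the message protocol (the endpoint paths and message types come from this README; the CORS origin, the static SUBJECTS list, and the run_engine helper are simplified assumptions, not the server's actual code):

import json
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["http://localhost:5173"],
                   allow_methods=["*"], allow_headers=["*"])

# Assumed static subject list; the real server presumably derives it from the engine's data.
SUBJECTS = [{"name": "Oblikovni obrasci u programiranju", "abbreviation": "OOUP",
             "aliases": ["oblikovni", "obrasci", "design patterns"]}]

@app.get("/api/subjects")
async def get_subjects():
    return SUBJECTS

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    history = []                                   # per-socket conversation state
    try:
        while True:
            msg = json.loads(await ws.receive_text())
            if msg.get("type") == "question":
                # The real server hands this to the engine pipeline; run_engine is a stand-in.
                answer, subject = run_engine(msg["question"], msg.get("subject"), history)
                history.append((msg["question"], answer))
                await ws.send_json({"type": "answer", "answer": answer, "subject": subject})
    except WebSocketDisconnect:
        pass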

💻 Client (client/)

UI & State:

  • Chat interface with subject detection & selection dialog.
  • Typing animations (Framer Motion).
  • Theming via AppThemeProvider (light/dark).
  • Manual subject override.

Special Features:

  • Loading placeholders.
  • Friendly error handling.
  • Smooth transitions between chat states.

📑 API Endpoints

REST

  • GET /api/subjects → Returns:
[
  {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": [
      "oblikovni",
      "obrasci",
      "design patterns"
    ]
  },
  {
    "name": "Digitalna logika",
    "abbreviation": "DIGLOG",
    "aliases": [
      "digitalna",
      "logika"
    ]
  },
  {
    "name": "Uvod u umjetnu inteligenciju",
    "abbreviation": "UUUI",
    "aliases": [
      "ai",
      "umjetna",
      "inteligencija",
      "umjetna inteligencija"
    ]
  }
]
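
To exercise the endpoint from Python with nothing but the standard library (assuming the default host and port from the Configuration section):

import json
import urllib.request

# Fetch the subject list from a locally running server.
with urllib.request.urlopen("http://127.0.0.1:8000/api/subjects") as resp:
    subjects = json.load(resp)

print([s["abbreviation"] for s in subjects])   # e.g. ['OOUP', 'DIGLOG', 'UUUI']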

WebSocket /ws

Client → Server

{
  "type": "question",
  "question": "Kakvi su labosi iz OOUP?"
}
{
  "type": "question",
  "question": "Kakvi su labosi iz OOUP?",
  "subject": {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": ["oblikovni", "obrasci", "design patterns"]
  }
}

Server → Client

{
  "type": "answer",
  "answer": "Laboratorijske vježbe uključuju...",
  "subject": {
    "name": "Oblikovni obrasci u programiranju",
    "abbreviation": "OOUP",
    "aliases": ["oblikovni", "obrasci", "design patterns"]
  }
}
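
Any WebSocket client can speak this protocol. A minimal Python example using the third-party websockets package (an assumption for illustration; it is not part of Ferdo's own stack) against the default local server:

import asyncio
import json
import websockets  # pip install websockets

async def ask(question: str) -> None:
    async with websockets.connect("ws://127.0.0.1:8000/ws") as ws:
        await ws.send(json.dumps({"type": "question", "question": question}))
        reply = json.loads(await ws.recv())
        # Either an "answer" payload or a typed "error" comes back.
        print(reply)

asyncio.run(ask("Kakvi su labosi iz OOUP?"))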

🧑‍💻 Local Development

1️⃣ Prerequisites

  • Python 3.11
  • Node.js 20+
  • Optional: Ollama installed for local LLM use.

⚠️ Python Version Requirement – 3.11 only
Due to dependency wheel availability and packaging quirks, Ferdo currently works only with Python 3.11.
Please use Python 3.11 for everything: creating the virtualenv, running pip, and installing requirements.

Use 3.11 everywhere

# Verify
python3.11 --version

# Create venv with 3.11
python3.11 -m venv venv
source venv/bin/activate

# Always bind pip to the venv's Python
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

# (If multiple pythons are installed) avoid calling system binaries directly
# BAD:  pip install -r requirements.txt      # may be tied to wrong Python
# BAD:  python3.12 -m pip ...                # wrong interpreter
# GOOD: python -m pip ...                    # uses the venv's Python 3.11

Troubleshooting

  • If you see ERROR: No matching distribution found for packages like aiohttp, double-check that your active interpreter is 3.11:
    python -c "import sys; print(sys.executable, sys.version)"
  • If your venv was created with a different Python version, recreate it with 3.11:
    rm -rf venv
    python3.11 -m venv venv
    source venv/bin/activate
    python -m pip install -r requirements.txt

2️⃣ Configuration

Configuration is split between the project root and the client directory. Use .env files in the respective directories.

  • .env inside the project root directory:

    LOCAL_LLM_MODEL=gemma3:12b

    GOOGLE_GEMINI_API_KEY=...
    OPENAI_API_KEY=...
    ANTHROPIC_API_KEY=...

    LOGGING_ENABLED=True

    SERVER_HOST=127.0.0.1
    SERVER_PORT=8000

  • .env inside the client directory:

    VITE_API_URL=http://127.0.0.1:8000
    VITE_WS_URL=ws://127.0.0.1:8000

  • Vector store persists at engine/data_ingestion/chroma_db/ (delete the directory to rebuild it).
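
On the Python side these are ordinary environment variables; a sketch of how server.api might consume SERVER_HOST and SERVER_PORT (assuming python-dotenv loads the root .env and the FastAPI instance in server/api.py is named app, neither of which this README states):

import os
import uvicorn
from dotenv import load_dotenv   # assumption: python-dotenv

load_dotenv()                    # read the root .env into the environment
host = os.getenv("SERVER_HOST", "127.0.0.1")
port = int(os.getenv("SERVER_PORT", "8000"))

uvicorn.run("server.api:app", host=host, port=port)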

3️⃣ Setup & Run

⚠️ Important

Run the following commands only after setting up your virtual environment and installing all dependencies. Skipping these steps or running them too early may cause errors or incomplete setup.

Database

Before running the system for the first time, you must initialize the vector database with the required course materials and documents.

python -m engine.data_ingestion.db_init

⏳ Heads-up: This step can take a while.
The ingestion pipeline needs to load, chunk, clean, embed/vectorize, and store all documents into ChromaDB.
Duration depends on dataset size, CPU/GPU, and the embedding model. Seeing logs like these is normal:

[📂] Processing folder <subject>...
[📄] Loading PDF...
[🧱] Chunking...
[🧹] Cleaning...
[💾] Saving...
[✅] Finished processing <subject>

You can safely rerun the command to ingest new or changed files; existing items will be updated/merged as needed.

Engine & Server

python -m server.api

Client

cd client
npm install
npm run dev

Visit http://127.0.0.1:5173 (the Vite dev server serves plain HTTP, not HTTPS).


🏡 Using Local LLM models (Optional)

To run the application with local models on your device:

  • You must have Ollama installed.
  • You must have downloaded the models you want to use.
  • Set the LOCAL_LLM_MODEL variable in your .env file to the desired model name, for example:
    LOCAL_LLM_MODEL=gemma3:12b
  • If LOCAL_LLM_MODEL is not set, the application falls back to a cloud LLM provider if an API key is found (GOOGLE_GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY). If no API key is found either, it stops execution.
  • Hardware note: Make sure the models you choose fit within your device's available memory, otherwise performance issues or crashes may occur.

If you do not have Ollama installed or the models downloaded, you can still use cloud LLM providers such as Gemini or ChatGPT (recommended for best performance).
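
The provider selection described above boils down to logic like this (a simplified sketch; the engine's actual provider classes and the precedence among cloud API keys are not documented here, so the names and ordering below are assumptions):

import os

def pick_llm_provider():
    # Prefer a local Ollama model whenever LOCAL_LLM_MODEL is set.
    local_model = os.getenv("LOCAL_LLM_MODEL")
    if local_model:
        return "ollama", local_model
    # Otherwise fall back to whichever cloud API key is available.
    for env_key, provider in [("GOOGLE_GEMINI_API_KEY", "gemini"),
                              ("OPENAI_API_KEY", "openai"),
                              ("ANTHROPIC_API_KEY", "anthropic")]:
        if os.getenv(env_key):
            return provider, None
    # No local model and no API key: stop, as described above.
    raise SystemExit("No LLM configured: set LOCAL_LLM_MODEL or provide a cloud API key.")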


🐳 Running with Docker

The entire application can be run using Docker Compose, which sets up all required services in isolated containers.

  1. Install Docker and Docker Compose.
  2. Create and configure your .env file in the project root with the necessary settings (see .env example).
  3. From the project root, run:
    docker compose up --build
  4. This will:
    • Start the backend
    • Start the frontend
    • (Optionally) connect to Ollama for local LLM usage if LOCAL_LLM_MODEL is set
  5. Access the web application via the URL shown in the terminal output (typically http://localhost:5173).

To stop the containers:

docker compose down
