ZoeAI is a modern, user-friendly local AI implementation designed to provide a seamless interface for interacting with local AI models. Split between an API backend and a web dashboard frontend, it currently has the following capabilities:
- Thinking mode for step-by-step reasoning
- Markdown support for rich content formatting
- Conversation history support
- Customizable settings
- Context menu for conversation management
- Plugin support for integrations and extended functionality
The interface prioritizes user privacy (all data stays local) and ethical AI practices (transparent thinking processes).
To run ZoeAI, you will need:
- CPU with AVX2 instruction support (a quick check is shown after this list)
- 8GB RAM (I recommend 16GB or more for larger models)
- Nvidia GPU with a minimum of 8GB of VRAM to run larger models (optional)
- Web browser
- Python
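If you are unsure about AVX2, on Linux you can check the CPU flags reported by the kernel. A minimal sketch:

```python
# Quick AVX2 check on Linux: look for the "avx2" flag in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("AVX2 supported" if "avx2" in flags else "AVX2 flag not found")
```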
The application has both a client side (the web dashboard) and a server side (the API). After cloning this repository, create a virtual environment (optional, but recommended) and install the dependencies:
```bash
git clone https://github.com/sandroXP2007/zoe
cd zoe/backend
mkdir models plugins
./download_model
python -m venv venv  # recommended
source venv/bin/activate
pip install -r requirements.txt
```

Once the environment is configured, you can download LLM models into the `models/` folder or use the pre-configured model (Qwen3-4B quantized to q4_k_s).
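Any GGUF model placed in `models/` should work. As an illustrative sketch only (it assumes `huggingface_hub` is installed, which is not necessarily part of `requirements.txt`, and the repository and file names below are placeholders), a model can be fetched programmatically:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id and filename: replace with the GGUF model you actually want.
hf_hub_download(
    repo_id="your-org/your-model-GGUF",
    filename="your-model-q4_k_s.gguf",
    local_dir="models",
)
```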
After the installation process is finished, we can start the API server:

```bash
python server.py
```

Option A: Direct browser access (for local-only use)
- Open `frontend/index.html` in your browser
Option B: HTTP server (to share on the local network or on the web)
- Enter the `frontend/` folder and start the server:

```bash
sudo python -m http.server 80
```

Then, just open the browser on localhost.
- Open the web interface
- Click Settings
- Set your model path and parameters
- Click Save
- Start a new conversation with the "New Chat" button
- Type messages in the input field
- Toggle thinking mode with the brain icon
- Manage conversations via right-click menu (rename/delete)
- Monitor performance with the token speed indicator
We also have a CLI client for testing purposes; to use it, go to the `backend/` folder and start the client:
```bash
source venv/bin/activate  # if you have set up a virtual environment
python client.py
```

| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Process chat messages with streaming |
| `/speed` | GET | Get current token generation speed |
| `/thinking` | POST | Toggle thinking mode |
| `/thinking` | GET | Check thinking mode status |
| `/config` | GET | Retrieve current configuration |
| `/config` | POST | Update model configuration |
| `/model/reload` | POST | Reload the AI model |
| `/cancel` | POST | Stop current generation |
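The exact request and response shapes are defined in `server.py`, not in this README, so the snippet below is only a sketch: it assumes the server listens on localhost port 8000, that `/chat` accepts a JSON body with a `message` field, and that the reply is streamed as plain-text chunks. Adjust the port and field names to match the actual implementation.

```python
import requests

BASE = "http://localhost:8000"  # assumed port; check server.py for the real one

# Stream a chat reply chunk by chunk (the "message" field name is an assumption).
with requests.post(f"{BASE}/chat", json={"message": "Hello, Zoe!"}, stream=True) as r:
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)

# Query thinking-mode status and the current token generation speed.
print(requests.get(f"{BASE}/thinking").json())
print(requests.get(f"{BASE}/speed").json())
```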
The model configuration has the following format:

```json
{
  "model": {
    "path": "models/qwen3-4b-q4_k_s.gguf",
    "n_ctx": 2048,
    "temperature": 0.7
  },
  "system": {
    "prompt": "You are Zoe, a helpful AI assistant."
  }
}
```

To switch models:
- Open Settings (cog icon)
- Enter the new model path (e.g., `models/mistral-7b.gguf`)
- Click Save
- Click Reload to apply changes
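The same switch can also be made over the HTTP API by posting a new configuration and then reloading the model. This is only a sketch: it assumes `/config` accepts the same JSON structure as the configuration shown above, and it reuses the port assumption from the earlier example.

```python
import requests

BASE = "http://localhost:8000"  # assumed port

# Point the backend at a different GGUF file, then reload the model.
new_config = {
    "model": {"path": "models/mistral-7b.gguf", "n_ctx": 2048, "temperature": 0.7},
    "system": {"prompt": "You are Zoe, a helpful AI assistant."},
}
requests.post(f"{BASE}/config", json=new_config).raise_for_status()
requests.post(f"{BASE}/model/reload").raise_for_status()
```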
To improve performance:
- Check the token speed indicator
- Ensure your local server has adequate resources
- Verify model compatibility with your hardware
- Reduce the `n_ctx` value in settings for faster responses
Currently, conversations are stored locally in browser storage.
If you want to export your conversations, you can do so by following these steps:
- Go to Settings → Export Data
- Copy the JSON data
- Save to a file
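If you paste the copied JSON into a file by hand, a quick validation step can catch truncated copies. A minimal helper (the `conversations.json` file name is just an example):

```python
import json

# Validate and pretty-print the exported conversations before keeping the file.
with open("conversations.json") as f:
    data = json.load(f)  # raises ValueError if the pasted JSON is incomplete
with open("conversations.json", "w") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
print("Export looks valid; top-level type:", type(data).__name__)
```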
Thanks to other promising projects such as Qwen (the family behind our project's default model, Qwen3-4B), llama.cpp, and llama-cpp-python.