(Local AI Chat + Document ETL + Model Management Platform)
This project is a scalable LLM backend designed to run fully locally using:
- Ollama (local LLM runtime)
- ChromaDB (vector database)
- RabbitMQ (async ETL pipeline)
- Spring Boot (API + orchestration)
- Local filesystem for file storage (S3 replacement planned)
It supports chat, conversation memory, document ingestion, chunk embedding, and dynamic model management through Ollama's HTTP API — all architected in a clean, maintainable, and scalable way.
This is not a toy demo — it is built with enterprise-grade patterns, clean separation of concerns, and future cloud migration in mind.
- Supports any Ollama model installed locally
- Maintains conversation memory per session ID
- Clean abstraction through ChatService
- Model can be selected per request
- No business logic inside controllers — all logic lives in services
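A minimal sketch of how per-session conversation memory can be keyed by session ID. The class and method names here are illustrative only; the actual project delegates this to Spring AI's chat memory rather than a hand-rolled map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for per-session memory (hypothetical class,
// not the project's real implementation).
class SessionMemory {
    private final Map<String, List<String>> history = new ConcurrentHashMap<>();

    // Append one message to the session's history, creating it on first use.
    void append(String sessionId, String message) {
        history.computeIfAbsent(sessionId, id -> new ArrayList<>()).add(message);
    }

    // Return the session's messages, or an empty list for unknown sessions.
    List<String> messages(String sessionId) {
        return history.getOrDefault(sessionId, List.of());
    }
}
```

Keeping the map keyed by session ID is what lets two concurrent chats with different session IDs stay isolated.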
- Upload any file type (PDF, DOCX, TXT, HTML, etc.)
- Files stored locally under data/uploads/
- Storage abstraction through FileStorageService
- Ready to be swapped for Amazon S3 in the future with zero changes to controllers
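The swap-to-S3 claim rests on controllers depending only on an interface. A hedged sketch of that shape (interface and class names here are illustrative, not the project's exact signatures):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical storage abstraction: controllers call only this interface,
// so a future S3-backed implementation can replace the local one unchanged.
interface FileStorage {
    Path store(String filename, byte[] content) throws IOException;
}

// Local-filesystem implementation writing under a configured root directory.
class LocalFileStorage implements FileStorage {
    private final Path root;

    LocalFileStorage(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    @Override
    public Path store(String filename, byte[] content) throws IOException {
        Path target = root.resolve(filename);
        Files.write(target, content);
        return target;
    }
}
```

An S3 implementation would implement the same interface, which is why no controller changes are needed.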
When a file is uploaded:
- File is saved locally
- Path is pushed to RabbitMQ
- Background EtlWorker picks it up
- Tika extracts text
- Text is chunked using a TokenTextSplitter
- Chunks are embedded
- Chunks are persisted into Chroma vector store
This allows uploads to be instant while large files get processed in the background.
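The chunking step above can be sketched in plain Java. This is a simplified word-based splitter for illustration; the real pipeline uses Spring AI's TokenTextSplitter, which counts tokens rather than whitespace-separated words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of chunking with overlap (word-based, not token-based).
class SimpleChunker {
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= overlap) {
            throw new IllegalArgumentException("chunkSize must exceed overlap");
        }
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        // Each chunk starts (chunkSize - overlap) words after the previous one,
        // so consecutive chunks share `overlap` words of context.
        for (int start = 0; start < words.length; start += chunkSize - overlap) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;
        }
        return chunks;
    }
}
```

The overlap preserves context across chunk boundaries, which matters later for retrieval quality.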
- Stores embeddings for all processed document chunks
- Ready for intelligent RAG retrieval
- Simple and future-proof — can be replaced by Pinecone/Weaviate later
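For context on what "RAG retrieval" means here: the vector store ranks stored chunk embeddings by similarity to the embedded query, commonly cosine similarity. A self-contained sketch of that math (the actual scoring is done inside Chroma, not in application code):

```java
// Cosine similarity between two embedding vectors: dot product
// divided by the product of their magnitudes. Returns a value in [-1, 1].
class Cosine {
    static double similarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```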
Handles real model installation workflow:
- Pull models using Ollama’s HTTP API
- Check if a model is already installed
- Track install progress (0–100%)
- Support parallel installs with a configurable limit
- Cancel running installs
- Persist and query model install state
- Clean separation via ModelManagementService
This lets your app behave like a real AI platform — not a static toy.
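A minimal sketch of the progress/cancel bookkeeping behind such a service, assuming an in-memory registry (class and method names are hypothetical; the project persists install state rather than keeping it only in memory):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical install-state registry: tracks 0-100% progress per model
// and supports cancelling a running install.
class InstallRegistry {
    enum State { RUNNING, DONE, CANCELLED }
    record Install(State state, int percent) {}

    private final Map<String, Install> installs = new ConcurrentHashMap<>();

    void start(String model) {
        installs.put(model, new Install(State.RUNNING, 0));
    }

    // Progress updates are ignored once an install is cancelled or done.
    void progress(String model, int percent) {
        installs.computeIfPresent(model, (m, i) ->
            i.state() == State.RUNNING
                ? new Install(percent >= 100 ? State.DONE : State.RUNNING,
                              Math.min(percent, 100))
                : i);
    }

    // Returns true if the install is now cancelled.
    boolean cancel(String model) {
        Install result = installs.computeIfPresent(model, (m, i) ->
            i.state() == State.RUNNING ? new Install(State.CANCELLED, i.percent()) : i);
        return result != null && result.state() == State.CANCELLED;
    }

    Install status(String model) {
        return installs.get(model);
    }
}
```

Using a ConcurrentHashMap with atomic compute operations is what makes parallel installs safe without explicit locking.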
- Retrieve paginated conversation messages
- Stored in ChatMemory per session ID
- Lightweight and fast
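The pagination above can be sketched as a simple slice over a session's message list (an illustrative helper; the real endpoint would delegate to the memory store or a repository):

```java
import java.util.List;

// Hypothetical pagination helper: return the page-th slice of `size` items.
// Out-of-range pages yield an empty list rather than an exception.
class MessagePager {
    static <T> List<T> page(List<T> messages, int page, int size) {
        int from = Math.min(page * size, messages.size());
        int to = Math.min(from + size, messages.size());
        return messages.subList(from, to);
    }
}
```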
The project intentionally avoids the anti-patterns common in LLM backend code:
- No controller bloating
- No long-running processes inside controllers
- No inline shell commands
- No state stored in controllers
- No duplicated ETL logic
- No single god-class
Every concern lives in the correct layer — period.
- controller/
  - ChatController
  - ModelController
  - UploadController
  - SessionController
- service/
  - ChatService
  - ModelManagementService
  - FileStorageService
- etl/
  - EtlWorker
  - EtlMessagePublisher
- config/
  - AiConfiguration
  - RabbitConfig
  - ExecutorConfig
  - ChromaConfig
Chat Flow
Request → ChatController → ChatService → ChatClient/Ollama → Response
Document ETL Flow
UploadController → FileStorageService → RabbitMQ → EtlWorker →
Tika → Chunking → Embedding → Chroma
Model Management Flow
ModelController → ModelManagementService → Ollama HTTP API → Progress Registry
- Java 21
- Spring Boot
- Spring AI
- Spring AMQP
- Spring Web
- Spring Validation
- Ollama (local LLM engine)
- ChromaDB
- RabbitMQ
- Apache Tika
- TokenTextSplitter
- Local filesystem (S3 planned)
- Replace RabbitMQ with Kafka for distributed ETL
- Replace local FS with S3 for scalable file storage
- Add RAG search endpoint
- Add websocket streaming for chat
- Add fine-grained model permissions
- Add auto-model-download based on usage patterns
- Java 21
- Ollama installed
- RabbitMQ running
- Chroma running
./mvnw spring-boot:run
Upload files via:
POST /upload
Chat via:
POST /chat/inference
Manage models via:
POST /models/install
GET /models/status/{model}
POST /models/cancel/{model}