A lightweight local LLM chat with a web UI and a C‑based server that runs any LLM chat executable as a child and communicates via pipes.
## Table of Contents

- General Information
- Technologies Used
- Features
- Screenshots
- Setup
- Usage
- Project Status
- Room for Improvement
- Acknowledgements
- Contact
- License
## General Information

LLMux makes running a local LLM chat easier by providing a Tailwind-powered web UI and a minimal C server that simply spawns any compatible chat executable and talks to it over UNIX pipes ( a minimal sketch of this pattern follows the list below ). Everything runs on your machine — no third-party services — so you retain full privacy and control. LLMux is good for:
- Privacy‑conscious users who want a self‑hosted, browser‑based chat interface.
- Developers who need to prototype a chat front‑end around a custom model without writing HTTP or JavaScript plumbing from scratch.
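
The core mechanism is the classic spawn-and-pipe pattern: create two pipes, fork, wire the child's stdin/stdout to them, then exec the chat binary and exchange text through the pipe ends. The snippet below is a standalone sketch of that pattern under stated assumptions, not the actual `server.c`; the executable path and the prompt are placeholders.

```c
/* Minimal sketch of the spawn-and-pipe pattern described above.
 * NOT the real server.c; the binary path and prompt are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int toChild[2], fromChild[2]; /* parent->child and child->parent pipes */

    if (pipe(toChild) < 0 || pipe(fromChild) < 0) {
        perror("pipe");
        return EXIT_FAILURE;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (pid == 0) {
        /* Child: wire the pipes to stdin/stdout, then become the chat binary. */
        dup2(toChild[0], STDIN_FILENO);
        dup2(fromChild[1], STDOUT_FILENO);
        close(toChild[1]);
        close(fromChild[0]);
        execl("./out/llm_chat", "llm_chat", (char*)NULL); /* hypothetical path */
        perror("execl");
        _exit(EXIT_FAILURE);
    }

    /* Parent: send a prompt, then stream the child's reply to stdout. */
    close(toChild[0]);
    close(fromChild[1]);

    const char* prompt = "Hello, model!\n";
    write(toChild[1], prompt, strlen(prompt));
    close(toChild[1]); /* signal end of input */

    char buffer[4096];
    ssize_t n;
    while ((n = read(fromChild[0], buffer, sizeof(buffer))) > 0) {
        fwrite(buffer, 1, (size_t)n, stdout);
    }

    close(fromChild[0]);
    waitpid(pid, NULL, 0);
    return EXIT_SUCCESS;
}
```

In LLMux itself the prompt comes from the browser UI via the HTTP server rather than being hard-coded, but the child-process plumbing follows this shape.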
## Technologies Used

- llama.cpp — tag `b5391`
- CivetWeb — commit `85d361d85dd3992bf5aaa04a392bc58ce655ad9d`
- Tailwind CSS — `v3.4.16`
- C++17 for the example chat executable
- GNU Make / Bash for build orchestration
## Features

- Browser-based chat UI served by a tiny C HTTP server
- Pluggable LLM chat executable — just point at any compatible binary
- Configurable model name, context length, server port and max response length via `#define` in `server.c` and `llm.cpp` ( see the sketch below )
- Build script ( `build.sh` ) to compile everything into `out/` and run `clang-format` on sources
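
A rough illustration of what that `#define` configuration might look like is shown below. `LLM_CHAT_EXECUTABLE_NAME` is the only macro name confirmed elsewhere in this README ( see Setup and Usage ); the other names and values are hypothetical stand-ins.

```c
/* Illustrative only: apart from LLM_CHAT_EXECUTABLE_NAME, the macro names
 * and values below are hypothetical stand-ins, not the project's actual ones. */

/* server.c */
#define LLM_CHAT_EXECUTABLE_NAME "llm_chat" /* chat binary spawned as a child */
#define SERVER_PORT 8080                    /* hypothetical name */
#define MAX_RESPONSE_LENGTH 4096            /* hypothetical name */

/* llm.cpp */
#define MODEL_NAME "models/example.gguf"    /* hypothetical name */
#define CONTEXT_LENGTH 2048                 /* hypothetical name */
```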
## Setup

- Obtain a model compatible with llama.cpp ( e.g. a `.gguf` file ) and place it in the `models/` directory.
- ( Optional ) If you don't use the example C++ chat app ( `llm_chat`, aka `llm.cpp` ), update the `LLM_CHAT_EXECUTABLE_NAME` macro to match your chosen binary.
- Get llama.cpp and CivetWeb.
- Run `./build.sh`. This will:
  - Compile the C server and C++ example chat app
  - Place all outputs under `out/`
  - Format the source files with `clang-format`
## Usage

- In `out/`, set the `LLM_CHAT_EXECUTABLE_NAME` macro in `server.c` to your chat binary name and re-build if needed.
- Start the server: `./out/server`
- Note the printed port number ( e.g. `Server started on port 8080` ).
- Open your browser at `http://localhost:<port>` to start chatting.
## Project Status

The project is complete. All planned functionality — spawning the LLM, piping I/O, rendering a chat UI — is implemented.
## Room for Improvement

To do:

- Dynamic response buffer: switch from fixed-size buffers to dynamic allocation in `server.c`.
- Prompt unescape: properly unescape JSON-style sequences ( `\"`, `\\`, etc. ) in incoming prompts before forwarding them to the chat executable ( see the sketch below ).
## Acknowledgements

- Inspired by the `simple-chat` example in llama.cpp
## Contact

Created by @lurkydismal - feel free to contact me!
## License

This project is open source and available under the GNU Affero General Public License v3.0.

