A local HTTP server that powers the Moly app by providing capabilities for searching, downloading, and running local Large Language Models (LLMs). This server integrates with WasmEdge for model execution and provides an OpenAI-compatible API interface.
## Features

- Search and discover LLM models
- Download and manage model files
- Automatic mirror selection based on region
- Run local LLMs using the WasmEdge runtime
- OpenAI-compatible API interface
## Installation

Obtain the source code for this repository:

```sh
git clone https://github.com/moly-ai/moly-local.git
```

Then follow the platform-specific instructions below.
### macOS

Install the required WasmEdge WASM runtime:

```sh
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- --version=0.14.1
source $HOME/.wasmedge/env
```

Then use cargo to build and run the server:

```sh
cd moly-local
cargo run -p moly-local
```

### Linux

Install the required WasmEdge WASM runtime:

```sh
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- --version=0.14.1
source $HOME/.wasmedge/env
```

> [!IMPORTANT]
> If your CPU does not support AVX512, append the `--noavx` option to the above command.
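That is, on CPUs without AVX512 support the install command becomes:

```sh
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- --version=0.14.1 --noavx
```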
To build Moly Local on Linux, you must install the following dependencies: openssl, clang/libclang, and binfmt. On a Debian-like Linux distro (e.g., Ubuntu), run the following:

```sh
sudo apt-get update
sudo apt-get install libssl-dev pkg-config llvm clang libclang-dev binfmt-support
```

Then use cargo to build and run Moly Local:

```sh
cd moly-local
cargo run -p moly-local
```
### Windows

1. Install the required WasmEdge WASM runtime from the WasmEdge releases page: `WasmEdge-0.14.1-windows.msi`

2. Download and extract the appropriate WASI-NN/GGML plugin for your system:
   - For CUDA 11/12: `WasmEdge-plugin-wasi_nn-ggml-cuda-0.14.1-windows-x86_64.zip`
   - For CPUs with AVX512 support: `WasmEdge-plugin-wasi_nn-ggml-0.14.1-windows-x86_64.zip`
   - Otherwise: `WasmEdge-plugin-wasi_nn-ggml-noavx-0.14.1-windows-x86_64.zip`

3. Copy the plugin DLL `.\lib\wasmedge\wasmedgePluginWasiNN.dll` from that archive to `Program Files\WasmEdge\lib\`.

4. Then use `cargo` to build and run Moly Local:

   ```sh
   cd moly-local
   cargo run -p moly-local
   ```
## Running the Server

To run the server locally:

```sh
cargo run -p moly-local
```

The server will start on the configured port (default: 8765) and log its address.
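Once it's up, you can sanity-check it from another terminal, for example by listing downloaded files (this assumes the default port):

```sh
curl http://localhost:8765/files
```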
## Configuration

The server can be configured using the following environment variables:

- `MOLY_LOCAL_PORT`: Port number for the HTTP server (default: 8765)
- `MODEL_CARDS_REPO`: Custom repository URL for model cards
- `MOLY_API_SERVER_ADDR`: Custom address for the API server (default: localhost:0)
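For example, to run the server on a different port (9000 here is an arbitrary choice):

```sh
MOLY_LOCAL_PORT=9000 cargo run -p moly-local
```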
## API Endpoints

### Files

- `GET /files` - List all downloaded files
- `DELETE /files/{id}` - Delete a specific file
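For example (the file id below is a hypothetical placeholder):

```sh
# List all downloaded files
curl http://localhost:8765/files

# Delete a specific file by id (placeholder id)
curl -X DELETE http://localhost:8765/files/some-file-id
```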
### Downloads

- `GET /downloads` - List all current downloads
- `POST /downloads` - Start a new download
- `GET /downloads/{id}/progress` - Get download progress
- `POST /downloads/{id}` - Pause a download
- `DELETE /downloads/{id}` - Cancel a download
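A sketch of driving a download from the shell; the JSON body for starting a download is an assumption (the request schema isn't documented here), and the ids are placeholders:

```sh
# Start a new download (body fields are assumed, not documented)
curl -X POST http://localhost:8765/downloads \
  -H "Content-Type: application/json" \
  -d '{"file_id": "some-file-id"}'

# Check its progress, then pause or cancel it (placeholder ids)
curl http://localhost:8765/downloads/some-file-id/progress
curl -X POST http://localhost:8765/downloads/some-file-id
curl -X DELETE http://localhost:8765/downloads/some-file-id
```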
### Models

- `POST /models/load` - Load a model
- `POST /models/eject` - Eject the currently loaded model
- `GET /models/featured` - Get featured models
- `GET /models/search` - Search for models
- `POST /models/v1/chat/completions` - Chat completions endpoint (OpenAI-compatible)
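Since the chat completions endpoint is OpenAI-compatible, a standard OpenAI-style request body should work; the model name below is a placeholder for whichever model you have loaded:

```sh
curl http://localhost:8765/models/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "placeholder-model-name",
    "messages": [{"role": "user", "content": "Say hello."}]
  }'
```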