mudler · Mar 21, 2026 · Mar 20, 2026
diff --git a/.agents/debugging-backends.md b/.agents/debugging-backends.md
@@ -0,0 +1,141 @@
+# Debugging and Rebuilding Backends
+
+When a backend fails at runtime (e.g. a gRPC method error, a Python import error, or a dependency conflict), use this guide to diagnose, fix, and rebuild.
+
+## Architecture Overview
+
+- **Source directory**: `backend/python/<name>/` (or `backend/go/<name>/`, `backend/cpp/<name>/`)
+- **Installed directory**: `backends/<name>/` — this is what LocalAI actually runs. It is populated by `make backends/<name>` which builds a Docker image, exports it, and installs it via `local-ai backends install`.
+- **Virtual environment**: `backends/<name>/venv/` — the installed Python venv (for Python backends). The Python binary is at `backends/<name>/venv/bin/python`.
+
+Editing files in `backend/python/<name>/` does **not** affect the running backend until you rebuild with `make backends/<name>`.
+
+## Diagnosing Failures
+
+### 1. Check the logs
+
+Backend gRPC processes log to LocalAI's stdout/stderr. Look for lines whose `id` field starts with the backend's model ID (followed by the gRPC address):
+
+```
+GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
+```
+
+Common error patterns:
+- **"Method not implemented"** — the backend is missing a gRPC method that the Go side calls. The model loader (`pkg/model/initializers.go`) always calls `LoadModel` after `Health`; fine-tuning backends must implement it even as a no-op stub.
+- **Python import errors / `AttributeError`** — usually a dependency version mismatch (e.g. `pyarrow` removing `PyExtensionType`).
+- **"failed to load backend"** — the gRPC process crashed or never started. Check stderr lines for the traceback.
+
+### 2. Test the Python environment directly
+
+You can run the installed venv's Python to check imports without starting the full server:
+
+```bash
+backends/<name>/venv/bin/python -c "import datasets; print(datasets.__version__)"
+```
+
+If `pip` is missing from the venv, bootstrap it:
+
+```bash
+backends/<name>/venv/bin/python -m ensurepip
+```
+
+Then use `backends/<name>/venv/bin/python -m pip install ...` to test fixes in the installed venv before committing them to the source requirements.
+
+### 3. Check upstream dependency constraints
+
+When you hit a dependency conflict, check what the main library expects. For example, TRL's upstream `requirements.txt`:
+
+```
+https://github.com/huggingface/trl/blob/main/requirements.txt
+```
+
+Pin minimum versions in the backend's requirements files to match upstream.
+
+## Common Fixes
+
+### Missing gRPC methods
+
+If the Go side calls a method the backend doesn't implement (e.g. `LoadModel`), add a no-op stub in `backend.py`:
+
+```python
+def LoadModel(self, request, context):
+    """No-op — actual loading happens elsewhere."""
+    return backend_pb2.Result(success=True, message="OK")
+```
+
+The gRPC contract requires `LoadModel` to succeed for the model loader to return a usable client, even if the backend doesn't need upfront model loading.
+
+### Dependency version conflicts
+
+Python backends often break when a transitive dependency releases a breaking change (e.g. `pyarrow` removing `PyExtensionType`). Steps:
+
+1. Identify the broken import in the logs
+2. Test in the installed venv: `backends/<name>/venv/bin/python -c "import <module>"`
+3. Check upstream requirements for version constraints
+4. Update **all** requirements files in `backend/python/<name>/`:
+   - `requirements.txt` — base deps (grpcio, protobuf)
+   - `requirements-cpu.txt` — CPU-specific (includes PyTorch CPU index)
+   - `requirements-cublas12.txt` — CUDA 12
+   - `requirements-cublas13.txt` — CUDA 13
+5. Rebuild: `make backends/<name>`
+
+### PyTorch index conflicts (uv resolver)
+
+The Docker build uses `uv` for pip installs. When `--extra-index-url` points to the PyTorch wheel index, `uv` by default takes each package only from the first index that lists it, so it may refuse to fetch a package like `requests` from PyPI if it finds a different version on the PyTorch index first. Fix this by adding `--index-strategy=unsafe-first-match` to `install.sh`:
+
+```bash
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+installRequirements
+```
+
+Most Python backends already do this — check `backend/python/transformers/install.sh` or similar for reference.
+
+## Rebuilding
+
+### Rebuild a single backend
+
+```bash
+make backends/<name>
+```
+
+This runs the Docker build (`Dockerfile.python`), exports the image to `backend-images/<name>.tar`, and installs it into `backends/<name>/`. It also rebuilds the `local-ai` Go binary (without extra tags).
+
+**Important**: If you were previously running a binary built with `GO_TAGS=auth`, the `make backends/<name>` step replaces it with one built without that tag. Rebuild the Go binary afterward:
+
+```bash
+GO_TAGS=auth make build
+```
+
+### Rebuild and restart
+
+After rebuilding a backend, you must restart LocalAI for it to pick up the new backend files. The backend gRPC process is spawned on demand when the model is first loaded.
+
+```bash
+# Kill existing process
+kill <pid>
+
+# Restart
+./local-ai run --debug [your flags]
+```
+
+### Quick iteration (skip Docker rebuild)
+
+For fast iteration on a Python backend's `backend.py` without a full Docker rebuild, you can edit the installed copy directly:
+
+```bash
+# Edit the installed copy
+vim backends/<name>/backend.py
+
+# Restart LocalAI to respawn the gRPC process
+```
+
+This is useful for testing but **does not persist** — the next `make backends/<name>` will overwrite it. Always commit fixes to the source in `backend/python/<name>/`.
+
+## Verification
+
+After fixing and rebuilding:
+
+1. Start LocalAI and confirm the backend registers: look for `Registering backend name="<name>"` in the logs
+2. Trigger the operation that failed (e.g. start a fine-tuning job)
+3. Watch the GRPC stderr/stdout lines for the backend's model ID
+4. Confirm that those lines contain no traceback or error
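
Reviewer sketch (not part of the commit): the log-triage steps in the guide above could be automated roughly as follows. This is a minimal sketch that assumes log lines keep the `GRPC stderr id="..." line="..."` shape shown in the guide. The helper names `backend_lines` and `diagnose` and the exact substrings in `PATTERNS` are hypothetical and should be adjusted to what the logs actually contain.

```python
import re

# Matches lines shaped like the example in the guide (assumed format):
#   GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
GRPC_LINE = re.compile(
    r'GRPC (?P<stream>stderr|stdout) id="(?P<id>[^"]*)" line="(?P<line>.*)"'
)

# Substring -> likely cause, following the guide's "Common error patterns".
# The substrings themselves are assumptions, not confirmed log output.
PATTERNS = {
    "Method not implemented": "missing gRPC method",
    "ImportError": "dependency version mismatch",
    "AttributeError": "dependency version mismatch",
    "failed to load backend": "gRPC process crashed or never started",
}


def backend_lines(log_text, model_id):
    """Return payloads of GRPC stdout/stderr lines whose id starts with model_id."""
    payloads = []
    for raw in log_text.splitlines():
        match = GRPC_LINE.search(raw)
        if match and match.group("id").startswith(model_id):
            payloads.append(match.group("line"))
    return payloads


def diagnose(log_text):
    """Return the sorted, de-duplicated hints whose substring occurs in the log."""
    return sorted({hint for needle, hint in PATTERNS.items() if needle in log_text})
```

Feeding the captured LocalAI output to `backend_lines(text, "trl-finetune")` isolates one backend's lines, and `diagnose(text)` gives a first guess at which section of the guide applies.
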
diff --git a/.github/workflows/backend.yml b/.github/workflows/backend.yml
@@ -118,6 +118,19 @@ jobs:
             dockerfile: "./backend/Dockerfile.python"
             context: "./"
             ubuntu-version: '2404'
+          - build-type: ''
+            cuda-major-version: ""
+            cuda-minor-version: ""
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-cpu-trl'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'true'
+            backend: "trl"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
           - build-type: ''
             cuda-major-version: ""
             cuda-minor-version: ""
@@ -366,6 +379,19 @@ jobs:
             dockerfile: "./backend/Dockerfile.python"
             context: "./"
             ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "8"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-12-trl'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "trl"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
           - build-type: 'cublas'
             cuda-major-version: "12"
             cuda-minor-version: "8"
@@ -757,6 +783,19 @@ jobs:
             dockerfile: "./backend/Dockerfile.python"
             context: "./"
             ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "13"
+            cuda-minor-version: "0"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-13-trl'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "trl"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
           - build-type: 'l4t'
             cuda-major-version: "13"
             cuda-minor-version: "0"

diff --git a/AGENTS.md b/AGENTS.md
@@ -12,6 +12,7 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
 | [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
 | [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
 | [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
+| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
 
 ## Quick Reference
 

diff --git a/Makefile b/Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl
 
 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -421,6 +421,7 @@ prepare-test-extra: protogen-python
 	$(MAKE) -C backend/python/voxcpm
 	$(MAKE) -C backend/python/whisperx
 	$(MAKE) -C backend/python/ace-step
+	$(MAKE) -C backend/python/trl
 
 test-extra: prepare-test-extra
 	$(MAKE) -C backend/python/transformers test
@@ -440,6 +441,7 @@ test-extra: prepare-test-extra
 	$(MAKE) -C backend/python/voxcpm test
 	$(MAKE) -C backend/python/whisperx test
 	$(MAKE) -C backend/python/ace-step test
+	$(MAKE) -C backend/python/trl test
 
 DOCKER_IMAGE?=local-ai
 IMAGE_TYPE?=core
@@ -572,6 +574,7 @@ BACKEND_VOXCPM = voxcpm|python|.|false|true
 BACKEND_WHISPERX = whisperx|python|.|false|true
 BACKEND_ACE_STEP = ace-step|python|.|false|true
 BACKEND_MLX_DISTRIBUTED = mlx-distributed|python|./|false|true
+BACKEND_TRL = trl|python|.|false|true
 
 # Helper function to build docker image for a backend
 # Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
@@ -629,12 +632,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
 $(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
 $(eval $(call generate-docker-build-target,$(BACKEND_ACESTEP_CPP)))
 $(eval $(call generate-docker-build-target,$(BACKEND_MLX_DISTRIBUTED)))
+$(eval $(call generate-docker-build-target,$(BACKEND_TRL)))
 
 # Pattern rule for docker-save targets
 docker-save-%: backend-images
 	docker save local-ai-backend:$* -o backend-images/$*.tar
 
-docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed
+docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl
 
 ########################################################
 ### Mock Backend for E2E Tests
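
Reviewer sketch (not part of the commit): the new `BACKEND_TRL = trl|python|.|false|true` definition packs five pipe-separated fields. The sketch below assumes they follow the order given in the helper's usage comment (`BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG`); that mapping is an inference from the diff, not something the Makefile states. `BackendDef`, `parse_backend_def` and `save_command` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class BackendDef(NamedTuple):
    # Field order assumed from the docker-build-backend usage comment.
    name: str
    dockerfile_type: str
    build_context: str
    progress_flag: str
    needs_backend_arg: str


def parse_backend_def(value: str) -> BackendDef:
    """Split a 'name|type|context|flag|flag' definition into named fields."""
    parts = value.strip().split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}")
    return BackendDef(*parts)


def save_command(backend: BackendDef) -> List[str]:
    """Mirror the docker-save-% pattern rule shown in the diff."""
    return [
        "docker", "save", f"local-ai-backend:{backend.name}",
        "-o", f"backend-images/{backend.name}.tar",
    ]
```

For example, `save_command(parse_backend_def("trl|python|.|false|true"))` yields the `docker save` invocation that `docker-save-trl` would run under the pattern rule above.
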

diff --git a/backend/backend.proto b/backend/backend.proto
@@ -39,6 +39,13 @@ service Backend {
   rpc AudioDecode(AudioDecodeRequest) returns (AudioDecodeResult) {}
 
   rpc ModelMetadata(ModelOptions) returns (ModelMetadataResponse) {}
+
+  // Fine-tuning RPCs
+  rpc StartFineTune(FineTuneRequest) returns (FineTuneJobResult) {}
+  rpc FineTuneProgress(FineTuneProgressRequest) returns (stream FineTuneProgressUpdate) {}
+  rpc StopFineTune(FineTuneStopRequest) returns (Result) {}
+  rpc ListCheckpoints(ListCheckpointsRequest) returns (ListCheckpointsResponse) {}
+  rpc ExportModel(ExportModelRequest) returns (Result) {}
 }
 
 // Define the empty request
@@ -528,3 +535,105 @@ message ModelMetadataResponse {
   string rendered_template = 2;  // The rendered chat template with enable_thinking=true (empty if not applicable)
   ToolFormatMarkers tool_format = 3;  // Auto-detected tool format markers from differential template analysis
 }
+
+// Fine-tuning messages
+
+message FineTuneRequest {
+  // Model identification
+  string model = 1;                       // HF model name or local path
+  string training_type = 2;              // "lora", "loha", "lokr", "full" — what parameters to train
+  string training_method = 3;            // "sft", "dpo", "grpo", "rloo", "reward", "kto", "orpo", "network_training"
+
+  // Adapter config (universal across LoRA/LoHa/LoKr for LLM + diffusion)
+  int32 adapter_rank = 10;               // LoRA rank (r), default 16
+  int32 adapter_alpha = 11;              // scaling factor, default 16
+  float adapter_dropout = 12;            // default 0.0
+  repeated string target_modules = 13;   // layer names to adapt
+
+  // Universal training hyperparameters
+  float learning_rate = 20;              // default 2e-4
+  int32 num_epochs = 21;                 // default 3
+  int32 batch_size = 22;                 // default 2
+  int32 gradient_accumulation_steps = 23; // default 4
+  int32 warmup_steps = 24;              // default 5
+  int32 max_steps = 25;                 // 0 = use epochs
+  int32 save_steps = 26;               // 0 = only save final
+  float weight_decay = 27;             // default 0.01
+  bool gradient_checkpointing = 28;
+  string optimizer = 29;               // adamw_8bit, adamw, sgd, adafactor, prodigy
+  int32 seed = 30;                     // default 3407
+  string mixed_precision = 31;        // fp16, bf16, fp8, no
+
+  // Dataset
+  string dataset_source = 40;          // HF dataset ID, local file/dir path
+  string dataset_split = 41;           // train, test, etc.
+
+  // Output
+  string output_dir = 50;
+  string job_id = 51;                  // client-assigned or auto-generated
+
+  // Resume training from a checkpoint
+  string resume_from_checkpoint = 55;  // path to checkpoint dir to resume from
+
+  // Backend-specific AND method-specific extensibility
+  map<string, string> extra_options = 60;
+}
+
+message FineTuneJobResult {
+  string job_id = 1;
+  bool success = 2;
+  string message = 3;
+}
+
+message FineTuneProgressRequest {
+  string job_id = 1;
+}
+
+message FineTuneProgressUpdate {
+  string job_id = 1;
+  int32 current_step = 2;
+  int32 total_steps = 3;
+  float current_epoch = 4;
+  float total_epochs = 5;
+  float loss = 6;
+  float learning_rate = 7;
+  float grad_norm = 8;
+  float eval_loss = 9;
+  float eta_seconds = 10;
+  float progress_percent = 11;
+  string status = 12;                  // queued, caching, loading_model, loading_dataset, training, saving, completed, failed, stopped
+  string message = 13;
+  string checkpoint_path = 14;        // set when a checkpoint is saved
+  string sample_path = 15;           // set when a sample is generated (video/image backends)
+  map<string, float> extra_metrics = 16; // method-specific metrics
+}
+
+message FineTuneStopRequest {
+  string job_id = 1;
+  bool save_checkpoint = 2;
+}
+
+message ListCheckpointsRequest {
+  string output_dir = 1;
+}
+
+message ListCheckpointsResponse {
+  repeated CheckpointInfo checkpoints = 1;
+}
+
+message CheckpointInfo {
+  string path = 1;
+  int32 step = 2;
+  float epoch = 3;
+  float loss = 4;
+  string created_at = 5;
+}
+
+message ExportModelRequest {
+  string checkpoint_path = 1;
+  string output_path = 2;
+  string export_format = 3;           // lora, loha, lokr, merged_16bit, merged_4bit, gguf, diffusers
+  string quantization_method = 4;     // for GGUF: q4_k_m, q5_k_m, q8_0, f16, etc.
+  string model = 5;                   // base model name (for merge operations)
+  map<string, string> extra_options = 6;
+}
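
Reviewer sketch (not part of the commit): the comments on `FineTuneRequest` document defaults such as rank 16 or learning rate 2e-4, but a plain scalar field that the client leaves unset reads as zero under proto3 semantics (assumed here, since the `syntax` line is outside this diff). A backend would therefore need to substitute the documented values itself. The following is a minimal sketch of that idea. `FineTuneParams`, `with_defaults` and `effective_batch_size` are hypothetical names, and treating zero as "unset" is an assumption about how a backend might behave, not something the diff shows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FineTuneParams:
    # Stand-in for a subset of FineTuneRequest; zero mimics an unset proto3 scalar.
    adapter_rank: int = 0
    adapter_alpha: int = 0
    learning_rate: float = 0.0
    num_epochs: int = 0
    batch_size: int = 0
    gradient_accumulation_steps: int = 0
    warmup_steps: int = 0
    weight_decay: float = 0.0
    seed: int = 0


# Values copied from the comments in the proto diff.
DOCUMENTED_DEFAULTS = {
    "adapter_rank": 16,
    "adapter_alpha": 16,
    "learning_rate": 2e-4,
    "num_epochs": 3,
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 5,
    "weight_decay": 0.01,
    "seed": 3407,
}


def with_defaults(params: FineTuneParams) -> FineTuneParams:
    """Replace zero-valued fields with the defaults documented in the proto."""
    updates = {
        key: value
        for key, value in DOCUMENTED_DEFAULTS.items()
        if getattr(params, key) == 0
    }
    return replace(params, **updates)


def effective_batch_size(params: FineTuneParams) -> int:
    """Samples per optimizer step on one device: batch size times accumulation."""
    return params.batch_size * params.gradient_accumulation_steps
```

One consequence of this approach is that a caller cannot explicitly request zero for fields like `weight_decay` or `warmup_steps`; if that matters, the `extra_options` map or `optional` fields would be possible ways to carry the distinction.
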