Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
141 changes: 141 additions & 0 deletions .agents/debugging-backends.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Debugging and Rebuilding Backends

When a backend fails at runtime (e.g. a gRPC method error, a Python import error, or a dependency conflict), use this guide to diagnose, fix, and rebuild.

## Architecture Overview

- **Source directory**: `backend/python/<name>/` (or `backend/go/<name>/`, `backend/cpp/<name>/`)
- **Installed directory**: `backends/<name>/` — this is what LocalAI actually runs. It is populated by `make backends/<name>` which builds a Docker image, exports it, and installs it via `local-ai backends install`.
- **Virtual environment**: `backends/<name>/venv/` — the installed Python venv (for Python backends). The Python binary is at `backends/<name>/venv/bin/python`.

Editing files in `backend/python/<name>/` does **not** affect the running backend until you rebuild with `make backends/<name>`.

## Diagnosing Failures

### 1. Check the logs

Backend gRPC processes log to LocalAI's stdout/stderr. Look for lines tagged with the backend's model ID:

```
GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
```

Common error patterns:
- **"Method not implemented"** — the backend is missing a gRPC method that the Go side calls. The model loader (`pkg/model/initializers.go`) always calls `LoadModel` after `Health`; fine-tuning backends must implement it even as a no-op stub.
- **Python import errors / `AttributeError`** — usually a dependency version mismatch (e.g. `pyarrow` removing `PyExtensionType`).
- **"failed to load backend"** — the gRPC process crashed or never started. Check stderr lines for the traceback.

### 2. Test the Python environment directly

You can run the installed venv's Python to check imports without starting the full server:

```bash
backends/<name>/venv/bin/python -c "import datasets; print(datasets.__version__)"
```

If `pip` is missing from the venv, bootstrap it:

```bash
backends/<name>/venv/bin/python -m ensurepip
```

Then use `backends/<name>/venv/bin/python -m pip install ...` to test fixes in the installed venv before committing them to the source requirements.

### 3. Check upstream dependency constraints

When you hit a dependency conflict, check what the main library expects. For example, TRL's upstream `requirements.txt`:

```
https://github.com/huggingface/trl/blob/main/requirements.txt
```

Pin minimum versions in the backend's requirements files to match upstream.

## Common Fixes

### Missing gRPC methods

If the Go side calls a method the backend doesn't implement (e.g. `LoadModel`), add a no-op stub in `backend.py`:

```python
def LoadModel(self, request, context):
"""No-op — actual loading happens elsewhere."""
return backend_pb2.Result(success=True, message="OK")
```

The gRPC contract requires `LoadModel` to succeed for the model loader to return a usable client, even if the backend doesn't need upfront model loading.

### Dependency version conflicts

Python backends often break when a transitive dependency releases a breaking change (e.g. `pyarrow` removing `PyExtensionType`). Steps:

1. Identify the broken import in the logs
2. Test in the installed venv: `backends/<name>/venv/bin/python -c "import <module>"`
3. Check upstream requirements for version constraints
4. Update **all** requirements files in `backend/python/<name>/`:
- `requirements.txt` — base deps (grpcio, protobuf)
- `requirements-cpu.txt` — CPU-specific (includes PyTorch CPU index)
- `requirements-cublas12.txt` — CUDA 12
- `requirements-cublas13.txt` — CUDA 13
5. Rebuild: `make backends/<name>`

### PyTorch index conflicts (uv resolver)

The Docker build uses `uv` for pip installs. When `--extra-index-url` points to the PyTorch wheel index, `uv` may refuse to fetch packages like `requests` from PyPI if it finds a different version on the PyTorch index first. Fix this by adding `--index-strategy=unsafe-first-match` to `install.sh`:

```bash
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
installRequirements
```

Most Python backends already do this — check `backend/python/transformers/install.sh` or similar for reference.

## Rebuilding

### Rebuild a single backend

```bash
make backends/<name>
```

This runs the Docker build (`Dockerfile.python`), exports the image to `backend-images/<name>.tar`, and installs it into `backends/<name>/`. It also rebuilds the `local-ai` Go binary (without extra tags).

**Important**: If you were previously running with `GO_TAGS=auth`, the `make backends/<name>` step will overwrite your binary without that tag. Rebuild the Go binary afterward:

```bash
GO_TAGS=auth make build
```

### Rebuild and restart

After rebuilding a backend, you must restart LocalAI for it to pick up the new backend files. The backend gRPC process is spawned on demand when the model is first loaded.

```bash
# Kill existing process
kill <pid>

# Restart
./local-ai run --debug [your flags]
```

### Quick iteration (skip Docker rebuild)

For fast iteration on a Python backend's `backend.py` without a full Docker rebuild, you can edit the installed copy directly:

```bash
# Edit the installed copy
vim backends/<name>/backend.py

# Restart LocalAI to respawn the gRPC process
```

This is useful for testing but **does not persist** — the next `make backends/<name>` will overwrite it. Always commit fixes to the source in `backend/python/<name>/`.

## Verification

After fixing and rebuilding:

1. Start LocalAI and confirm the backend registers: look for `Registering backend name="<name>"` in the logs
2. Trigger the operation that failed (e.g. start a fine-tuning job)
3. Watch the GRPC stderr/stdout lines for the backend's model ID
4. Confirm no errors in the traceback
39 changes: 39 additions & 0 deletions .github/workflows/backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-cpu-trl'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'true'
backend: "trl"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
Expand Down Expand Up @@ -366,6 +379,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-trl'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "trl"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "12"
cuda-minor-version: "8"
Expand Down Expand Up @@ -757,6 +783,19 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'cublas'
cuda-major-version: "13"
cuda-minor-version: "0"
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-13-trl'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "trl"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'l4t'
cuda-major-version: "13"
cuda-minor-version: "0"
Expand Down
1 change: 1 addition & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ This file is an index to detailed topic guides in the `.agents/` directory. Read
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |

## Quick Reference

Expand Down
8 changes: 6 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl

GOCMD=go
GOTEST=$(GOCMD) test
Expand Down Expand Up @@ -421,6 +421,7 @@ prepare-test-extra: protogen-python
$(MAKE) -C backend/python/voxcpm
$(MAKE) -C backend/python/whisperx
$(MAKE) -C backend/python/ace-step
$(MAKE) -C backend/python/trl

test-extra: prepare-test-extra
$(MAKE) -C backend/python/transformers test
Expand All @@ -440,6 +441,7 @@ test-extra: prepare-test-extra
$(MAKE) -C backend/python/voxcpm test
$(MAKE) -C backend/python/whisperx test
$(MAKE) -C backend/python/ace-step test
$(MAKE) -C backend/python/trl test

DOCKER_IMAGE?=local-ai
IMAGE_TYPE?=core
Expand Down Expand Up @@ -572,6 +574,7 @@ BACKEND_VOXCPM = voxcpm|python|.|false|true
BACKEND_WHISPERX = whisperx|python|.|false|true
BACKEND_ACE_STEP = ace-step|python|.|false|true
BACKEND_MLX_DISTRIBUTED = mlx-distributed|python|./|false|true
BACKEND_TRL = trl|python|.|false|true

# Helper function to build docker image for a backend
# Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
Expand Down Expand Up @@ -629,12 +632,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
$(eval $(call generate-docker-build-target,$(BACKEND_ACESTEP_CPP)))
$(eval $(call generate-docker-build-target,$(BACKEND_MLX_DISTRIBUTED)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRL)))

# Pattern rule for docker-save targets
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar

docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl

########################################################
### Mock Backend for E2E Tests
Expand Down
109 changes: 109 additions & 0 deletions backend/backend.proto
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,13 @@ service Backend {
rpc AudioDecode(AudioDecodeRequest) returns (AudioDecodeResult) {}

rpc ModelMetadata(ModelOptions) returns (ModelMetadataResponse) {}

// Fine-tuning RPCs
rpc StartFineTune(FineTuneRequest) returns (FineTuneJobResult) {}
rpc FineTuneProgress(FineTuneProgressRequest) returns (stream FineTuneProgressUpdate) {}
rpc StopFineTune(FineTuneStopRequest) returns (Result) {}
rpc ListCheckpoints(ListCheckpointsRequest) returns (ListCheckpointsResponse) {}
rpc ExportModel(ExportModelRequest) returns (Result) {}
}

// Define the empty request
Expand Down Expand Up @@ -528,3 +535,105 @@ message ModelMetadataResponse {
string rendered_template = 2; // The rendered chat template with enable_thinking=true (empty if not applicable)
ToolFormatMarkers tool_format = 3; // Auto-detected tool format markers from differential template analysis
}

// Fine-tuning messages

message FineTuneRequest {
// Model identification
string model = 1; // HF model name or local path
string training_type = 2; // "lora", "loha", "lokr", "full" — what parameters to train
string training_method = 3; // "sft", "dpo", "grpo", "rloo", "reward", "kto", "orpo", "network_training"

// Adapter config (universal across LoRA/LoHa/LoKr for LLM + diffusion)
int32 adapter_rank = 10; // LoRA rank (r), default 16
int32 adapter_alpha = 11; // scaling factor, default 16
float adapter_dropout = 12; // default 0.0
repeated string target_modules = 13; // layer names to adapt

// Universal training hyperparameters
float learning_rate = 20; // default 2e-4
int32 num_epochs = 21; // default 3
int32 batch_size = 22; // default 2
int32 gradient_accumulation_steps = 23; // default 4
int32 warmup_steps = 24; // default 5
int32 max_steps = 25; // 0 = use epochs
int32 save_steps = 26; // 0 = only save final
float weight_decay = 27; // default 0.01
bool gradient_checkpointing = 28;
string optimizer = 29; // adamw_8bit, adamw, sgd, adafactor, prodigy
int32 seed = 30; // default 3407
string mixed_precision = 31; // fp16, bf16, fp8, no

// Dataset
string dataset_source = 40; // HF dataset ID, local file/dir path
string dataset_split = 41; // train, test, etc.

// Output
string output_dir = 50;
string job_id = 51; // client-assigned or auto-generated

// Resume training from a checkpoint
string resume_from_checkpoint = 55; // path to checkpoint dir to resume from

// Backend-specific AND method-specific extensibility
map<string, string> extra_options = 60;
}

message FineTuneJobResult {
string job_id = 1;
bool success = 2;
string message = 3;
}

message FineTuneProgressRequest {
string job_id = 1;
}

message FineTuneProgressUpdate {
string job_id = 1;
int32 current_step = 2;
int32 total_steps = 3;
float current_epoch = 4;
float total_epochs = 5;
float loss = 6;
float learning_rate = 7;
float grad_norm = 8;
float eval_loss = 9;
float eta_seconds = 10;
float progress_percent = 11;
string status = 12; // queued, caching, loading_model, loading_dataset, training, saving, completed, failed, stopped
string message = 13;
string checkpoint_path = 14; // set when a checkpoint is saved
string sample_path = 15; // set when a sample is generated (video/image backends)
map<string, float> extra_metrics = 16; // method-specific metrics
}

message FineTuneStopRequest {
string job_id = 1;
bool save_checkpoint = 2;
}

message ListCheckpointsRequest {
string output_dir = 1;
}

message ListCheckpointsResponse {
repeated CheckpointInfo checkpoints = 1;
}

message CheckpointInfo {
string path = 1;
int32 step = 2;
float epoch = 3;
float loss = 4;
string created_at = 5;
}

message ExportModelRequest {
string checkpoint_path = 1;
string output_path = 2;
string export_format = 3; // lora, loha, lokr, merged_16bit, merged_4bit, gguf, diffusers
string quantization_method = 4; // for GGUF: q4_k_m, q5_k_m, q8_0, f16, etc.
string model = 5; // base model name (for merge operations)
map<string, string> extra_options = 6;
}
Loading
Loading