AI-Toolkit by ostris — a web-based interface for training LoRA adapters on diffusion models.
The manifest at `ai-toolkit-k8s.yaml` creates:

- Namespace: `ai-toolkit`
- Deployment: single replica running `ostris/aitoolkit:latest`
- Service: NodePort exposing the web UI on port 30675
The pod is pinned to the RTX 5090 (32 GB VRAM) via `NVIDIA_VISIBLE_DEVICES` UUID targeting. This is the only GPU with enough VRAM for training larger models. See GPU Assignment for the strategy.
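For orientation, the Service object might look like the sketch below. Only the NodePort (30675) comes from this document; the object name, selector label, and container port are assumptions to check against the actual manifest.

```yaml
# Sketch only; verify names and ports against ai-toolkit-k8s.yaml
apiVersion: v1
kind: Service
metadata:
  name: ai-toolkit        # name assumed
  namespace: ai-toolkit
spec:
  type: NodePort
  selector:
    app: ai-toolkit       # label assumed
  ports:
    - port: 8675          # container port assumed; not stated in this doc
      nodePort: 30675     # from the description above
```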
| Variable | Value | Notes |
|---|---|---|
| `AI_TOOLKIT_AUTH` | `YOUR_WEBUI_PASSWORD` | Replace with your password |
| `HF_TOKEN` | `YOUR_HF_TOKEN_HERE` | Replace with your HuggingFace token |
| `HF_HOME` | `/workspace/hf_cache` | Cache directory for HF models |
| `NVIDIA_VISIBLE_DEVICES` | `GPU-01b9a...` | RTX 5090 UUID |
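In the Deployment's container spec, these variables would appear roughly as follows (a sketch of the `env` section matching the table; placeholder values must be replaced before deploying):

```yaml
# Sketch of the container env section; replace placeholders before applying
env:
  - name: AI_TOOLKIT_AUTH
    value: "YOUR_WEBUI_PASSWORD"
  - name: HF_TOKEN
    value: "YOUR_HF_TOKEN_HERE"
  - name: HF_HOME
    value: "/workspace/hf_cache"
  - name: NVIDIA_VISIBLE_DEVICES
    value: "GPU-01b9a..."   # full UUID from `nvidia-smi -L`
```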
Uses a hostPath volume at `/data/ai-toolkit` mounted to `/workspace` inside the container. This persists training configs, datasets, outputs, and cached models across pod restarts.
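That mapping would look something like this in the pod spec (a sketch using the paths above; the volume name and hostPath `type` are assumptions):

```yaml
# Sketch; volume name and hostPath type are assumptions
volumes:
  - name: workspace
    hostPath:
      path: /data/ai-toolkit
      type: DirectoryOrCreate   # the real manifest may differ
containers:
  - name: ai-toolkit
    volumeMounts:
      - name: workspace
        mountPath: /workspace
```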
```yaml
resources:
  limits:
    nvidia.com/gpu: "1"
    memory: "30Gi"
  requests:
    nvidia.com/gpu: "1"
    memory: "8Gi"
```

The `dshm` (shared memory) volume is mounted at `/dev/shm` with a 1 GiB limit, which PyTorch DataLoader workers require.
- Replace `YOUR_HF_TOKEN_HERE` with your HuggingFace token
- Replace `YOUR_WEBUI_PASSWORD` with a strong password
- Ensure `/data/ai-toolkit` exists on the host
- Verify the GPU UUID matches your target GPU (`nvidia-smi -L`)
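`nvidia-smi -L` prints one line per GPU with its UUID. The filtering pattern can be sketched like this; the sample output below is made up, and your GPU names and UUIDs will differ:

```shell
# Hypothetical `nvidia-smi -L` output; your GPUs and UUIDs will differ
sample='GPU 0: NVIDIA GeForce RTX 5090 (UUID: GPU-01b9aabc-1111-2222-3333-444455556666)
GPU 1: NVIDIA GeForce RTX 4070 Ti (UUID: GPU-77fe0000-aaaa-bbbb-cccc-ddddeeeeffff)'

# Keep only the 5090 line, then pull out its UUID token
uuid=$(printf '%s\n' "$sample" | grep 5090 | grep -oE 'GPU-[0-9a-f-]+')
echo "$uuid"
```

On a real host, pipe `nvidia-smi -L` itself through the same two greps and paste the result into `NVIDIA_VISIBLE_DEVICES`.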
```shell
kubectl apply -f ai-toolkit-k8s.yaml
```