---
layout: default
title: LocalAI Tutorial
nav_order: 92
has_children: true
---
Run LLMs, image generation, and audio models locally with an OpenAI-compatible API.
LocalAI is a free, open-source alternative to OpenAI that runs locally. It provides an OpenAI-compatible API for LLMs, image generation, audio transcription, and text-to-speech, all running on consumer hardware.
| Feature | Description |
|---|---|
| OpenAI Compatible | Drop-in replacement for OpenAI API |
| Multi-Modal | Text, images, audio, embeddings |
| No GPU Required | Runs on CPU (GPU optional) |
| Model Gallery | Easy model installation |
| Docker Ready | Simple deployment |
| Privacy | 100% local, no data leaves your machine |
```mermaid
flowchart TD
    A[OpenAI SDK/API Calls] --> B[LocalAI Server]
    B --> C[LLM Backend]
    B --> D[Image Generation]
    B --> E[Audio Processing]
    B --> F[Embeddings]
    C --> G[llama.cpp]
    C --> H[GPT4All]
    D --> I[Stable Diffusion]
    D --> J[SDXL]
    E --> K[Whisper]
    E --> L[TTS]
    F --> M[Sentence Transformers]
    classDef api fill:#e1f5fe,stroke:#01579b
    classDef server fill:#f3e5f5,stroke:#4a148c
    classDef backend fill:#fff3e0,stroke:#ef6c00
    classDef model fill:#e8f5e8,stroke:#1b5e20
    class A api
    class B server
    class C,D,E,F backend
    class G,H,I,J,K,L,M model
```
- Repository: mudler/LocalAI (about 43.7k stars)
- Latest release: v4.0.0 (published 2026-03-14)
- Chapter 1: Getting Started - Installation and first model
- Chapter 2: Model Gallery - Installing and managing models
- Chapter 3: Text Generation - Chat and completions
- Chapter 4: Image Generation - Stable Diffusion locally
- Chapter 5: Audio - Whisper transcription and TTS
- Chapter 6: Embeddings - Vector embeddings for RAG
- Chapter 7: Configuration - Advanced settings and tuning
- Chapter 8: Integrations - Production integrations and optimization
- Deploy LocalAI with Docker or from source
- Install Models from the gallery
- Use OpenAI SDK with local models
- Generate Images with Stable Diffusion
- Transcribe Audio with Whisper
- Create Embeddings for RAG applications
- Scale for Production use
- Docker (recommended)
- 8GB+ RAM (more for larger models)
- Optional: NVIDIA GPU with CUDA
```bash
# Run LocalAI (CPU)
docker run -p 8080:8080 \
  -v localai-models:/models \
  localai/localai:latest-cpu

# Open http://localhost:8080
```

```bash
# Run LocalAI with an NVIDIA GPU
docker run -p 8080:8080 \
  --gpus all \
  -v localai-models:/models \
  localai/localai:latest-gpu-nvidia-cuda-12
```

```yaml
version: '3.8'
services:
  localai:
    image: localai/localai:latest-cpu
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
    environment:
      - DEBUG=true
      - THREADS=4
```

```bash
# Install a model via the API
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "phi-2"}'

# List available models
curl http://localhost:8080/models/available
```

```python
from openai import OpenAI

# Point to LocalAI
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # LocalAI doesn't require an API key
)

# Chat completion (same as OpenAI!)
response = client.chat.completions.create(
    model="phi-2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

```python
# Generate an image with Stable Diffusion
response = client.images.generate(
    model="stablediffusion",
    prompt="A beautiful sunset over mountains",
    size="512x512"
)

# Save the image
import base64
image_data = base64.b64decode(response.data[0].b64_json)
with open("sunset.png", "wb") as f:
    f.write(image_data)
```

```python
# Transcribe audio with Whisper
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f
    )
print(transcript.text)
```

```python
# Generate speech
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is LocalAI speaking!"
)

# Save the audio
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```

```python
# Generate embeddings for RAG
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Hello, world!"
)
embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
```

| Category | Models |
|---|---|
| LLM | Phi-2, LLaMA, Mistral, GPT4All |
| Image | Stable Diffusion, SDXL |
| Audio | Whisper (all sizes) |
| TTS | Piper, Coqui |
| Embedding | all-MiniLM, BGE |
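The embeddings endpoint shown earlier returns plain float vectors, so retrieval for RAG boils down to a similarity search over those vectors. A minimal sketch using cosine similarity in pure Python (the toy 3-dimensional vectors stand in for real model output, which would come from the embeddings API):

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    # Rank stored document embeddings by similarity to the query embedding
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Toy 3-dimensional "embeddings" standing in for real model output
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs))
```

In a real RAG pipeline you would embed your document chunks once, store the vectors (in memory or a vector database), and run this ranking step per query before passing the top chunks to the chat endpoint.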
| Model Size | RAM (CPU) | VRAM (GPU) |
|---|---|---|
| 3B | 4GB | 4GB |
| 7B | 8GB | 6GB |
| 13B | 16GB | 10GB |
| 70B | 64GB+ | 40GB+ |
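These figures roughly track parameter count times bytes per weight, plus runtime overhead for the KV cache and the server itself. A back-of-the-envelope sketch (the one-byte-per-weight and 1.2x overhead assumptions are illustrative approximations, not LocalAI's actual accounting):

```python
def estimate_ram_gb(params_billion, bytes_per_param=1.0, overhead=1.2):
    # params (billions) * bytes per weight (1.0 ~ 8-bit quantization),
    # scaled by a fudge factor for KV cache and runtime overhead
    return params_billion * bytes_per_param * overhead

for size in (3, 7, 13, 70):
    print(f"{size}B -> ~{estimate_ram_gb(size):.1f} GB")
```

Lowering `bytes_per_param` (e.g. 0.5 for 4-bit quantization) shows why quantized models fit in far less memory than their full-precision counterparts.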
- Chapters 1-3: Setup and text generation
- Run your first local LLM
- Chapters 4-6: Images, audio, and embeddings
- Build multi-modal applications
- Chapters 7-8: Configuration and production
- Scale local AI infrastructure
Ready to run AI locally? Let's begin with Chapter 1: Getting Started!
Generated for Awesome Code Docs
- Start Here: Chapter 1: Getting Started with LocalAI
- Back to Main Catalog
- Browse A-Z Tutorial Directory
- Search by Intent
- Explore Category Hubs
- Chapter 1: Getting Started with LocalAI
- Chapter 2: Model Gallery and Management
- Chapter 3: Text Generation and Chat Completions
- Chapter 4: Image Generation with Stable Diffusion
- Chapter 5: Audio Processing - Whisper & TTS
- Chapter 6: Vector Embeddings for RAG
- Chapter 7: Advanced Configuration and Tuning
- Chapter 8: Production Integration and Applications
Generated by AI Codebase Knowledge Builder