The production-grade LLM client. One API for 100+ providers. Pure Rust core with native bindings.
11,000+ models · 100+ providers · Rust | Python | Node.js
```
              ┌──────────────┐
              │  Rust Core   │
              └──────┬───────┘
  ┌──────────┬───────┼─────────┬──────────┐
  ▼          ▼       ▼         ▼          ▼
┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐
│Python │ │ Node  │ │ WASM  │ │  Go   │ │  ...  │
│  ✅   │ │  ✅   │ │ Soon  │ │ Soon  │ │       │
└───────┘ └───────┘ └───────┘ └───────┘ └───────┘
```
Documentation · Changelog · Contributing
LLMKit is written in pure Rust — no Python runtime and no garbage collector. Deploy with confidence knowing your LLM infrastructure won't degrade over time or crash under load.
- Memory Safety — Rust's ownership model rules out whole classes of memory bugs (use-after-free, data races) by design
- True Concurrency — No GIL. Handle thousands of concurrent streams efficiently
- Minimal Footprint — Native binary, not a 150MB Python package
- Run Forever — No worker restarts, no memory bloat, no surprises
- Prompt Caching — Native support for Anthropic, OpenAI, Google, DeepSeek. Save up to 90% on API costs
- Extended Thinking — Unified API for reasoning across 5 providers (Anthropic, OpenAI, Google, DeepSeek, OpenRouter)
- Streaming — Zero-copy streaming with automatic request deduplication
- 11,000+ Model Registry — Pricing, context limits, and capabilities baked in. No external API calls
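The "save up to 90%" figure for prompt caching comes from providers billing cached input tokens at a steep discount over the normal input rate. A back-of-envelope sketch, assuming a 90% cache-read discount (actual discounts and prices vary by provider, so check their pricing pages):

```python
def prompt_cost_usd(input_tokens, cached_tokens, price_per_mtok, cache_discount=0.90):
    """Estimate input-token cost when part of the prompt hits the provider cache.

    cache_discount=0.90 assumes cache reads cost 10% of the normal input
    rate. This is an illustrative assumption, not LLMKit's pricing logic.
    """
    uncached = input_tokens - cached_tokens
    cached_rate = price_per_mtok * (1 - cache_discount)
    return (uncached * price_per_mtok + cached_tokens * cached_rate) / 1_000_000

# A 100k-token system prompt at a hypothetical $3/Mtok input price:
first_call = prompt_cost_usd(100_000, 0, 3.0)         # cache miss: $0.30
repeat_call = prompt_cost_usd(100_000, 100_000, 3.0)  # cache hit:  $0.03
```

On the repeat call the cached prompt costs one tenth of the first call, which is where the "up to 90%" saving comes from.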
| Feature | Description |
|---|---|
| Smart Router | ML-based provider selection optimizing for latency, cost, or reliability |
| Circuit Breaker | Automatic failure detection and recovery with anomaly detection |
| Rate Limiting | Lock-free, hierarchical rate limiting at scale |
| Cost Tracking | Multi-tenant metering with cache-aware pricing |
| Guardrails | PII detection, secret scanning, prompt injection prevention |
| Observability | OpenTelemetry integration for tracing and metrics |
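To make the Circuit Breaker row concrete, here is a minimal closed/open/half-open state machine. This is a concept sketch only — the class name, thresholds, and states are invented for illustration and are not LLMKit's actual implementation:

```python
import time

class CircuitBreaker:
    """Toy breaker: trips open after N consecutive failures, then probes
    the provider again (half-open) once a cool-down period has elapsed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe request through
                return True
            return False  # still cooling down: fail fast
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A real implementation (as the table suggests) would layer anomaly detection on top of simple failure counting, e.g. tracking latency percentiles rather than only hard errors.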
**Rust**

```rust
use llmkit::{LLMKitClient, Message, CompletionRequest};

let client = LLMKitClient::from_env()?;
let response = client.complete(
    CompletionRequest::new("anthropic/claude-sonnet-4-20250514", vec![Message::user("Hello!")])
).await?;
println!("{}", response.text_content());
```

**Python**

```python
from llmkit import LLMKitClient, Message, CompletionRequest

client = LLMKitClient.from_env()
response = client.complete(CompletionRequest(
    model="openai/gpt-4o",
    messages=[Message.user("Hello!")]
))
print(response.text_content())
```

**Node.js**

```javascript
import { LLMKitClient, Message, CompletionRequest } from 'llmkit-node'

const client = LLMKitClient.fromEnv()
const response = await client.complete(
  new CompletionRequest('anthropic/claude-sonnet-4-20250514', [Message.user('Hello!')])
)
console.log(response.textContent())
```

**Installation**

```toml
# Rust
[dependencies]
llmkit = { version = "0.1", features = ["anthropic", "openai"] }
```

```shell
# Python
pip install llmkit-python

# Node.js
npm install llmkit-node
```

| Chat | Media | Specialized |
|---|---|---|
| Streaming | Image Generation | Embeddings |
| Tool Calling | Vision/Images | Token Counting |
| Structured Output | Audio STT/TTS | Batch Processing |
| Extended Thinking | Video Generation | Model Registry |
| Prompt Caching | | 11,000+ Models |
| Category | Providers |
|---|---|
| Core | Anthropic, OpenAI, Azure OpenAI |
| Cloud | AWS Bedrock, Google Vertex AI, Google AI |
| Fast Inference | Groq, Mistral, Cerebras, SambaNova, Fireworks, DeepSeek |
| Enterprise | Cohere, AI21 |
| Hosted | Together, Perplexity, DeepInfra, OpenRouter |
| Local | Ollama, LM Studio, vLLM |
| Audio | Deepgram, ElevenLabs |
| Video | Runware |
See PROVIDERS.md for the full list with environment variables.
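`from_env()` picks up provider credentials from environment variables. The exact names are listed in PROVIDERS.md; the variables below follow the common `<PROVIDER>_API_KEY` convention and are illustrative, not authoritative:

```shell
# Illustrative variable names -- confirm each one against PROVIDERS.md
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GROQ_API_KEY="gsk_..."
```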
**Streaming**

```rust
let mut stream = client.complete_stream(request).await?;
while let Some(chunk) = stream.next().await {
    if let Some(text) = chunk?.text() { print!("{}", text); }
}
```

**Tool Calling**

```python
from llmkit import ToolBuilder

tool = ToolBuilder("get_weather") \
    .description("Get current weather") \
    .string_param("city", "City name", required=True) \
    .build()

request = CompletionRequest(model, messages).with_tools([tool])
```

**Prompt Caching**

```python
# Cache large system prompts - save up to 90% on repeated calls
request = CompletionRequest(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[Message.system(large_prompt), Message.user("Question")]
).with_cache()
```

**Extended Thinking**

```javascript
// Unified reasoning API across providers
const request = new CompletionRequest('anthropic/claude-sonnet-4-20250514', messages)
  .withThinking({ budgetTokens: 10000 })
const response = await client.complete(request)
console.log(response.thinkingContent()) // See the reasoning process
console.log(response.textContent())     // Final answer
```

**Model Registry**

```python
from llmkit import get_model_info, get_models_by_provider

# Get model details - no API calls, instant lookup
info = get_model_info("anthropic/claude-sonnet-4-20250514")
print(f"Context: {info.context_window}, Price: ${info.input_price}/1M tokens")

# Find models by provider
anthropic_models = get_models_by_provider("anthropic")
```

For more examples, see examples/.
- Getting Started (Rust)
- Getting Started (Python)
- Getting Started (Node.js)
- Model Registry — 11,000+ models with pricing
```shell
git clone https://github.com/yfedoseev/llmkit
cd llmkit
cargo build --release
cargo test

# Python bindings
cd llmkit-python && maturin develop

# Node.js bindings
cd llmkit-node && pnpm install && pnpm build
```

We welcome contributions! See CONTRIBUTING.md for guidelines.
Dual-licensed under MIT or Apache-2.0.
Built with Rust · Production Ready · GitHub