 # @stackbilt/llm-providers
 
-A multi-provider LLM abstraction layer with automatic failover, circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
+A multi-provider LLM abstraction layer with automatic failover, graduated circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
 
 ## Features
 
-- **Multi-provider failover** -- OpenAI, Anthropic, and Cloudflare Workers AI behind a single interface
-- **Circuit breaker** -- state machine (closed / open / half-open) prevents cascading failures
+- **Multi-provider failover** -- OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, and Groq behind a single interface
+- **Graduated circuit breaker** -- 4-state machine (closed / degraded / recovering / open) with probabilistic traffic routing prevents cascading failures
 - **Exponential backoff retry** -- configurable delays, jitter, and per-error-class behavior
-- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts, automatic routing to cheaper providers
-- **Streaming** -- SSE streaming support for all three providers
-- **Tool/function calling** -- OpenAI and Anthropic tool use with unified response format
-- **Batch processing** -- concurrent request batching with rate-limit awareness
+- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts with CreditLedger, automatic routing to cheaper providers
+- **Rate limit enforcement** -- CreditLedger tracks RPM/RPD/TPM/TPD per provider; factory skips providers that exceed limits
+- **Streaming** -- SSE streaming support for all providers
+- **Tool/function calling** -- OpenAI, Anthropic, Cerebras, and Cloudflare tool use with unified response format
+- **Image generation** -- Cloudflare Workers AI (SDXL, FLUX) and Google Gemini
 - **Health monitoring** -- per-provider health checks, metrics, and circuit breaker state
+- **Structured logging** -- injectable `Logger` interface; silent by default, opt-in to console or custom loggers
+- **Zero runtime dependencies** -- no transitive dependency tree to audit
 
 ## Installation
 
@@ -46,55 +49,75 @@ console.log(response.message);
 console.log(`Provider: ${response.provider}, Cost: $${response.usage.cost}`);
 ```
 
-## Provider Configuration
-
-### OpenAI
+### Auto-Discovery from Environment
 
 ```typescript
-{
-  apiKey: 'sk-...',
-  organization: 'org-...', // optional
-  project: 'proj-...', // optional
-  baseUrl: 'https://api.openai.com/v1', // optional, for proxies
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders } from '@stackbilt/llm-providers';
+
+// Scans env for ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY,
+// CEREBRAS_API_KEY, and the AI binding — configures only what's present
+const llm = LLMProviders.fromEnv(env, {
+  costOptimization: true,
+  enableCircuitBreaker: true,
+});
 ```
 
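Under the hood, auto-discovery amounts to checking which of those environment keys are present. A rough standalone sketch (the `discoverProviders` helper is hypothetical, not `fromEnv`'s actual code; only the variable names come from the comment above):

```typescript
// Illustrative env scan (hypothetical helper). An env object is probed for
// known API keys and the Workers AI binding; only providers with credentials
// present end up configured.
type Env = Record<string, unknown>;

function discoverProviders(env: Env): string[] {
  const found: string[] = [];
  if (env.ANTHROPIC_API_KEY) found.push("anthropic");
  if (env.OPENAI_API_KEY) found.push("openai");
  if (env.GROQ_API_KEY) found.push("groq");
  if (env.CEREBRAS_API_KEY) found.push("cerebras");
  if (env.AI) found.push("cloudflare"); // Workers AI binding, not an API key
  return found;
}
```

This is why the same code runs unchanged across environments: a Worker with only the `AI` binding and a Node process with only `OPENAI_API_KEY` each get a valid, smaller provider set.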
-### Anthropic
+## Providers
+
+| Provider | Models | Streaming | Tools | Notes |
+|----------|--------|-----------|-------|-------|
+| **OpenAI** | GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-4 | Yes | Yes | Default: `gpt-4o-mini` |
+| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5, 3.7 Sonnet, 3.5 Sonnet/Haiku, 3 Opus/Sonnet/Haiku | Yes | Yes | Default: `claude-haiku-4-5` |
+| **Cloudflare** | LLaMA 3.1 8B/70B, GPT-OSS 120B, Mistral 7B, Qwen 1.5, TinyLlama, and more | Yes | GPT-OSS only | Near-zero cost |
+| **Cerebras** | LLaMA 3.1 8B, LLaMA 3.3 70B, ZAI-GLM 4.7, Qwen 3 235B | Yes | GLM/Qwen only | ~2,200 tok/s |
+| **Groq** | LLaMA 3.3 70B Versatile, LLaMA 3.1 8B Instant | Yes | No | Ultra-fast inference |
+
+### Provider Configuration
 
 ```typescript
-{
-  apiKey: 'sk-ant-...',
-  version: '2023-06-01', // optional
-  baseUrl: 'https://api.anthropic.com', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+// OpenAI
+{ apiKey: 'sk-...', organization: 'org-...', project: 'proj-...' }
+
+// Anthropic
+{ apiKey: 'sk-ant-...', version: '2023-06-01' }
+
+// Cloudflare Workers AI
+{ ai: env.AI, accountId: '...' }
+
+// Cerebras
+{ apiKey: 'csk-...' }
+
+// Groq
+{ apiKey: 'gsk_...' }
 ```
 
-### Cloudflare Workers AI
+## Logging
+
+The library is silent by default. Opt in to logging by passing a `Logger`:
 
 ```typescript
-{
-  ai: env.AI, // Cloudflare AI binding (required)
-  accountId: '...', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders, consoleLogger } from '@stackbilt/llm-providers';
+
+const llm = new LLMProviders({
+  anthropic: { apiKey: '...', logger: consoleLogger },
+  logger: consoleLogger, // factory-level logging
+});
 ```
 
+Or implement your own `Logger` interface (`debug`, `info`, `warn`, `error`).
+
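For example, a structured JSON logger. The `Logger` shape below is inferred from the four documented method names; the optional `meta` parameter and the `format` helper are assumptions for illustration:

```typescript
// Hypothetical custom logger sketch. The interface is reconstructed from the
// four method names the package documents (debug/info/warn/error).
interface Logger {
  debug(message: string, meta?: Record<string, unknown>): void;
  info(message: string, meta?: Record<string, unknown>): void;
  warn(message: string, meta?: Record<string, unknown>): void;
  error(message: string, meta?: Record<string, unknown>): void;
}

// One line of JSON per log event, suitable for log aggregators.
function format(level: string, message: string, meta?: Record<string, unknown>): string {
  return JSON.stringify({ level, message, ...meta });
}

const jsonLogger: Logger = {
  debug: (m, meta) => console.log(format("debug", m, meta)),
  info: (m, meta) => console.log(format("info", m, meta)),
  warn: (m, meta) => console.warn(format("warn", m, meta)),
  error: (m, meta) => console.error(format("error", m, meta)),
};
```

Anything satisfying the four methods can be passed wherever `consoleLogger` is accepted above.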
 ## Circuit Breaker
 
-Each provider gets its own circuit breaker that tracks consecutive failures.
+Each provider gets a graduated circuit breaker that routes traffic away from failing providers with probabilistic degradation.
 
 | State | Behavior |
 |-------|----------|
-| **Closed** | Requests pass through normally. Failures increment a counter. |
-| **Open** | All requests are immediately rejected. After `resetTimeout` ms, transitions to half-open. |
-| **Half-open** | A single test request is allowed through. Success closes the circuit; failure re-opens it. |
+| **Closed** | 100% traffic to primary. Failures increment a counter. |
+| **Degraded** | Traffic splits probabilistically (90% → 70% → 40% → 10%) as failures accumulate. |
+| **Recovering** | Success steps traffic back up one level at a time. |
+| **Open** | 0% traffic. After `resetTimeout` ms, failures decay and traffic resumes. |
 
-Default thresholds: 5 failures to open, 60s reset timeout, 5-minute monitoring window.
+Default: 5-step degradation curve `[1.0, 0.9, 0.7, 0.4, 0.1]`, 60s reset timeout, 5-minute monitoring window.
 
 ```typescript
 import { CircuitBreakerManager } from '@stackbilt/llm-providers';
@@ -103,47 +126,37 @@ const manager = new CircuitBreakerManager({
   failureThreshold: 5,
   resetTimeout: 60000,
   monitoringPeriod: 300000,
+  degradationCurve: [1.0, 0.9, 0.7, 0.4, 0.1],
 });
 
 const breaker = manager.getBreaker('openai');
 console.log(breaker.getHealth());
 ```
 
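The degraded/recovering mechanics amount to walking an index along the degradation curve. A standalone illustrative model, not the library's internal `CircuitBreaker` class (all names here are hypothetical):

```typescript
// Sketch of graduated traffic routing. Each failure steps one level down the
// curve; each success steps one level back up (the "recovering" behavior).
const DEGRADATION_CURVE = [1.0, 0.9, 0.7, 0.4, 0.1];

class GraduatedBreaker {
  private level = 0; // index into the curve; the last index behaves as "open"

  recordFailure(): void {
    this.level = Math.min(this.level + 1, DEGRADATION_CURVE.length - 1);
  }

  recordSuccess(): void {
    this.level = Math.max(this.level - 1, 0); // step back up one level at a time
  }

  /** Fraction of traffic currently admitted to this provider. */
  admitProbability(): number {
    return DEGRADATION_CURVE[this.level];
  }

  /** Probabilistic routing decision: true = use this provider, false = fall back. */
  shouldAdmit(rand: () => number = Math.random): boolean {
    return rand() < this.admitProbability();
  }
}
```

Compared with a binary breaker, the graduated curve keeps a trickle of real traffic flowing to a struggling provider, so recovery is detected without a dedicated half-open probe request.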
-## Cost Optimization
-
-When `costOptimization: true`, the factory routes requests to the cheapest available provider. Cloudflare Workers AI is essentially free and gets top priority.
+## Cost Tracking & Budget Management
 
 ```typescript
-import { createCostOptimizedLLMProviders } from '@stackbilt/llm-providers';
+import { CreditLedger, LLMProviders } from '@stackbilt/llm-providers';
 
-const llm = createCostOptimizedLLMProviders({
-  openai: { apiKey: process.env.OPENAI_API_KEY },
-  cloudflare: { ai: env.AI },
+const ledger = new CreditLedger({
+  budgets: [
+    { provider: 'openai', monthlyBudget: 50, rateLimits: { rpm: 60, rpd: 10000 } },
+    { provider: 'anthropic', monthlyBudget: 100 },
+  ],
 });
 
-const analytics = llm.getCostAnalytics();
-// { breakdown: { openai: { cost, requests, tokens }, ... }, total: 0.042, recommendations: [...] }
-```
-
-## Retry with Backoff
-
-Transient errors (rate limits, network errors, server errors) are retried automatically with exponential backoff and jitter.
+// Threshold alerts fire at 80%, 90%, 95% utilization
+ledger.on((event) => {
+  if (event.type === 'threshold_crossed') {
+    console.warn(`${event.provider}: ${event.tier} — ${event.utilizationPct.toFixed(0)}% of budget`);
+  }
+});
 
-```typescript
-import { RetryManager, retry } from '@stackbilt/llm-providers';
-
-// Standalone retry for any async operation
-const result = await retry(
-  () => fetch('https://api.example.com/data'),
-  { maxRetries: 3, initialDelay: 1000, backoffMultiplier: 2 }
-);
-
-// Or configure per-provider via RetryManager
-const retryManager = new RetryManager({
-  maxRetries: 5,
-  initialDelay: 500,
-  maxDelay: 30000,
-  backoffMultiplier: 2,
+const llm = new LLMProviders({
+  openai: { apiKey: '...' },
+  anthropic: { apiKey: '...' },
+  costOptimization: true,
+  ledger, // factory enforces rate limits and tracks spend
 });
 ```
 
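The threshold events reduce to a simple utilization check against the alert tiers. An illustrative standalone sketch (hypothetical helpers, not `CreditLedger` internals):

```typescript
// The three documented alert tiers: 80%, 90%, 95% of the monthly budget.
const ALERT_TIERS = [0.8, 0.9, 0.95];

// Fraction of the monthly budget consumed so far.
function utilization(spend: number, monthlyBudget: number): number {
  return monthlyBudget > 0 ? spend / monthlyBudget : 0;
}

/** Highest alert tier crossed by the current spend, or null if below all tiers. */
function tierCrossed(spend: number, monthlyBudget: number): number | null {
  const u = utilization(spend, monthlyBudget);
  const crossed = ALERT_TIERS.filter((t) => u >= t);
  return crossed.length > 0 ? crossed[crossed.length - 1] : null;
}
```

A ledger only needs to emit a `threshold_crossed` event when this value changes between two spend updates, which keeps alerting idempotent per tier.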
@@ -156,6 +169,7 @@ const llm = new LLMProviders({
   openai: { apiKey: '...' },
   anthropic: { apiKey: '...' },
   cloudflare: { ai: env.AI },
+  cerebras: { apiKey: '...' },
   fallbackRules: [
     { condition: 'rate_limit', fallbackProvider: 'cloudflare' },
     { condition: 'cost', threshold: 10, fallbackProvider: 'cloudflare' },
@@ -164,6 +178,8 @@ const llm = new LLMProviders({
 });
 ```
 
+Default fallback priority includes all configured providers: Cloudflare → Cerebras → Groq → Anthropic → OpenAI.
+
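Conceptually, failover walks that priority chain until a provider succeeds. A simplified sketch with hypothetical types (the real factory also consults the circuit breaker and ledger before trying each provider):

```typescript
// Minimal failover walk: try providers in priority order, remember the last
// error, and surface it only if every provider in the chain fails.
type Provider = { name: string; generate: (prompt: string) => Promise<string> };

async function withFailover(chain: Provider[], prompt: string): Promise<string> {
  let lastError: unknown;
  for (const provider of chain) {
    try {
      return await provider.generate(prompt);
    } catch (err) {
      lastError = err; // fall through to the next provider in the chain
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```

Surfacing the last error (rather than the first) means the caller sees the failure mode of the final, cheapest fallback, which is usually the actionable one.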
 ## Error Handling
 
 Structured error classes for each failure mode:
@@ -175,7 +191,6 @@ import {
   AuthenticationError,
   CircuitBreakerOpenError,
   TimeoutError,
-  LLMErrorFactory,
 } from '@stackbilt/llm-providers';
 
 try {
@@ -193,14 +208,16 @@ try {
 
 ## Model Constants
 
-Predefined model identifiers for convenience:
-
 ```typescript
 import { MODELS, getRecommendedModel } from '@stackbilt/llm-providers';
 
-MODELS.GPT_4O; // 'gpt-4o'
-MODELS.CLAUDE_3_5_SONNET; // 'claude-3-5-sonnet-20241022'
-MODELS.LLAMA_3_1_8B; // '@cf/meta/llama-3.1-8b-instruct'
+// Current-gen models
+MODELS.CLAUDE_OPUS_4_6; // 'claude-opus-4-6-20250618'
+MODELS.CLAUDE_SONNET_4_6; // 'claude-sonnet-4-6-20250618'
+MODELS.CLAUDE_HAIKU_4_5; // 'claude-haiku-4-5-20251001'
+MODELS.GPT_4O; // 'gpt-4o'
+MODELS.GPT_4O_MINI; // 'gpt-4o-mini'
+MODELS.CEREBRAS_ZAI_GLM_4_7; // 'zai-glm-4.7'
 
 // Get best model for a use case given available providers
 const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
@@ -214,41 +231,51 @@ const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
 |-------|-------------|
 | `LLMProviders` | High-level facade -- initialize providers, generate responses, check health |
 | `LLMProviderFactory` | Lower-level factory with provider chain building and fallback logic |
-| `OpenAIProvider` | OpenAI GPT models (streaming, tools, batch) |
+| `OpenAIProvider` | OpenAI GPT models (streaming, tools) |
 | `AnthropicProvider` | Anthropic Claude models (streaming, tools) |
-| `CloudflareProvider` | Cloudflare Workers AI (streaming, batch, cost optimization) |
+| `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS, batch) |
+| `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) |
+| `GroqProvider` | Groq fast inference (streaming) |
 | `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation |
 
 ### Utilities
 
 | Class | Description |
 |-------|-------------|
-| `CircuitBreaker` | Per-provider circuit breaker state machine |
+| `CircuitBreaker` | Graduated 4-state circuit breaker with probabilistic degradation |
 | `CircuitBreakerManager` | Manages circuit breakers across multiple providers |
 | `RetryManager` | Exponential backoff retry with jitter |
 | `CostTracker` | Per-provider cost accumulation and budget alerts |
+| `CreditLedger` | Monthly budgets, rate limits, burn-rate projection, threshold events |
 | `CostOptimizer` | Static methods for optimal provider selection |
+| `ImageProvider` | Multi-provider image generation (Cloudflare SDXL/FLUX, Google Gemini) |
+
+### Logger
+
+| Export | Description |
+|--------|-------------|
+| `Logger` | Interface: `debug`, `info`, `warn`, `error` methods |
+| `noopLogger` | Silent logger (default) |
+| `consoleLogger` | Forwards to `console.*` (opt-in) |
 
 ### Key Types
 
 | Type | Description |
 |------|-------------|
-| `LLMRequest` | Unified request: messages, model, temperature, tools, metadata |
-| `LLMResponse` | Unified response: message, usage, provider, cost, tool calls |
-| `LLMProvider` | Provider interface: generateResponse, healthCheck, estimateCost |
-| `ProviderFactoryConfig` | Factory configuration: provider configs, fallback rules, flags |
-| `CircuitBreakerConfig` | Failure threshold, reset timeout, monitoring period |
-| `RetryConfig` | Max retries, delays, backoff multiplier, retryable error codes |
-| `CostConfig` | Token costs, monthly budget, alert threshold |
+| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format |
+| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls |
+| `TokenUsage` | Token counts and cost (inputTokens, outputTokens, totalTokens, cost) |
+| `ProviderFactoryConfig` | Factory config: provider configs, fallback rules, ledger, logger |
+| `CostAnalytics` | Cost breakdown, total, and recommendations |
+| `ProviderHealthEntry` | Health status, metrics, circuit breaker state, capabilities |
 
 ### Factory Functions
 
 | Function | Description |
 |----------|-------------|
 | `createLLMProviders(config)` | Create an `LLMProviders` instance |
 | `createCostOptimizedLLMProviders(config)` | Create with cost optimization, circuit breakers, and retries enabled |
-| `createLLMProviderFactory(config)` | Create a bare `LLMProviderFactory` |
-| `createCostOptimizedFactory(config)` | Create a cost-optimized factory |
+| `LLMProviders.fromEnv(env)` | Auto-discover providers from environment variables |
 | `getRecommendedModel(useCase, providers)` | Pick the best model for a use case |
 | `retry(fn, config)` | One-shot retry wrapper for any async function |