
Commit ad51d7a

Codebeast and claude committed
docs: update README, CHANGELOG, and bump to v1.2.0
- README: added Cerebras/Groq providers, Logger section, `fromEnv()` example, graduated circuit breaker docs, CreditLedger usage, updated model constants to current-gen, corrected API reference tables for all 5 providers and new exports
- CHANGELOG: added [1.2.0] entry covering all 11 audit fixes
- package.json: version 1.1.0 → 1.2.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 7af1785 commit ad51d7a

3 files changed: 138 additions & 86 deletions


CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -3,6 +3,31 @@
 All notable changes to `@stackbilt/llm-providers` are documented here.
 Format follows [Keep a Changelog](https://keepachangelog.com/). Versions use [Semantic Versioning](https://semver.org/).
 
+## [1.2.0] — 2026-04-01
+
+### Added
+- **Structured Logger** — `Logger` interface with `noopLogger` (silent default) and `consoleLogger` (opt-in). All components accept an optional `logger` via config. Zero `console.*` calls in production code.
+- **Rate limit enforcement** — `LLMProviderFactory` now checks `CreditLedger.checkRateLimit()` (RPM/RPD) before dispatching to each provider, skipping providers that have exceeded their limits.
+- **Claude 4.6 models** — `claude-opus-4-6-20250618` and `claude-sonnet-4-6-20250618` added to the Anthropic provider.
+- **Claude Haiku 4.5** — `claude-haiku-4-5-20251001` added.
+- **Claude 3.7 Sonnet** — `claude-3-7-sonnet-20250219` added (replaces the incorrect `claude-sonnet-3.7` ID).
+- **`CostAnalytics`** and **`ProviderHealthEntry`** — typed return values for `getCostAnalytics()` and `getProviderHealth()`.
+
+### Fixed
+- **30+ `any` types eliminated** — all provider interfaces, tool call types, Workers AI response shapes, error bodies, cost analytics returns, and decorator signatures are fully typed. Three boundary casts for Cloudflare `Ai.run()` are retained with explicit eslint-disable comments.
+- **Data leak removed** — `console.log` at `anthropic.ts:492` dumped full tool call payloads into worker logs.
+- **Anthropic JSON mode** — only prepends `{` if the response doesn't already start with one, preventing `{{...}` corruption.
+- **OpenAI `supportsBatching`** — set to `false` (was `true`, but `processBatch()` is a sequential loop).
+- **Default model** — OpenAI default changed from the deprecated `gpt-3.5-turbo` to `gpt-4o-mini`.
+- **Default fallback chain** — now includes all 5 configured providers (was hardcoded to cloudflare/anthropic/openai, excluding Cerebras and Groq).
+- **Anthropic healthCheck** — switched from a real API call (which burned tokens) to a lightweight OPTIONS reachability check.
+- **`TokenUsage.cost`** — made required (was optional, causing NaN accumulation in the cost tracker).
+- **Circuit breaker test isolation** — `defaultCircuitBreakerManager.resetAll()` added to all test `beforeEach` blocks to prevent cross-test state leaks.
+
+### Changed
+- **Logging default** — the library is now silent by default (`noopLogger`). Pass `consoleLogger` or a custom `Logger` to enable output.
+- **Model catalog** — updated to current-gen models; removed stale/incorrect model IDs and TBD pricing.
+
 ## [1.1.0] — 2026-04-01
 
 ### Added
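The Anthropic JSON-mode fix in the changelog amounts to a one-line guard. A minimal sketch of the idea (hypothetical helper, not the actual `anthropic.ts` code):

```typescript
// Anthropic JSON mode pre-fills the assistant turn with "{", so the API
// returns the rest of the object. Only prepend "{" when it's actually
// missing, to avoid producing "{{..." corruption.
function completeJsonResponse(text: string): string {
  const trimmed = text.trim();
  return trimmed.startsWith("{") ? trimmed : "{" + trimmed;
}
```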

README.md

Lines changed: 112 additions & 85 deletions
@@ -1,17 +1,20 @@
 # @stackbilt/llm-providers
 
-A multi-provider LLM abstraction layer with automatic failover, circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
+A multi-provider LLM abstraction layer with automatic failover, graduated circuit breakers, cost tracking, and intelligent retry. Built for Cloudflare Workers but runs anywhere with a standard `fetch` API. Extracted from a production orchestration platform handling 80K+ LOC across multiple services.
 
 ## Features
 
-- **Multi-provider failover** -- OpenAI, Anthropic, and Cloudflare Workers AI behind a single interface
-- **Circuit breaker** -- state machine (closed / open / half-open) prevents cascading failures
+- **Multi-provider failover** -- OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, and Groq behind a single interface
+- **Graduated circuit breaker** -- 4-state machine (closed / degraded / recovering / open) with probabilistic traffic routing prevents cascading failures
 - **Exponential backoff retry** -- configurable delays, jitter, and per-error-class behavior
-- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts, automatic routing to cheaper providers
-- **Streaming** -- SSE streaming support for all three providers
-- **Tool/function calling** -- OpenAI and Anthropic tool use with unified response format
-- **Batch processing** -- concurrent request batching with rate-limit awareness
+- **Cost tracking and optimization** -- per-provider cost attribution, budget alerts with CreditLedger, automatic routing to cheaper providers
+- **Rate limit enforcement** -- CreditLedger tracks RPM/RPD/TPM/TPD per provider; the factory skips providers that exceed their limits
+- **Streaming** -- SSE streaming support for all providers
+- **Tool/function calling** -- OpenAI, Anthropic, Cerebras, and Cloudflare tool use with unified response format
+- **Image generation** -- Cloudflare Workers AI (SDXL, FLUX) and Google Gemini
 - **Health monitoring** -- per-provider health checks, metrics, and circuit breaker state
+- **Structured logging** -- injectable `Logger` interface; silent by default, opt-in to console or custom loggers
+- **Zero runtime dependencies** -- no transitive dependency tree to audit
 
 ## Installation
 
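The exponential-backoff-with-jitter behavior listed in the features follows the standard formula; a minimal sketch (hypothetical helper, not the library's `RetryManager` API):

```typescript
// Delay grows as initialDelay * multiplier^attempt, capped at maxDelay,
// plus random jitter so concurrent retries don't stampede at once.
function backoffDelay(
  attempt: number, // 0-based retry attempt
  initialDelay = 1000, // ms
  backoffMultiplier = 2,
  maxDelay = 30000, // ms
): number {
  const base = Math.min(maxDelay, initialDelay * Math.pow(backoffMultiplier, attempt));
  const jitter = Math.random() * base * 0.1; // up to 10% jitter
  return base + jitter;
}
```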
@@ -46,55 +49,75 @@ console.log(response.message);
 console.log(`Provider: ${response.provider}, Cost: $${response.usage.cost}`);
 ```
 
-## Provider Configuration
-
-### OpenAI
+### Auto-Discovery from Environment
 
 ```typescript
-{
-  apiKey: 'sk-...',
-  organization: 'org-...',              // optional
-  project: 'proj-...',                  // optional
-  baseUrl: 'https://api.openai.com/v1', // optional, for proxies
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders } from '@stackbilt/llm-providers';
+
+// Scans env for ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY,
+// CEREBRAS_API_KEY, and AI binding — configures only what's present
+const llm = LLMProviders.fromEnv(env, {
+  costOptimization: true,
+  enableCircuitBreaker: true,
+});
 ```
 
-### Anthropic
+## Providers
+
+| Provider | Models | Streaming | Tools | Notes |
+|----------|--------|-----------|-------|-------|
+| **OpenAI** | GPT-4o, GPT-4o Mini, GPT-4 Turbo, GPT-4 | Yes | Yes | Default: `gpt-4o-mini` |
+| **Anthropic** | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5, 3.7 Sonnet, 3.5 Sonnet/Haiku, 3 Opus/Sonnet/Haiku | Yes | Yes | Default: `claude-haiku-4-5` |
+| **Cloudflare** | LLaMA 3.1 8B/70B, GPT-OSS 120B, Mistral 7B, Qwen 1.5, TinyLlama, and more | Yes | GPT-OSS only | Near-zero cost |
+| **Cerebras** | LLaMA 3.1 8B, LLaMA 3.3 70B, ZAI-GLM 4.7, Qwen 3 235B | Yes | GLM/Qwen only | ~2,200 tok/s |
+| **Groq** | LLaMA 3.3 70B Versatile, LLaMA 3.1 8B Instant | Yes | No | Ultra-fast inference |
+
+### Provider Configuration
 
 ```typescript
-{
-  apiKey: 'sk-ant-...',
-  version: '2023-06-01',                // optional
-  baseUrl: 'https://api.anthropic.com', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+// OpenAI
+{ apiKey: 'sk-...', organization: 'org-...', project: 'proj-...' }
+
+// Anthropic
+{ apiKey: 'sk-ant-...', version: '2023-06-01' }
+
+// Cloudflare Workers AI
+{ ai: env.AI, accountId: '...' }
+
+// Cerebras
+{ apiKey: 'csk-...' }
+
+// Groq
+{ apiKey: 'gsk_...' }
 ```
 
-### Cloudflare Workers AI
+## Logging
+
+The library is silent by default. Opt in to logging by passing a `Logger`:
 
 ```typescript
-{
-  ai: env.AI,       // Cloudflare AI binding (required)
-  accountId: '...', // optional
-  timeout: 30000,
-  maxRetries: 3,
-}
+import { LLMProviders, consoleLogger } from '@stackbilt/llm-providers';
+
+const llm = new LLMProviders({
+  anthropic: { apiKey: '...', logger: consoleLogger },
+  logger: consoleLogger, // factory-level logging
+});
 ```
 
+Or implement your own `Logger` interface (`debug`, `info`, `warn`, `error`).
+
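A custom logger only needs those four methods. A sketch under the assumption that each method takes a message plus optional metadata (the library's exact `Logger` signature may differ):

```typescript
interface Logger {
  debug(message: string, meta?: Record<string, unknown>): void;
  info(message: string, meta?: Record<string, unknown>): void;
  warn(message: string, meta?: Record<string, unknown>): void;
  error(message: string, meta?: Record<string, unknown>): void;
}

// Example: collect log lines in an array (handy in tests or for shipping
// to an external sink instead of the console).
function memoryLogger(sink: string[]): Logger {
  const log = (level: string) => (message: string, meta?: Record<string, unknown>) =>
    sink.push(`[${level}] ${message}${meta ? " " + JSON.stringify(meta) : ""}`);
  return { debug: log("debug"), info: log("info"), warn: log("warn"), error: log("error") };
}
```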
 ## Circuit Breaker
 
-Each provider gets its own circuit breaker that tracks consecutive failures.
+Each provider gets a graduated circuit breaker that routes traffic away from failing providers with probabilistic degradation.
 
 | State | Behavior |
 |-------|----------|
-| **Closed** | Requests pass through normally. Failures increment a counter. |
-| **Open** | All requests are immediately rejected. After `resetTimeout` ms, transitions to half-open. |
-| **Half-open** | A single test request is allowed through. Success closes the circuit; failure re-opens it. |
+| **Closed** | 100% traffic to primary. Failures increment a counter. |
+| **Degraded** | Traffic splits probabilistically (90% → 70% → 40% → 10%) as failures accumulate. |
+| **Recovering** | Success steps traffic back up one level at a time. |
+| **Open** | 0% traffic. After `resetTimeout` ms, failures decay and traffic resumes. |
 
-Default thresholds: 5 failures to open, 60s reset timeout, 5-minute monitoring window.
+Default: 5-step degradation curve `[1.0, 0.9, 0.7, 0.4, 0.1]`, 60s reset timeout, 5-minute monitoring window.
 
 ```typescript
 import { CircuitBreakerManager } from '@stackbilt/llm-providers';
@@ -103,47 +126,37 @@ const manager = new CircuitBreakerManager({
   failureThreshold: 5,
   resetTimeout: 60000,
   monitoringPeriod: 300000,
+  degradationCurve: [1.0, 0.9, 0.7, 0.4, 0.1],
 });
 
 const breaker = manager.getBreaker('openai');
 console.log(breaker.getHealth());
 ```
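The degradation curve maps accumulated failures to an admit probability. A minimal sketch of the probabilistic routing idea (hypothetical helpers, not the library's internals):

```typescript
// Each entry in the curve is the fraction of traffic still admitted at
// that failure level; past the end of the curve the circuit is open.
function admitProbability(failures: number, curve = [1.0, 0.9, 0.7, 0.4, 0.1]): number {
  return failures < curve.length ? curve[failures] : 0; // open: 0% traffic
}

// Per-request coin flip against the current admit probability.
function shouldAdmit(failures: number): boolean {
  return Math.random() < admitProbability(failures);
}
```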
 
-## Cost Optimization
-
-When `costOptimization: true`, the factory routes requests to the cheapest available provider. Cloudflare Workers AI is essentially free and gets top priority.
+## Cost Tracking & Budget Management
 
 ```typescript
-import { createCostOptimizedLLMProviders } from '@stackbilt/llm-providers';
+import { CreditLedger, LLMProviders } from '@stackbilt/llm-providers';
 
-const llm = createCostOptimizedLLMProviders({
-  openai: { apiKey: process.env.OPENAI_API_KEY },
-  cloudflare: { ai: env.AI },
+const ledger = new CreditLedger({
+  budgets: [
+    { provider: 'openai', monthlyBudget: 50, rateLimits: { rpm: 60, rpd: 10000 } },
+    { provider: 'anthropic', monthlyBudget: 100 },
+  ],
 });
 
-const analytics = llm.getCostAnalytics();
-// { breakdown: { openai: { cost, requests, tokens }, ... }, total: 0.042, recommendations: [...] }
-```
-
-## Retry with Backoff
-
-Transient errors (rate limits, network errors, server errors) are retried automatically with exponential backoff and jitter.
+// Threshold alerts fire at 80%, 90%, 95% utilization
+ledger.on((event) => {
+  if (event.type === 'threshold_crossed') {
+    console.warn(`${event.provider}: ${event.tier} — ${event.utilizationPct.toFixed(0)}% of budget`);
+  }
+});
 
-```typescript
-import { RetryManager, retry } from '@stackbilt/llm-providers';
-
-// Standalone retry for any async operation
-const result = await retry(
-  () => fetch('https://api.example.com/data'),
-  { maxRetries: 3, initialDelay: 1000, backoffMultiplier: 2 }
-);
-
-// Or configure per-provider via RetryManager
-const retryManager = new RetryManager({
-  maxRetries: 5,
-  initialDelay: 500,
-  maxDelay: 30000,
-  backoffMultiplier: 2,
+const llm = new LLMProviders({
+  openai: { apiKey: '...' },
+  anthropic: { apiKey: '...' },
+  costOptimization: true,
+  ledger, // Factory enforces rate limits and tracks spend
 });
 ```
 
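Rate-limit enforcement of the RPM kind shown above reduces to a windowed counter. A minimal sketch of the idea (hypothetical helper, not `CreditLedger.checkRateLimit()`'s actual implementation):

```typescript
// Admit a request only if fewer than `rpm` requests landed in the past 60s.
function makeRpmLimiter(rpm: number) {
  const timestamps: number[] = [];
  return (now: number = Date.now()): boolean => {
    // Drop entries older than the 60-second window
    while (timestamps.length > 0 && now - timestamps[0] >= 60_000) {
      timestamps.shift();
    }
    if (timestamps.length >= rpm) return false; // limit exceeded -- skip provider
    timestamps.push(now);
    return true;
  };
}
```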
@@ -156,6 +169,7 @@ const llm = new LLMProviders({
   openai: { apiKey: '...' },
   anthropic: { apiKey: '...' },
   cloudflare: { ai: env.AI },
+  cerebras: { apiKey: '...' },
   fallbackRules: [
     { condition: 'rate_limit', fallbackProvider: 'cloudflare' },
     { condition: 'cost', threshold: 10, fallbackProvider: 'cloudflare' },
@@ -164,6 +178,8 @@ const llm = new LLMProviders({
 });
 ```
 
+Default fallback priority includes all configured providers: Cloudflare → Cerebras → Groq → Anthropic → OpenAI.
+
 ## Error Handling
 
 Structured error classes for each failure mode:
@@ -175,7 +191,6 @@ import {
   AuthenticationError,
   CircuitBreakerOpenError,
   TimeoutError,
-  LLMErrorFactory,
 } from '@stackbilt/llm-providers';
 
 try {
@@ -193,14 +208,16 @@ try {
 
 ## Model Constants
 
-Predefined model identifiers for convenience:
-
 ```typescript
 import { MODELS, getRecommendedModel } from '@stackbilt/llm-providers';
 
-MODELS.GPT_4O;            // 'gpt-4o'
-MODELS.CLAUDE_3_5_SONNET; // 'claude-3-5-sonnet-20241022'
-MODELS.LLAMA_3_1_8B;      // '@cf/meta/llama-3.1-8b-instruct'
+// Current-gen models
+MODELS.CLAUDE_OPUS_4_6;      // 'claude-opus-4-6-20250618'
+MODELS.CLAUDE_SONNET_4_6;    // 'claude-sonnet-4-6-20250618'
+MODELS.CLAUDE_HAIKU_4_5;     // 'claude-haiku-4-5-20251001'
+MODELS.GPT_4O;               // 'gpt-4o'
+MODELS.GPT_4O_MINI;          // 'gpt-4o-mini'
+MODELS.CEREBRAS_ZAI_GLM_4_7; // 'zai-glm-4.7'
 
 // Get best model for a use case given available providers
 const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
@@ -214,41 +231,51 @@ const model = getRecommendedModel('COST_EFFECTIVE', ['openai', 'cloudflare']);
 |-------|-------------|
 | `LLMProviders` | High-level facade -- initialize providers, generate responses, check health |
 | `LLMProviderFactory` | Lower-level factory with provider chain building and fallback logic |
-| `OpenAIProvider` | OpenAI GPT models (streaming, tools, batch) |
+| `OpenAIProvider` | OpenAI GPT models (streaming, tools) |
 | `AnthropicProvider` | Anthropic Claude models (streaming, tools) |
-| `CloudflareProvider` | Cloudflare Workers AI (streaming, batch, cost optimization) |
+| `CloudflareProvider` | Cloudflare Workers AI (streaming, tools on GPT-OSS, batch) |
+| `CerebrasProvider` | Cerebras fast inference (streaming, tools on GLM/Qwen) |
+| `GroqProvider` | Groq fast inference (streaming) |
 | `BaseProvider` | Abstract base with shared resiliency, metrics, and cost calculation |
 
 ### Utilities
 
 | Class | Description |
 |-------|-------------|
-| `CircuitBreaker` | Per-provider circuit breaker state machine |
+| `CircuitBreaker` | Graduated 4-state circuit breaker with probabilistic degradation |
 | `CircuitBreakerManager` | Manages circuit breakers across multiple providers |
 | `RetryManager` | Exponential backoff retry with jitter |
 | `CostTracker` | Per-provider cost accumulation and budget alerts |
+| `CreditLedger` | Monthly budgets, rate limits, burn rate projection, threshold events |
 | `CostOptimizer` | Static methods for optimal provider selection |
+| `ImageProvider` | Multi-provider image generation (Cloudflare SDXL/FLUX, Google Gemini) |
+
+### Logger
+
+| Export | Description |
+|--------|-------------|
+| `Logger` | Interface: `debug`, `info`, `warn`, `error` methods |
+| `noopLogger` | Silent logger (default) |
+| `consoleLogger` | Forwards to `console.*` (opt-in) |
 
 ### Key Types
 
 | Type | Description |
 |------|-------------|
-| `LLMRequest` | Unified request: messages, model, temperature, tools, metadata |
-| `LLMResponse` | Unified response: message, usage, provider, cost, tool calls |
-| `LLMProvider` | Provider interface: generateResponse, healthCheck, estimateCost |
-| `ProviderFactoryConfig` | Factory configuration: provider configs, fallback rules, flags |
-| `CircuitBreakerConfig` | Failure threshold, reset timeout, monitoring period |
-| `RetryConfig` | Max retries, delays, backoff multiplier, retryable error codes |
-| `CostConfig` | Token costs, monthly budget, alert threshold |
+| `LLMRequest` | Unified request: messages, model, temperature, tools, response_format |
+| `LLMResponse` | Unified response: message, usage (with cost), provider, tool calls |
+| `TokenUsage` | Token counts and cost (inputTokens, outputTokens, totalTokens, cost) |
+| `ProviderFactoryConfig` | Factory config: provider configs, fallback rules, ledger, logger |
+| `CostAnalytics` | Cost breakdown, total, and recommendations |
+| `ProviderHealthEntry` | Health status, metrics, circuit breaker state, capabilities |
 
 ### Factory Functions
 
 | Function | Description |
 |----------|-------------|
 | `createLLMProviders(config)` | Create an `LLMProviders` instance |
 | `createCostOptimizedLLMProviders(config)` | Create with cost optimization, circuit breakers, and retries enabled |
-| `createLLMProviderFactory(config)` | Create a bare `LLMProviderFactory` |
-| `createCostOptimizedFactory(config)` | Create a cost-optimized factory |
+| `LLMProviders.fromEnv(env)` | Auto-discover providers from environment variables |
 | `getRecommendedModel(useCase, providers)` | Pick the best model for a use case |
 | `retry(fn, config)` | One-shot retry wrapper for any async function |
 
package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@stackbilt/llm-providers",
-  "version": "1.1.0",
+  "version": "1.2.0",
   "description": "Multi-LLM failover with circuit breakers, cost tracking, and intelligent retry. Cloudflare Workers native.",
   "author": "Stackbilt <admin@stackbilt.dev>",
   "license": "Apache-2.0",
